Open HeChengHui opened 11 months ago
Hi, I have a 3090 and my peak VRAM usage was 17 GB for 30 seconds of audio.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.89 GiB (GPU 0; 14.75 GiB total capacity; 11.70 GiB already allocated; 1.57 GiB free; 12.10 GiB reserved in total by PyTorch)
Developers, please make sure the app consumes no more than 10 gigabytes of GPU memory for any length of audio. Right now AudioSR cannot even run on a Google Colab T4 with 14 gigabytes of memory, which makes it completely unusable for many users.
@haoheliu is there a way to lower the memory usage?
Oh, this may explain my problem! I have 11 GB of VRAM and have been sitting waiting for any result. It says 10.5/11 GB usage, but I have no output yet and no clear status message; the cmd window just says "DiffusionWrapper has 258.20 M params." Would waiting longer give any result? Would CPU take even longer?
I hope to see this kind of tool, but I have little faith in this particular one. The way it installs itself into some random folder on C: also feels bad; I hope it is not actually malicious. 😞
For some reason this thing doesn't want to work. I want to process a large voice recording (1 hour 14 minutes long); I have already divided it into 13 parts, but still no result. Manually dividing such a large file into the recommended 5.12-second chunks every time is not convenient. It would be great if you implemented support for large files (perhaps by adding tools to automatically split them into parts of a suitable size, and after processing automatically join them back together).
[CMD log]
...
Warning: audio is longer than 10.24 seconds, may degrade the model performance. It's recommand to truncate your audio to 5.12 seconds before input to AudioSR to get the best performance.
Traceback (most recent call last):
  File "C:\Program Files\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\__main__.py", line 42, in <module>
    main(args)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\__main__.py", line 17, in main
    waveform = super_resolution(
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\pipeline.py", line 168, in super_resolution
    waveform = latent_diffusion.generate_batch(
  File "C:\Users\КънстантiнЪ\AppData\Roaming\Python\Python310\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py", line 1493, in generate_batch
    z, c = self.get_input(
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py", line 837, in get_input
    encoder_posterior = self.encode_first_stage(x)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py", line 935, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_encoder\autoencoder.py", line 106, in encode
    h = self.encoder(x)
  File "C:\Users\КънстантiнЪ\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\modules\diffusionmodules\model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "C:\Users\КънстантiнЪ\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\modules\diffusionmodules\model.py", line 158, in forward
    h = nonlinearity(h)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\modules\diffusionmodules\model.py", line 35, in nonlinearity
    return x * torch.sigmoid(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.81 GiB (GPU 0; 6.00 GiB total capacity; 21.08 GiB already allocated; 0 bytes free; 21.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
My hardware:
CPU: Octal-Core Intel Core i7-11800H, 4500 MHz (45 x 100)
RAM: DDR4-3200 (1600 MHz), 40 GB
GPU: NVIDIA GeForce RTX 3060 Laptop (Gigabyte), 6 GB
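The tail of that traceback points at a real knob: PyTorch's caching allocator reads the `PYTORCH_CUDA_ALLOC_CONF` environment variable, and its `max_split_size_mb` option can reduce fragmentation when reserved memory far exceeds allocated memory. It must be set before the first CUDA allocation. A minimal sketch; the value 128 is just an example starting point, and this alone may not be enough for a 6 GB card:

```python
import os

# Must be set before the first CUDA allocation (safest: before `import torch`).
# On Windows cmd the equivalent is:
#   set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import torch and load the model only after the variable is set
```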
Maybe this will help:
ffmpeg -i input.wav -f segment -segment_time 5.12 -c copy out%03d.wav
After AudioSR processing:
sox *.wav merged.wav
@maxdudik you're able to process the files in 5.12-second chunks? I can't even get a single 2-second file to process on a 3080.
I tried running the audiosr script in Anaconda\envs\audiosr using the command python audiosr. I am facing a CUDA OOM error as shown. What is the minimum memory requirement?