haoheliu / versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.
MIT License
1.07k stars 106 forks

CUDA memory enquiry #20

Open HeChengHui opened 11 months ago

HeChengHui commented 11 months ago

I tried running the audiosr script in Anaconda\envs\audiosr using the command python audiosr, and I am facing a CUDA OOM error as shown:

untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.21 GiB is allocated by PyTorch, and 81.22 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I wonder, what is the minimum memory requirement?

audiosr                   0.0.3                    pypi_0    pypi
torch                     2.2.0.dev20230922+cu121          pypi_0    pypi
torchaudio                2.2.0.dev20230922+cu121          pypi_0    pypi
torchvision               0.17.0.dev20230922+cu121          pypi_0    pypi
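The error message's own max_split_size_mb hint can be tried by setting PyTorch's allocator configuration before launching. A sketch; the 128 MiB split size is an arbitrary starting value to experiment with, not a tested recommendation for AudioSR:

```shell
# Hypothetical mitigation: cap the allocator's block split size to reduce
# fragmentation, as suggested by the OOM message itself.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then rerun whatever invocation failed, e.g.:
python audiosr
```

This only helps when the failure is due to fragmentation (reserved memory much larger than allocated); it cannot shrink the model's actual peak usage.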
daswer123 commented 11 months ago

Hi, I have a 3090 and my peak VRAM usage was 17 GB for 30 seconds of audio.

vrubzov1957 commented 11 months ago

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.89 GiB (GPU 0; 14.75 GiB total capacity; 11.70 GiB already allocated; 1.57 GiB free; 12.10 GiB reserved in total by PyTorch)

Guys, developers, please make sure your app consumes no more than 10 gigabytes of GPU memory for any length of audio. Right now AudioSR cannot be run even on a Google Colab T4 with 14 gigabytes of memory, so it is not usable for users at all.

HeChengHui commented 11 months ago

@haoheliu is there a way to lower the memory usage?

madwurmz commented 11 months ago

Oh, this maybe explains my problems! I have 11 GB of VRAM and have been sitting waiting for any results. It is showing 10.5/11 GB usage, but I have no results yet and no clear message about the status; the cmd window just says "DiffusionWrapper has 258.20 M params." Would waiting longer give any result? Would CPU take even longer?

I hope to see this kind of tool, but I have little faith in this particular one. The way it installs itself into some folder on C: also feels bad; I hope it is not actually malicious. 😞

Constantin-eee commented 11 months ago

For some reason this thing doesn't want to work. I want to process a large voice recording (duration 1 hour 14 minutes); I have already divided it into 13 parts, but still no result. Manually splitting such a large file into the recommended 5.12-second pieces every time is not convenient. It would be great if you added support for processing large files, perhaps with tools that automatically split a large file into parts of a suitable size and automatically join them back together after processing.

[CMD log]

... Warning: audio is longer than 10.24 seconds, may degrade the model performance. It's recommand to truncate your audio to 5.12 seconds before input to AudioSR to get the best performance.
Traceback (most recent call last):
  File "C:\Program Files\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\__main__.py", line 42, in <module>
    main(args)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\__main__.py", line 17, in main
    waveform = super_resolution(
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\pipeline.py", line 168, in super_resolution
    waveform = latent_diffusion.generate_batch(
  File "C:\Users\КънстантiнЪ\AppData\Roaming\Python\Python310\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py", line 1493, in generate_batch
    z, c = self.get_input(
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py", line 837, in get_input
    encoder_posterior = self.encode_first_stage(x)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py", line 935, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_encoder\autoencoder.py", line 106, in encode
    h = self.encoder(x)
  File "C:\Users\КънстантiнЪ\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\modules\diffusionmodules\model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "C:\Users\КънстантiнЪ\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\modules\diffusionmodules\model.py", line 158, in forward
    h = nonlinearity(h)
  File "C:\Program Files\Python\Python310\lib\site-packages\audiosr\latent_diffusion\modules\diffusionmodules\model.py", line 35, in nonlinearity
    return x * torch.sigmoid(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.81 GiB (GPU 0; 6.00 GiB total capacity; 21.08 GiB already allocated; 0 bytes free; 21.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My hardware: CPU: OctalCore Intel Core i7-11800H, 4500 MHz (45 x 100); RAM: DDR4-3200 (1600 MHz), 40 GB; GPU: NVIDIA GeForce RTX 3060 Laptop (Gigabyte), 6 GB

maxdudik commented 11 months ago

> For some reason this thing doesn't want to work. I want to process a large voice recording (duration 1 hour 14 minutes); I have already divided it into 13 parts, but still no result. Manually splitting such a large file into the recommended 5.12-second pieces every time is not convenient. It would be great if you added support for processing large files, perhaps with tools that automatically split a large file into parts of a suitable size and automatically join them back together after processing.

Maybe this will help:

ffmpeg -i input.wav -f segment -segment_time 5.12 -c copy out%03d.wav

After AudioSR processing:

sox *.wav merged.wav
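For anyone who wants to automate the split/merge step without ffmpeg or sox, here is a minimal sketch using only Python's stdlib wave module. The 5.12-second chunk length follows the warning in the thread; the helper names split_wav and merge_wavs are made up for this example, and the AudioSR call on each chunk (CLI or Python API) is left to you:

```python
# Sketch: split a PCM WAV into ~5.12 s chunks, then concatenate the
# (AudioSR-processed) chunks back into one file. Stdlib only.
import wave

CHUNK_SECONDS = 5.12  # chunk length suggested by AudioSR's warning

def split_wav(path, out_prefix, chunk_seconds=CHUNK_SECONDS):
    """Write path as out_prefix000.wav, out_prefix001.wav, ...; return the names."""
    names = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        i = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:
                break
            name = f"{out_prefix}{i:03d}.wav"
            with wave.open(name, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(data)
            names.append(name)
            i += 1
    return names

def merge_wavs(names, out_path):
    """Concatenate WAV files that share identical audio parameters."""
    with wave.open(names[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        for name in names:
            with wave.open(name, "rb") as src:
                dst.writeframes(src.readframes(src.getnframes()))
```

merge_wavs assumes all chunks share the same sample rate, width, and channel count, which should hold when every chunk came out of the same AudioSR run (all upsampled to 48 kHz).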

smit-io commented 1 week ago

@maxdudik Are you able to process the files in 5.12-second chunks? I can't even get a single 2-second file to process on a 3080.