pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Torchaudio-Squim processing time increases exponentially with long file #3528

Closed Luis-117 closed 1 year ago

Luis-117 commented 1 year ago

🐛 Describe the bug

This is regarding the Torchaudio-Squim tutorial (https://pytorch.org/audio/main/tutorials/squim_tutorial.html#sphx-glr-tutorials-squim-tutorial-py).

I noticed that the processing time increases exponentially as the duration of the input file grows. For example, the tutorial's 3.4-second input file takes around 15 seconds to evaluate. A 60-second input file takes 55 seconds, a 66-second file takes up to 335 seconds, and an 86-second file takes 2,500 seconds.

Would this kind of behavior be expected? Also, the RAM consumption gets relatively high too (it asks for 48GB for the input file of 86 seconds).

Versions

Collecting environment information...
PyTorch version: 2.1.0.dev20230526+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=1198
DeviceID=CPU0
Family=198
L2CacheSize=5120
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2995
Name=11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230526+cpu
[pip3] torchaudio==2.1.0.dev20230526+cpu
[pip3] torchvision==0.16.0.dev20230526+cpu
[conda] Could not collect

mthrok commented 1 year ago

@nateanl

nateanl commented 1 year ago

Hi @Luis-117, the behavior is expected. Inside the model there is a transformer layer that generates intermediate tensors with shape (batch, embed, length, length). This O(T^2) space complexity, where T is the sequence length, requires a large amount of memory as well as a lot of computation.
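
To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch that computes the size of a single dense tensor with shape (batch, embed, length, length). The function name `attention_tensor_bytes`, the `EMBED = 64` value, the sequence lengths, and the float32 element size are all assumptions chosen for illustration; they are not taken from the Squim model, and the mapping from audio seconds to `length` is not covered here.

```python
def attention_tensor_bytes(batch, embed, length, bytes_per_element=4):
    """Bytes needed for one dense tensor of shape (batch, embed, length, length).

    bytes_per_element defaults to 4, i.e. float32 (an assumption).
    """
    return batch * embed * length * length * bytes_per_element


# Hypothetical sizes, for illustration only (not the model's real values).
BATCH = 1
EMBED = 64

for length in (1_000, 2_000, 4_000):
    gib = attention_tensor_bytes(BATCH, EMBED, length) / 2**30
    print(f"length={length:>5}: {gib:.2f} GiB for a single intermediate tensor")
```

Under these assumptions, doubling `length` quadruples the memory for each such tensor, and a model may hold several of them at once. That kind of growth would be consistent with RAM use climbing steeply for longer inputs.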