Open Bomme opened 2 years ago
Hi @Bomme
Sorry for the late response. This might be an issue with PyTorch core, but I will look into it. I have limited bandwidth right now, so feel free to ping/nudge me for updates.
@mthrok Any news on this issue? In my opinion, it's a critical one.
I second this, it has caused many a headache.
Hi
Unfortunately, I have left the project, and no one is actively looking into this issue. Since the leak most likely originates in low-level code (convolution in PyTorch or MKL), please file the issue with PyTorch core or MKL.
🐛 Describe the bug
I discovered a memory leak in one of my datasets that I tracked down to resampling audio of varying length. The following code is a minimal example to reproduce the issue and it will show a steadily increasing memory footprint.
The problem does not appear if all audio clips are the same length (see comment).
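The original reproduction script is not preserved in this copy of the issue. The following is only a sketch of how such a check could look, not the author's code. The helper names (`peak_rss_mib`, `resample_once`, `track_memory`, `varying_lengths`), the sample rates, and the length range are all assumptions chosen for illustration. It relies on the Unix-only `resource` module and on `torchaudio.functional.resample`.

```python
import random
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def varying_lengths(count, low=16000, high=160000, seed=0):
    """Hypothetical helper: reproducible list of random clip lengths."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(count)]


def resample_once(num_samples):
    """Resample one random mono clip; sample rates are assumed values."""
    # Imported lazily so the helpers above work without PyTorch installed.
    import torch
    import torchaudio.functional as F

    waveform = torch.rand(1, num_samples)
    return F.resample(waveform, orig_freq=48000, new_freq=44100)


def track_memory(workload, lengths, report_every=100):
    """Call workload(n) for each length and sample peak RSS periodically."""
    readings = []
    for i, n in enumerate(lengths, start=1):
        workload(n)
        if i % report_every == 0:
            readings.append(peak_rss_mib())
    return readings


def main():
    # Varying lengths: the case reported to leak.
    for mib in track_memory(resample_once, varying_lengths(2000)):
        print(f"peak RSS: {mib:.1f} MiB")
```

Calling `main()` prints the peak RSS every 100 iterations. Because peak RSS never decreases, a leak would show up as readings that keep climbing instead of levelling off. Passing a constant list such as `[48000] * 2000` in place of `varying_lengths(2000)` gives the same-length comparison.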
With the help of @trundle, we found that it might be related to MKLDNN since adding the following code will resolve the issue (but slow down the execution):
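The workaround snippet itself is missing from this copy of the issue. As a sketch only, and not necessarily what was originally posted, one way to disable MKLDNN around the resampling call is the `torch.backends.mkldnn.flags` context manager. The wrapper name `resample_without_mkldnn` is hypothetical.

```python
def resample_without_mkldnn(waveform, orig_freq, new_freq):
    """Hypothetical wrapper: run torchaudio resampling with MKLDNN disabled."""
    # Imported lazily so this module can be loaded without PyTorch installed.
    import torch
    import torchaudio.functional as F

    # The flag is restored to its previous value when the block exits.
    with torch.backends.mkldnn.flags(enabled=False):
        return F.resample(waveform, orig_freq=orig_freq, new_freq=new_freq)
```

Scoping the flag to the call keeps the slowdown limited to resampling rather than disabling MKLDNN for the whole process.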
Versions
Collecting environment information...
PyTorch version: 1.11.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.6 (default, Jul 28 2021, 15:45:29) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration:
Nvidia driver version: 470.103.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[conda] Could not collect