sigsep / open-unmix-pytorch

Open-Unmix - Music Source Separation for PyTorch
https://sigsep.github.io/open-unmix/
MIT License

Running umxhq on a large test track (Georgia Wonder - Siren) blows up memory >64GB #113

Open · sevagh opened 2 years ago

sevagh commented 2 years ago

Running the umxhq separator with the default Wiener separation (niter=1) really blows up my memory usage when I run umx on the CPU. Is it really supposed to do that?

I could swear this used to run fine before, and I've never had more than 64GB of RAM. It sounds like a conspiracy theory, but I wonder whether an ffmpeg version upgrade could be silently causing more memory to be used?

sevagh commented 2 years ago

I just saw the other suggestion to run inference in 30-second chunks, so I'll do it that way.
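A minimal sketch of that chunked approach, assuming `separator` is the loaded umxhq Separator and `audio` is a (channels, samples) tensor at sample rate `rate` (the function and variable names here are mine):

```python
import torch

def separate_in_chunks(separator, audio, rate, chunk_seconds=30):
    # Feed the track to the separator in 30-second windows so the
    # EM/Wiener step never sees the full-length spectrogram at once.
    chunk_len = int(chunk_seconds * rate)
    estimates = []
    for start in range(0, audio.shape[-1], chunk_len):
        chunk = audio[..., start : start + chunk_len]
        with torch.no_grad():
            # Separator expects a batch dimension: (batch, channels, samples)
            estimates.append(separator(chunk.unsqueeze(0)))
    # Stitch the per-chunk estimates back together along the time axis
    return torch.cat(estimates, dim=-1)
```

(Hard chunk boundaries can leave audible seams; overlap-add would be cleaner, but this is the idea.)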

aliutkus commented 2 years ago

hmmm, could you check whether the batch_size parameter inside the expectation_maximization method of filtering.py is being used?

If not, it means the system is trying to process the whole track in one go, which may be the source of the problem.
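(For reference, the batching there should mean the EM step only touches `batch_size` STFT frames at a time — roughly this pattern, paraphrased rather than quoted from filtering.py:)

```python
import torch

def batched_frames(nb_frames: int, batch_size: int = 200):
    # Yield index tensors covering at most `batch_size` frames each,
    # mirroring how expectation_maximization slices the track so that
    # peak memory scales with the batch rather than the track length.
    for t0 in range(0, nb_frames, batch_size):
        yield torch.arange(t0, min(t0 + batch_size, nb_frames))

# e.g. nb_frames=18000 -> 90 batches of at most 200 frames each
```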

sevagh commented 2 years ago

Yes, it is being used (the default of 200).

aliutkus commented 2 years ago

ok, and when you have 0 iterations, it works fine?

sevagh commented 2 years ago

Yes, umxhq(device="cpu", niter=0) works well. The total memory usage is 29GB, while with niter=1 it grows past 64GB and gets killed. I guess this is a duplicate of https://github.com/sigsep/open-unmix-pytorch/issues/7, which is my bad.

I'm just surprised because it's the first time I had an issue running a full evaluation.
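For the record, the two configurations I'm comparing (assuming the `openunmix` package entry point):

```python
from openunmix import umxhq

# niter=0: plain multichannel ratio mask, peaks around 29GB on this track
separator = umxhq(device="cpu", niter=0)

# niter=1: one EM/Wiener refinement pass, grows past 64GB and gets killed
separator = umxhq(device="cpu", niter=1)
```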

aliutkus commented 2 years ago

ok. Oh, I guess I should fix the memory usage.

aliutkus commented 2 years ago

what's the length of the track?

sevagh commented 2 years ago

If you'd like, I can take a look with memory_profiler and see whether I can find any savings to contribute to this project.

Song looks like it's 7:10:

```
(nsgt-torch) sevagh:nsgt $ mpv /run/media/sevagh/windows-games/MDX-datasets/MUSDB18-HQ/test/Georgia\ Wonder\ -\ Siren/mixture.wav
 (+) Audio --aid=1 (pcm_s16le 2ch 44100Hz)
AO: [pulse] 44100Hz stereo 2ch s16
A: 00:00:00 / 00:07:10 (0%) Cache: 429s/81MB
```
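In case it's useful, here's roughly how I'd hook up memory_profiler — a sketch, with the monkey-patch target and the umxhq entry point assumed:

```python
import torch
from memory_profiler import profile

import openunmix
from openunmix import filtering

# Wrap the EM step so memory_profiler reports it line by line;
# wiener() looks it up as a module global, so patching here should suffice.
filtering.expectation_maximization = profile(filtering.expectation_maximization)

separator = openunmix.umxhq(device="cpu", niter=1)
audio = torch.rand(1, 2, 44100 * 430)  # ~7:10 of stereo noise as a stand-in
with torch.no_grad():
    estimates = separator(audio)
```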

aliutkus commented 2 years ago

well ok, we could do that together! Thanks

(I'm not super available these days, but I'm curious about it.) Normally this batch_size parameter should be saving quite a lot of RAM, so could you profile as a start to see which tensors are exploding?

Are you tracking gradients, btw?
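(If gradients are on, wrapping inference in `torch.no_grad()` should stop autograd from holding intermediate tensors alive — something like the following, with `separator`/`audio` as in the snippets above:)

```python
import torch

# Autograd keeps intermediates around for a potential backward pass;
# pure inference doesn't need them, so disable tracking entirely.
with torch.no_grad():
    estimates = separator(audio)
```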

sevagh commented 2 years ago

I just tried disabling grad on my audio tensor; it didn't save much.

Some heavy lines from my profiling:

```
278 21639.691 MiB 1933.609 MiB          30           v = torch.mean(torch.abs(y[..., 0, :]) ** 2 + torch.abs(y[..., 1, :]) ** 2, dim=-2)

307 21639.691 MiB    0.000 MiB          54               Cxx = regularization
308 21639.691 MiB    0.000 MiB         270               for j in range(nb_sources):
309 21639.691 MiB 3472.941 MiB         216                   Cxx = Cxx + (v[t, ..., j, None, None, None] * R[j][None, ...].clone())

332 48965.359 MiB 3347.324 MiB         516                   gain = gain * v[t, ..., None, None, None, j]
333
334                                                         # apply it to the mixture
335 48965.359 MiB -2756.098 MiB        1548                   for i in range(nb_channels):
336 48965.359 MiB 8034.758 MiB        1032                       y[t, ..., j] = _mul_add(gain[..., i, :], x[t, ..., i, None, :], y[t, ..., j])
```
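One cheap saving I can see: line 278 allocates several full-size temporaries (two `abs()` results, their squares, and the sum) before the mean reduces anything. Since `y[..., 0, :]` and `y[..., 1, :]` appear to be real-valued real/imaginary planes, the `abs()` calls look redundant — a sketch of an equivalent form with fewer temporaries:

```python
import torch

def source_power(y: torch.Tensor) -> torch.Tensor:
    # Same result as line 278 above if y's real/imag planes are real-valued
    # tensors: abs() before squaring changes nothing but costs two copies.
    return torch.mean(y[..., 0, :] ** 2 + y[..., 1, :] ** 2, dim=-2)
```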

sevagh commented 2 years ago

I thought I could be smart and only apply Wiener filtering up to max_bin = bandwidth_to_bin(16000). It saves ~5-10GB of memory, but loses a bit of SDR.
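Roughly what I tried — a sketch, with the `wiener` signature and tensor layout (bins on the second axis) assumed from filtering.py:

```python
import torch
from openunmix import filtering

def wiener_below_cutoff(spectrograms, mix_stft, rate=44100, n_fft=4096):
    # Run the expensive EM/Wiener pass only on bins below 16 kHz; above
    # the cutoff, keep the cheap ratio-mask estimates (what niter=0 gives).
    max_bin = int(16000 * n_fft / rate)  # bin index of the 16 kHz cutoff
    y = filtering.wiener(spectrograms, mix_stft, iterations=0)
    y[:, :max_bin] = filtering.wiener(
        spectrograms[:, :max_bin], mix_stft[:, :max_bin], iterations=1
    )
    return y
```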