Open sevagh opened 3 years ago
Also there could be a "super-performant" config with cupy, stacking multiple 1D FFTs (respecting GPU memory allocation limits), and using pinned host/gpu memory and FFT plans - I'll continue working in that direction.
Optimized every slow line (discovered through kernprof + line_profiler): https://github.com/sigsep/sigsep-mus-eval/compare/master...sevagh:feat/cupy-accel
This leads to just about 1 minute to compute the IRM mask and perform a BSS evaluation on 1 full-length MUSDB18 track:
real 1m1.762s
user 0m50.948s
sys 0m13.620s
This is down from the 3+ minutes originally:
real 3m22.702s
user 3m21.577s
sys 0m39.376s
@sevagh I think this would be great. Do the regression tests pass with this change?
How can I run the tests? python setup.py test?
Install the test environment with pip install .[tests]
and then run
py.test tests/test_regression.py -vs
OK. My most recent commits get the regression tests passing. The explicit casts to float32 were creating huge errors in SAR/SIR/ISR, so I removed those casts.
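To illustrate why float32 can hurt energy-ratio metrics, here is a minimal sketch. It is not the museval code path: `sdr_db` is a hypothetical helper that computes a plain energy-ratio SDR rather than the full BSS Eval decomposition, and the signals are synthetic. It only shows that once the residual is smaller than float32 rounding error, the reported ratio shifts by several dB.

```python
import numpy as np


def sdr_db(reference, estimate):
    """Plain energy-ratio SDR in dB (hypothetical helper, not BSS Eval)."""
    residual = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))


rng = np.random.default_rng(0)
reference = rng.standard_normal(100_000)
# An estimate that differs from the reference by a very small error.
estimate = reference + 1e-8 * rng.standard_normal(100_000)

sdr64 = sdr_db(reference, estimate)
# Same signals, but cast to float32 before computing the metric.
sdr32 = sdr_db(reference.astype(np.float32), estimate.astype(np.float32))

print(f"float64: {sdr64:.1f} dB, float32: {sdr32:.1f} dB")
```

In float64 the residual is resolved cleanly; in float32 it is dominated by rounding noise, so the score drops.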
I made the cupy install optional (although it is pinned to CUDA 11.4, which is rather recent).
One other idiosyncrasy: it's best to clear the cupy FFT cache between BSS evaluations of large songs. That's why I added this helper function: https://github.com/sigsep/sigsep-mus-eval/compare/master...sevagh:feat/cupy-accel#diff-cc17d32a9d811e616624c2f2699f853dd06b143931ea9e37a6cc0dab6a4b8ab9R75-R88
In real code you would do:
for track in mus.tracks:
    ...
    scores = museval.eval_mus_track(...)  # cupy under the hood
    museval.clear_cupy_cache()
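For readers who don't want to open the diff, a cache-clearing helper along these lines could look like the sketch below. This is an assumption about the general approach, not a copy of the helper in the branch: `clear_gpu_caches` is a hypothetical name, and it relies only on cupy's public plan-cache and memory-pool calls. It returns False instead of failing when cupy or a CUDA device is not available.

```python
def clear_gpu_caches():
    """Free cupy's cached FFT plans and pooled memory (hypothetical helper).

    Returns True if the caches were cleared, False if cupy or a usable
    CUDA device is not available.
    """
    try:
        import cupy

        # Drop cached cuFFT plans, which hold GPU work areas.
        cupy.fft.config.get_plan_cache().clear()
        # Return unused blocks from the device and pinned-host memory pools.
        cupy.get_default_memory_pool().free_all_blocks()
        cupy.get_default_pinned_memory_pool().free_all_blocks()
    except Exception:  # cupy not installed, or no usable CUDA device
        return False
    return True


cleared = clear_gpu_caches()
```

Calling it once per track, right after the evaluation, keeps GPU memory from accumulating across long songs.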
Passing regression test:
(museval-cupy) sevagh:sigsep-mus-eval $ py.test tests/test_regression.py -vs
===================================================== test session starts =====================================================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /home/sevagh/venvs/museval-cupy/bin/python
cachedir: .pytest_cache
rootdir: /home/sevagh/repos/sigsep-mus-eval, configfile: setup.cfg
collected 4 items
tests/test_regression.py::test_aggregate[Music Delta - 80s Rock] time target metric score track
[...]
Aggrated Scores (median over frames, median over tracks)
vocals ==> SDR: -15.622 SIR: 9.165 ISR: -8.476 SAR: -7.327
accompaniment ==> SDR: -13.290 SIR: -18.765 ISR: -0.322 SAR: -7.427
PASSED
tests/test_regression.py::test_track_scores[Music Delta - 80s Rock] PASSED
tests/test_regression.py::test_random_estimate[Music Delta - 80s Rock] PASSED
tests/test_regression.py::test_one_estimate[Music Delta - 80s Rock] PASSED
====================================================== warnings summary =======================================================
../../venvs/museval-cupy/lib/python3.9/site-packages/past/builtins/misc.py:45
/home/sevagh/venvs/museval-cupy/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
from imp import reload
tests/test_regression.py: 12 warnings
/home/sevagh/repos/sigsep-mus-eval/museval/metrics.py:601: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
eps = np.finfo(np.float).eps
-- Docs: https://docs.pytest.org/en/stable/warnings.html
=============================================== 4 passed, 13 warnings in 46.33s ===============================================
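The 12 `np.float` warnings in the log above come from a single line, museval/metrics.py:601. Following the advice in the warning text itself, a minimal fix would presumably swap `np.float` for the builtin `float`, which yields the same float64 machine epsilon:

```python
import numpy as np

# `float` is the builtin that the deprecated `np.float` aliased,
# so this gives the same value without the DeprecationWarning.
eps = np.finfo(float).eps
```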
Hello, I have been working on some potential performance optimizations for the BSS evaluation (which is rather slow/compute intensive for full tracks).
Baseline measurement with original museval code (the total execution involves also computing the IRM, adapted from https://github.com/sigsep/sigsep-mus-oracle/blob/master/IRM.py):
The original code takes ~3:20 minutes.
The second optimization uses cupy and the GPU, which is in my opinion a big cost/burden for end users: installing the CUDA toolkit etc. is no joke. Here is the code: https://github.com/sigsep/sigsep-mus-eval/compare/master...sevagh:feat/cupy-accel. However, the performance is rather good at ~1:20 minutes, roughly 2.5x faster than the original code:
One final note is that the CUDA/cupy version has slight differences in the outputs due to numerical precision differences. It doesn't look too significant to me - here's an excerpt of a diff between the evaluated json files, showing small differences in the BSS scores:
I'm also trying to use CPU parallelism with scipy.fft, combining several of the FFTs into a single call, but this isn't helping as much as the CUDA change. My code attempts can be seen here: https://github.com/sigsep/sigsep-mus-eval/compare/master...sevagh:multiple-1d-fft
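As a rough sketch of the idea (not the code in that branch): stack equal-length signals into one 2D array and let scipy.fft transform all rows in a single call, with the `workers` argument spreading the rows over CPU cores. The helper name `batched_fft` and the array sizes are made up for illustration.

```python
import numpy as np
from scipy import fft as sp_fft


def batched_fft(signals, n_fft, workers=-1):
    """FFT several equal-length 1D signals in one call (hypothetical helper).

    workers=-1 asks scipy.fft to use all available CPU cores.
    """
    stacked = np.stack(signals, axis=0)  # shape: (n_signals, n_samples)
    return sp_fft.fft(stacked, n=n_fft, axis=-1, workers=workers)


rng = np.random.default_rng(0)
signals = [rng.standard_normal(1024) for _ in range(8)]

batched = batched_fft(signals, n_fft=2048)
# Reference result: one separate FFT per signal.
looped = np.stack([np.fft.fft(x, n=2048) for x in signals])
```

Both paths give the same spectra; whether the batched call is actually faster depends on the FFT sizes and the number of cores.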
I'm aware of the separate repo for bss at https://github.com/sigsep/bsseval/ but I wasn't sure which project to discuss it in - I'm using museval because I'm trying to recreate the SiSec 2018 testbench.