Closed f90 closed 6 years ago
Dear Daniel,
You are right, we should document this more. Let us leave this issue open until a note is added to the description of the dataset.
Meanwhile, let me answer the question: yes, you are right, this is the result of compressing the signals. Since the compression noise of the mixture is different from the sum of the compression noises of the sources, the residual you hear is the sum of those 5 noises: the compression noise of each of the 4 sources, minus the compression noise of the mixture (the true sources cancel out when you subtract the true mix underlying those signals). As you mention, this problem did not occur with DSD100, which was encoded in the lossless WAV format.
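The non-additivity of the compression noises can be illustrated with a toy sketch (uniform quantization stands in for a real lossy codec here; actual AAC noise is signal-dependent, but the principle is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

def lossy(x, step=0.05):
    # Crude stand-in for a lossy codec: uniform quantization.
    return np.round(x / step) * step

# Four synthetic "sources" and their true mixture.
sources = [rng.standard_normal(44100) for _ in range(4)]
mixture = np.sum(sources, axis=0)

# Encode each source and the mixture independently, as in the dataset.
dec_sources = [lossy(s) for s in sources]
dec_mixture = lossy(mixture)

# The decoded stems no longer sum exactly to the decoded mixture:
residual = np.sum(dec_sources, axis=0) - dec_mixture
print(np.abs(residual).max() > 0)  # True: five independent noises remain
```

Quantizing a sum is not the same as summing the quantized parts, which is exactly why the subtraction leaves a noisy residual.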
However, as mentioned by @faroit in the Lobby (https://gitter.im/sigsep-mus-2018/Lobby), we checked whether these added noises have a significant impact on the computed evaluation metrics. They do not: we did not notice a significant difference in separation performance between the compressed mixture and the true mixture, as compared to the true sources. This was checked using 3 oracle methods (ideal binary masks, ratio masks, multichannel Wiener filtering). This investigation justifies the use of compressed signals, and the corresponding gain in storage.
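For reference, the first of the oracle methods mentioned above, the ideal binary mask, can be sketched in a few lines (a minimal single-channel illustration on synthetic magnitude spectrograms; the actual evaluation used the full stereo pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Magnitude spectrograms of two synthetic sources (freq x time bins).
S1 = np.abs(rng.standard_normal((513, 100)))
S2 = np.abs(rng.standard_normal((513, 100)))
mix = S1 + S2  # additivity assumed in the magnitude domain

# Ideal binary mask: assign each time-frequency bin to the dominant source.
ibm = (S1 >= S2).astype(float)
est1 = ibm * mix
est2 = (1.0 - ibm) * mix

# The soft alternative, an ideal ratio mask:
irm = S1 / (S1 + S2 + 1e-12)
```

Because the two binary masks are complementary, the two estimates always sum back to the mixture, which is what makes these oracles convenient upper bounds for checking the effect of compression.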
In short: for all practical purposes, using compressed signals looks OK.
Is this satisfactory?
HTH
Antoine
Hello Antoine,
thanks for the extensive and quick response. Yes, I agree with all of what you are saying. A note in the data description should do it then. Oh, and good job on changing the SDR computation to the song level so we are not throwing away parts of songs with silent sources during evaluation, by the way.
So yes, this can be closed as soon as a short note explaining the reason is added.
Best Daniel
We've launched the new sigsep website and included a note on the musdb info site. Thanks again... see you at LVA
I've been using the is_wav option to decode the database into WAV files myself for faster training later. I noticed that when adding up the instrument tracks of a song, they do not exactly sum to the corresponding mixture signal: a small, noisy, constant residual remains as the difference signal. I remember that in DSD100 the mixture was exactly the sum of its sources in the provided WAV files.
Is this intended or an actual bug? Is it perhaps the result of compressing and decompressing the tracks individually in a lossy fashion? If it is intended, it would be good to make that clear on the websites describing the dataset, so researchers are not confused when an algorithm that relies on this additivity assumption does not work as well as they hoped.
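The check described above can be reproduced with a small helper (a sketch using numpy; the WAV file names in the comment are hypothetical examples of the decoded stems):

```python
import numpy as np

def residual_rms(mixture, stems):
    """RMS of (sum of stems) minus mixture; zero iff the stems add up exactly."""
    diff = np.sum(stems, axis=0) - mixture
    return float(np.sqrt(np.mean(diff ** 2)))

# With real data you would load the decoded WAVs, e.g. via soundfile:
#   mixture, sr = soundfile.read("mixture.wav")
#   stems = [soundfile.read(f)[0] for f in ("vocals.wav", "drums.wav",
#                                           "bass.wav", "other.wav")]
# Synthetic demo: exact stems give zero residual, quantized stems do not.
rng = np.random.default_rng(1)
stems = [rng.standard_normal(1000) for _ in range(4)]
mixture = np.sum(stems, axis=0)
print(residual_rms(mixture, stems))       # 0.0
noisy = [np.round(s, 2) for s in stems]   # mimic per-stem lossy coding
print(residual_rms(mixture, noisy) > 0)   # True
```

A nonzero residual on real tracks would confirm the per-stem compression explanation rather than a decoding bug.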