Closed nx5216 closed 3 years ago
There are differences between museval and mir_eval. If you want the same results in museval as in mir_eval, you have to use mode='v3' instead of the default.
Please tell me if that helps.
I have tried museval with mode='v3', but the results still change with volume. Here are the results: museval: mir_eval: If I want to use museval, is it necessary to adjust the volume to a proper value? I think the evaluation method should be independent of volume.
@aliutkus any idea?
Isn't it because in this call bsseval_sources_version is always set to False, even when we set mode='v3'? This means, according to the explanations in metrics.py, that bss_eval_image will be used for the computation, which depends on scale but does not introduce fancy filters.
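To make the scale dependence concrete, here is a minimal NumPy sketch of a plain SDR with no distortion filter and no rescaling. It is a simplified stand-in written for this thread, not the museval or mir_eval code; the function name plain_sdr and the synthetic mono signals are assumptions made for illustration.

```python
import numpy as np


def plain_sdr(reference, estimate):
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Hypothetical helper: no distortion filter and no rescaling is applied.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


# Synthetic mono "source" and a slightly noisy estimate of it.
rng = np.random.default_rng(0)
reference = rng.standard_normal(44100)
estimate = reference + 0.1 * rng.standard_normal(44100)

# Changing only the volume of the estimate changes the score.
for gain in (1.0, 0.5, 0.1):
    print(f"gain={gain}: SDR = {plain_sdr(reference, gain * estimate):.2f} dB")
```

With this definition, turning the estimate down moves it away from the reference sample by sample, so the error energy grows and the SDR drops even though the estimate is otherwise unchanged.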
Correct me if I am wrong, but the only thing mode changes is that, for bss_eval_source, the allowed distortion filter is the same over the whole track and does not vary over time. For bss_eval_image, I am not sure what the version changes.
If one wants to have a constant SDR no matter the scaling, there are two choices:
- bss_eval_source, e.g. from mir_eval. The hidden cost is that it allows some distortions that may be unwanted, e.g. for MUSDB it makes no sense to allow distortions that are supposed to reflect the difference between the sources and microphones, when you actually just sum up the individual sources to produce the mixture (as in open-unmix).
- mir_eval, museval and bss_eval.
Something I don't really understand in the end is what "SDR" means in SiSeC 2018 and in subsequent papers. From what I understand, it means that the "raw" SDR bss_eval_image has been used. But what about the scaling? How do we know that some results weren't made artificially better just by scaling?
Hopefully, from my understanding of the previous paper introducing SI-SDR, the "raw" SDR is upper-bounded for a given target and estimate. But does it mean that papers using bss_eval_image rescaled their outputs to maximize SDR?
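The upper bound can be checked numerically. The sketch below assumes 1-D mono signals and the simplest definitions: a plain SDR with no filtering, a least-squares gain applied to the estimate, and SI-SDR computed by projecting the estimate onto the reference. The names plain_sdr, best_gain and si_sdr are hypothetical helpers, not functions from museval, mir_eval or any paper's evaluation code.

```python
import numpy as np


def plain_sdr(reference, estimate):
    """Plain SDR in dB, with no rescaling or filtering of the estimate."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


def best_gain(reference, estimate):
    """Gain g that minimizes ||reference - g * estimate||^2 (least squares)."""
    return np.dot(reference, estimate) / np.dot(estimate, estimate)


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB: the target is the projection of the
    estimate onto the reference, the rest counts as distortion."""
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(residual ** 2))


rng = np.random.default_rng(0)
reference = rng.standard_normal(44100)
# A noisy estimate that is also much too quiet.
estimate = 0.3 * (reference + 0.2 * rng.standard_normal(44100))

g = best_gain(reference, estimate)
raw_db = plain_sdr(reference, estimate)       # score as submitted
peak_db = plain_sdr(reference, g * estimate)  # score after the best rescaling
si_db = si_sdr(reference, estimate)           # unaffected by any rescaling

print(f"plain SDR as is:        {raw_db:.2f} dB")
print(f"plain SDR, best gain:   {peak_db:.2f} dB")
print(f"SI-SDR:                 {si_db:.2f} dB")
```

Under these definitions, the best plain SDR reachable by rescaling alone equals 1 + SI-SDR on a linear (not dB) scale, so rescaling can raise a badly scaled result up to that ceiling but not beyond it.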
Sorry to tag, but in case you did not see it @faroit @aliutkus
@fxmarty sorry for the late reply. As I remember, we made the decision to make bss_eval_images the default for music tasks, and we wanted to prevent people from continuing to use bss_eval_sources.
Something I don't really understand in the end is what "SDR" means in SiSeC 2018 and in subsequent papers. From what I understand, it means that the "raw" SDR bss_eval_image has been used. But what about the scaling? How do we know that some results weren't made artificially better just by scaling?
I guess you don't. The normal SDR is not scale-invariant, and it's likely that results are worse for some methods due to bad scaling (e.g. a poorly reconstructing STFT).
I guess @aliutkus has some opinions, but you'd better contact him via mail ;-)
When I change the volume of the estimates, the results from museval change, but when I use mir_eval, the results do not change. Is there any difference between museval and mir_eval?