sigsep / sigsep-mus-eval

museval - source separation evaluation tools for python
https://sigsep.github.io/sigsep-mus-eval/
MIT License

using museval without musdb #80

Closed K3nn3th2 closed 1 year ago

K3nn3th2 commented 3 years ago

hey folks,

i have a question regarding the usage of museval (sorry, i am new!). the readme states that museval can be used without musdb. in my case i only want to compare my generated accompaniment.wav files with their respective reference accompaniment mixdowns. how can this be done?

ps: i am using mir_eval at the moment. will museval offer anything more/better?

faroit commented 3 years ago

@K3nn3th2 hi and welcome to the messy world of music separation evaluation ;-)

You can use museval in exactly the same way as you would use mir_eval. In fact, if you rely on museval.evaluate, it even has the same API.

museval has slightly different behaviour, which is why we call it bss_eval v4. You can also switch back to v3 with museval if you want the exact same results as you'd get with mir_eval.

K3nn3th2 commented 3 years ago

thanks for the info!

I'm having a little trouble understanding the outputs of the evaluation though.

should i open a new issue for this?

up to now i know the values are the respective ratios expressed in dB, but i have not yet understood how to interpret them. how do the numbers tell me which one of several instrumentals is the "closest" to a reference instrumental?

maybe someone can help to clarify.
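For reference, the four numbers follow the image-based BSS Eval criteria. The definitions below are a sketch of the standard formulation and are not taken from this thread; the symbols are assumptions of this note: s^img is the true source image, and e^spat, e^interf and e^artif are the spatial, interference and artifact error terms of the estimate. All four are ratios in dB, higher is better, and a ratio becomes infinite when its error term vanishes.

```latex
\begin{align*}
\mathrm{SDR} &= 10 \log_{10} \frac{\lVert s^{\mathrm{img}} \rVert^2}
                                 {\lVert e^{\mathrm{spat}} + e^{\mathrm{interf}} + e^{\mathrm{artif}} \rVert^2} \\
\mathrm{ISR} &= 10 \log_{10} \frac{\lVert s^{\mathrm{img}} \rVert^2}
                                 {\lVert e^{\mathrm{spat}} \rVert^2} \\
\mathrm{SIR} &= 10 \log_{10} \frac{\lVert s^{\mathrm{img}} + e^{\mathrm{spat}} \rVert^2}
                                 {\lVert e^{\mathrm{interf}} \rVert^2} \\
\mathrm{SAR} &= 10 \log_{10} \frac{\lVert s^{\mathrm{img}} + e^{\mathrm{spat}} + e^{\mathrm{interf}} \rVert^2}
                                 {\lVert e^{\mathrm{artif}} \rVert^2}
\end{align*}
```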

to get an understanding, i have tried different comparisons (evaluations). i started by comparing an instrumental with itself, to see what the optimal result would be. then i evaluated different spleeter results (accompaniments) of tracks using the same instrumental.

comparison of reference with reference: SDR array([[inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]]) ISR array([[269.16863015, 270.55693938, 272.77735284, 270.68058439, 267.38251892, 270.23036122, 270.78947139, 254.79207897, 264.65922524, 260.84196734, 255.15494614, 269.73380464, 270.90594928, 268.55593003, 271.32344313, 272.66700425, 263.76809101, 270.51465723, 270.79659201, 271.75641287, 267.36341119, 267.7437203 , 266.99546987, 268.54107743, 271.587673 , 270.33929179, 267.76233616, 273.04912486, 273.78093555, 274.66218178, 270.94394846, 270.87504836, 274.9165066 , 267.04567733, 259.57618796, 270.02523115, 265.63287773, 263.16755246, 274.4346741 , 277.37522406, 271.68860115, 262.20031424, 270.71165428, 276.41508388, 274.22968298, 271.63739645, 270.86277649, 268.91011388, 260.95747268, 263.38307654, 274.93110316, 279.20718884, 274.2246709 , 261.55032859, 270.37317171, 264.79554293, 269.16341141, 263.67055438, 267.54385174, 270.45156505, 264.42113915, 266.61621089, 269.43344628, 271.31448741, 268.07411493, 270.96324465, 272.043789 , 267.93570238, 267.01179458, 261.77001684, 259.31776734, 259.09156706, 269.36979414, 275.41398164, 265.08846161, 258.64402396, 269.82119605, 267.29161205, 269.69369252, 269.97083147, 259.4764192 , 270.10291828, 264.91958958, 270.1564101 , 267.48691385, 261.60140619, 258.31305115, 269.52402856, 268.6474655 , 266.41503636, 271.54318521, 269.57689288, 269.48380306, 275.29504719, 265.47243805, 268.89904423, 267.62546329, 259.69996439, 267.1597966 ]]) SIR array([[inf, inf, inf, inf, inf, inf, inf, inf, 
inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]]) SAR array([[269.16863015, 270.55693938, 272.77735284, 270.68058439, 267.38251892, 270.23036122, 270.78947139, 254.79207897, 264.65922524, 260.84196734, 255.15494614, 269.73380464, 270.90594928, 268.55593003, 271.32344313, 272.66700425, 263.76809101, 270.51465723, 270.79659201, 271.75641287, 267.36341119, 267.7437203 , 266.99546987, 268.54107743, 271.587673 , 270.33929179, 267.76233616, 273.04912486, 273.78093555, 274.66218178, 270.94394846, 270.87504836, 274.9165066 , 267.04567733, 259.57618796, 270.02523115, 265.63287773, 263.16755246, 274.4346741 , 277.37522406, 271.68860115, 262.20031424, 270.71165428, 276.41508388, 274.22968298, 271.63739645, 270.86277649, 268.91011388, 260.95747268, 263.38307654, 274.93110316, 279.20718884, 274.2246709 , 261.55032859, 270.37317171, 264.79554293, 269.16341141, 263.67055438, 267.54385174, 270.45156505, 264.42113915, 266.61621089, 269.43344628, 271.31448741, 268.07411493, 270.96324465, 272.043789 , 267.93570238, 267.01179458, 261.77001684, 259.31776734, 259.09156706, 269.36979414, 275.41398164, 265.08846161, 258.64402396, 269.82119605, 267.29161205, 269.69369252, 269.97083147, 259.4764192 , 270.10291828, 264.91958958, 270.1564101 , 267.48691385, 261.60140619, 258.31305115, 269.52402856, 268.6474655 , 266.41503636, 271.54318521, 269.57689288, 269.48380306, 275.29504719, 265.47243805, 268.89904423, 267.62546329, 259.69996439, 267.1597966 ]])

comparison of accompaniment A with reference:

SDR array([[-2.20299431, -3.44665773, -4.32362323, -2.59291869, -3.20582082, -2.42250674, -3.5256302 , -6.45105581, -3.48348382, -6.23755438, -8.35427858, -4.31546932, -4.05821535, -3.55504728, -2.8900589 , -3.77453725, -3.39629192, -3.26165669, -3.8475892 , -3.10886967, -3.67224018, -3.65790734, -2.63164077, -3.78962246, -3.36762654, -2.01311095, -4.09827865, -3.47802425, -1.69270146, -4.61406728, -3.15144999, -2.79766807, -4.24581095, -2.85323119, -3.789169 , -3.60533674, -2.85090699, -4.24936314, -3.71128432, -3.00187406, -4.11590682, -3.62101179, -3.43241758, -3.42042305, -3.72578946, -3.09667129, -3.6583761 , -2.84318241, -3.51611784, -3.43527161, -1.98347537, -3.99236456, -2.9449687 , -1.71463759, -3.96775851, -3.16596612, -3.16879696, -3.79553407, -3.36663528, -3.76761639, -3.66585205, -3.10061487, -3.87142855, -3.78610074, -2.4095901 , -4.1855077 , -3.40597416, -2.35616714, -4.09671955, -2.79610527, -2.72530109, -3.66411864, -2.76619574, -2.82721708, -4.11168608, -2.76161654, -3.63634107, -3.17385456, -2.77888029, -4.13321923, -3.64311191, -3.02233819, -4.1352825 , -3.50502002, -2.96142075, -3.6718496 , -3.21587397, -3.84873756, -3.39915604, -2.95659006, -3.17954855, -2.41793938, -0.44639723, -2.00388709, -0.9002185 , -0.11048556, -0.77842919, -0.09471525, 0.04791583]]) ISR array([[ 0.51768749, -1.15181276, -2.85506582, 0.51426604, -0.35129361, 0.3023695 , -0.97875589, -1.81027661, 0.54562897, 0.2166434 , -1.17463782, -1.12188273, -1.94606139, -1.50683113, -0.06079989, -1.53735054, -0.88840788, -1.14179102, -1.18577243, -0.3251433 , -1.41175329, -1.38189811, 1.00688187, -1.3130323 , -0.79692314, 1.41859409, -2.15265477, -1.15809971, 1.89458408, -2.34987723, -0.2747686 , 0.69664798, -2.36204005, 0.52906095, -1.48394285, -1.32193656, 0.3303114 , -2.44372904, -1.5531324 , 0.05478406, -1.8045285 , -1.28284334, -1.01446125, -0.97285571, -1.16985043, -0.08903355, -1.04420536, 0.34964006, -0.7300848 , -0.76670055, 1.49876035, -2.01985743, 0.48116473, 1.6640827 , 
-1.75468029, -0.22523906, -0.68082411, -1.39739797, -1.11695314, -1.58465832, -1.06395612, -0.31484748, -1.68442715, -1.46764121, 1.53105358, -2.39706121, -0.90961851, 1.51032226, -1.90856839, 0.11986681, 0.55916508, -1.33192507, 0.49296439, -0.04019043, -1.63651552, 0.48465399, -0.9432238 , -0.4249699 , 0.91489262, -2.06583318, -1.50347856, -0.58512582, -1.93247135, -0.90648224, -0.51029162, -1.10547659, -0.52829314, -1.49113114, -1.24121716, 0.80311432, -1.10172646, -0.2806317 , 1.61746827, -0.97661865, -0.02612483, 0.43503279, -0.54710756, 0.03488161, 0.06633197]]) SIR array([[inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]]) SAR array([[ -8.21850188, -8.3053289 , -2.28848194, -6.08896584, -7.34563683, -7.10214968, -3.02647254, -8.27793029, -5.68993911, -7.34866256, -9.38799691, -8.58084582, -7.46245104, -8.00160106, -5.64970542, -9.05674003, -7.09995937, -5.70680893, -12.39266867, -6.99773224, -8.22807071, -7.71986435, -10.19867672, -10.41643576, -8.52386955, -4.76501939, -6.22300166, -7.77346232, -6.56157538, -6.77047973, -9.02681564, -7.93725714, -7.27092004, -9.0908664 , -9.26020928, -8.18600867, -11.70399301, -6.36233495, -6.20659242, -9.29197833, -8.78332902, -8.06316683, -5.96305629, -13.11750516, -9.03483649, -11.49859473, -10.53566359, -10.10173485, -11.44339394, -9.86300903, -6.24354266, -6.77133291, -8.63236522, -3.9430462 , -9.04937848, -9.48850857, -7.09776023, -11.87703888, -6.36723283, -6.5485743 , -11.38401807, -8.37582122, -8.7050479 , -8.19016991, -9.56300123, -6.16589169, -8.11999396, -10.00798158, 
-8.23954007, -8.43553413, -8.3093088 , -9.26793329, -8.05730581, -10.94831788, -8.51978206, -9.66765705, -9.51285184, -12.95984802, -7.2263578 , -8.59380297, -6.62860325, -5.14870757, -9.19675679, -8.50539113, -6.91370835, -11.7789907 , -8.64454607, -9.15130806, -7.97071432, -9.62054632, -9.84871055, -10.69324651, -5.21851184, -8.934417 , -9.84827298, -7.26356587, -6.29341686, -9.51229705, -9.54289506]])

comparison of accompaniment B with reference:

SDR array([[-3.05939103, -3.45749409, -3.70736113, -3.32756927, -3.20900809, -2.76921754, -3.17991233, -4.2934799 , -4.46997647, -7.31421624, -7.57908748, -4.34895634, -2.25392327, -2.17995178, -2.31157519, -2.36525229, -2.06297414, -2.52419617, -2.98778896, -1.81450123, -2.42034299, -3.09786124, -2.71156293, -3.15852663, -3.52826348, -2.28999877, -3.52360925, -4.30751411, -2.3213351 , -3.69769751, -3.52867953, -3.04524926, -3.61038796, -3.19213299, -3.06711466, -2.87405499, -2.26344561, -2.25321298, -2.61473594, -2.42614648, -2.71568546, -2.38585268, -2.45395659, -2.85273949, -2.30769289, -3.09016754, -3.64269976, -3.15332122, -3.22276881, -3.95990889, -2.84111219, -3.47371756, -3.08843634, -2.24237938, -3.23884874, -3.30297912, -2.51875872, -3.43239093, -1.5941294 , -2.03284112, -2.51721092, -2.8080972 , -2.41172067, -2.55887085, -2.75532022, -2.2865198 , -2.793919 , -3.19561341, -3.22088791, -3.28522175, -2.82765752, -3.01725612, -3.29011409, -2.97442996, -4.80908658, -7.30764871]]) ISR array([[-0.13550868, -0.84449633, -1.62471046, -0.49586508, 0.05495359, 0.26830292, -1.06264616, -0.92815729, 0.08089074, -2.68447688, -2.52990088, 0.02994124, 1.85258021, 1.44096987, 1.69349858, 1.26420397, 2.1152671 , 1.19816428, 0.73814619, 2.61874003, 1.99799928, 0.32934125, 0.99597739, 0.4220487 , -0.3094926 , 1.9128618 , -0.96748795, -1.85025525, 1.35905762, -0.6291096 , -0.33855397, 0.42062771, -0.96165544, 0.22247117, 0.68521482, 0.81311494, 1.47941253, 1.3163305 , 1.16599659, 1.34716009, 0.8545359 , 1.74690233, 1.91118167, 0.36988876, 1.52128863, 0.46189947, -0.15779429, 0.14742942, 0.39421841, -0.36800869, 0.29032826, -0.79552256, 0.61683525, 1.37997835, -0.41370892, -0.09810682, 1.21008399, 0.6109578 , 2.55448088, 2.2376892 , 0.61606895, 1.41047924, 0.98419281, 0.84707608, 1.45530551, 2.29693325, 0.9712383 , -0.1262129 , -0.4891609 , -0.10786353, 0.58053594, -0.02070426, 0.037549 , -0.21879603, -1.46198064, -0.23327509]]) SIR array([[inf, inf, inf, inf, inf, inf, inf, 
inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]]) SAR array([[-16.97699799, -8.88209519, -5.71813591, -12.08173326, -11.22691162, -7.88144132, -8.82945072, -12.99445515, -11.4756314 , -12.07413895, -13.01948114, -15.11936417, -10.02453048, -10.27429394, -8.36520372, -11.59312094, -9.0716541 , -9.18134132, -14.63106849, -7.50400452, -8.72973305, -10.62899553, -10.39181318, -14.38797548, -14.50690262, -7.86371768, -7.96007963, -9.02967791, -8.38344521, -11.22589522, -11.89440445, -9.8469442 , -11.45471827, -11.5204166 , -14.14982652, -10.69279177, -10.84190522, -10.34147658, -9.42603747, -10.76001564, -13.77624667, -10.41868541, -8.44286739, -17.30204547, -10.55728351, -14.75661733, -11.0554085 , -11.53247204, -12.78663447, -15.14174149, -6.94829697, -8.66168659, -13.1596099 , -6.83453029, -8.51904983, -12.86484161, -11.04151343, -12.51875307, -7.78811471, -9.25329516, -11.47841542, -10.25011711, -12.55895765, -13.6484766 , -9.74098012, -8.37969875, -11.18835239, -12.63434969, -10.24559656, -16.29123673, -11.85236253, -9.22190321, -13.16514091, -11.13506959, -11.56282737, -9.5660347 ]])

i see that the values are lower than in the self comparison case. but if i compare several different instrumentals to the original instrumental, how can i determine the most similar one by looking at the numbers?

thanks in advance!
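One common way to turn the framewise arrays into a single comparable number is to take the median over frames while skipping non-finite values; as far as I know, museval's own track summaries aggregate in this way. The sketch below uses only the first five SDR frames of accompaniments A and B posted above, so it illustrates the procedure and is not a verdict on the full tracks. The helper name `median_finite` is made up for this example.

```python
import math
import statistics


def median_finite(frames):
    """Median of the finite framewise values (inf/nan frames are skipped)."""
    finite = [v for v in frames if math.isfinite(v)]
    return statistics.median(finite) if finite else float("nan")


# First five framewise SDR values (dB) from the two outputs posted above.
sdr_frames = {
    "A": [-2.20299431, -3.44665773, -4.32362323, -2.59291869, -3.20582082],
    "B": [-3.05939103, -3.45749409, -3.70736113, -3.32756927, -3.20900809],
}

# One score per candidate; a higher SDR means less overall distortion.
scores = {name: median_finite(vals) for name, vals in sdr_frames.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The same aggregation can be applied to ISR and SAR. SIR is infinite in every frame here, so it carries no information for ranking these estimates.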

faroit commented 3 years ago

@K3nn3th2 can you send me a code snippet showing how you run the evaluation? (What are the tensor shapes?)

K3nn3th2 commented 3 years ago

i am using librosa for loading.

import librosa
import museval  # the snippet calls museval.evaluate, so museval must be imported
import numpy as np
import pprint

path_estim = '/home/user/estim.wav'
path_ref = '/home/user/ref.wav'

estim, sr = librosa.load(path_estim)
ref, sr = librosa.load(path_ref)

# assume the signals are already aligned.
# TODO: align them first; tracks whose instrumental starts a little late need this for a proper evaluation.
# trim both signals to the length of the shorter one
if len(estim) > len(ref):
    estim = estim[:len(ref)]
else:
    ref = ref[:len(estim)]

# add a leading "source" axis: one source each, shape (1, nsamples)
ref_nps = np.asarray(ref)[np.newaxis, :]
estim_nps = np.asarray(estim)[np.newaxis, :]

# mode="v3" reproduces the mir_eval-style framewise results
sdr, isr, sir, sar = museval.evaluate(ref_nps, estim_nps, mode="v3")
print('\nmuseval results:')
print('SDR')
pprint.pprint(sdr)
print('ISR')
pprint.pprint(isr)
print('SIR')
pprint.pprint(sir)
print('SAR')
pprint.pprint(sar)

faroit commented 3 years ago

@K3nn3th2 what are the shapes of ref_nps and estim_nps? It looks like you are comparing a single reference source with a single estimate (which is not the purpose of bss_eval evaluation).
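To make the shape question concrete: museval.evaluate documents its inputs as arrays of shape (nsrc, nsamples, nchannels), so a multi-source evaluation stacks all references into one array and all estimates into another. The sketch below only builds such an array with NumPy. The helper `stack_sources`, the random stand-in signals, and the idea of deriving a vocals reference as mixture minus accompaniment are assumptions made for illustration, and the last of these holds only if the mix is a plain sum of its stems. With a single source there is nothing that could interfere, which would be consistent with the infinite SIR values in the outputs above.

```python
import numpy as np


def stack_sources(*sources):
    """Stack mono (nsamples,) or multichannel (nsamples, nchannels) signals
    into one array of shape (nsrc, nsamples, nchannels)."""
    arrays = [np.asarray(s, dtype=float) for s in sources]
    # give mono signals an explicit channel axis
    arrays = [a[:, np.newaxis] if a.ndim == 1 else a for a in arrays]
    # truncate every signal to the length of the shortest one
    n = min(a.shape[0] for a in arrays)
    return np.stack([a[:n] for a in arrays])


# Hypothetical stand-ins for real audio: one second of noise at 44.1 kHz.
rng = np.random.default_rng(0)
mixture = rng.standard_normal(44100)
accompaniment_ref = rng.standard_normal(44100)
vocals_ref = mixture - accompaniment_ref  # assumes a linear mix

references = stack_sources(vocals_ref, accompaniment_ref)
print(references.shape)  # (2, 44100, 1)
```

The estimates would be stacked in the same source order, and both arrays could then be passed to museval.evaluate in place of ref_nps and estim_nps.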

K3nn3th2 commented 3 years ago

yes, it's true: in this example i am only using one generated instrumental and one reference. in my actual use case i would use several generated instrumentals and one reference. i was hoping to get some info about the quality of the results by looking at the numbers.

am i missing something here?

sorry for getting more and more off-topic... i was thinking of using PEASS, but am now a little unsure whether it fits my intentions either.

faroit commented 2 years ago

@K3nn3th2 can this be closed?