srvk / DiViMe

ACLEW Diarization Virtual Machine
Apache License 2.0

Unify various "Yunitator" and "Noiseme" scripts #48

Closed fmetze closed 5 years ago

fmetze commented 6 years ago

Investigate how the "yunitator", "noiseme_sad", and "noiseme_full" scripts can be merged into one single script, which would then get the low-memory bugfix. Downstream tools call these three tools directly, so they will need to be adjusted too once the scripts have been merged, but overall it will be cleaner to have a single script that is given a file name to process, a model to load (including means, variances, etc.), perhaps the way to pre-process the data (unless this is done outside of the classification tool), and how to post-process the output.

riebling commented 6 years ago

Good news: they use the same HTK feature configuration, CONFIG_FILE=/vagrant/MED_2s_100ms_htk.conf, and actually run the same binary, OPENSMILE=~/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract, to compute the features.
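For context, both pipelines boil down to one SMILExtract invocation per recording; a minimal sketch of that shared step, using the paths quoted above (the -C flag is openSMILE's standard config option, and -I/-O are the conventional input/output options defined by most shipped configs; the exact call in the repository may differ):

```python
import os
import subprocess

# Paths quoted from the comment above; both tools use the same binary and config.
OPENSMILE = os.path.expanduser(
    "~/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract")
CONFIG_FILE = "/vagrant/MED_2s_100ms_htk.conf"

def extract_htk_features(wav_path, feat_path):
    """Run SMILExtract once to produce HTK-format features for one recording."""
    subprocess.check_call([
        OPENSMILE,
        "-C", CONFIG_FILE,  # shared feature configuration
        "-I", wav_path,     # input audio
        "-O", feat_path,    # output feature file (HTK format)
    ])
```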

The code bases are quite different, but not impossible to refactor, given the following.

Similarities: at the lowest level, code that does:

pca = lambda x: ((x[:,mask] - mu) / sigma).dot(V) * w + b

vs

pca = lambda feat: ((feat[:, mask] - mu) / sigma).dot(V) * w + b
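That duplicated expression could be factored into one function in a shared module; a minimal sketch, assuming mask, mu, sigma, V, w, and b are the per-model normalization and projection parameters loaded elsewhere:

```python
def project(feat, mask, mu, sigma, V, w, b):
    """Standardize the selected feature columns and apply the learned projection.

    This is the expression both code bases currently define inline as a lambda.
    """
    return ((feat[:, mask] - mu) / sigma).dot(V) * w + b
```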

Differences:

alecristia commented 6 years ago

These are three different tools and should be kept separate:

- "yunitator" = classify the output of a VAD into children or adults
- "noiseme_sad" = use the noisemes classifier (on raw input) as a SAD/VAD, returning just "speech" versus "all others"
- "noiseme_full" = use the noisemes classifier (on raw input) and return the full noiseme matrix

fmetze commented 6 years ago

From the user perspective, they are different and should (continue to) be accessible through three different shell scripts. From the software-engineering perspective, they are almost the same and should use the same code base (Python script). They all run out of memory for the same reason, so once we know how to fix it, we should create a single "fixed" version, which can then be called with different parameters (via three different driver scripts, if we want). Otherwise we'll always have to check three different repositories.
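If the memory blow-up comes from classifying an entire recording's feature matrix in one call, one shape the shared fix could take is a chunked loop along these lines (a sketch only; the function and parameter names are illustrative, not the repository's actual API):

```python
import numpy as np

def classify_in_chunks(features, predict, chunk_size=10000):
    """Run `predict` over a long feature matrix in fixed-size slices so that
    peak memory stays bounded regardless of recording length."""
    outputs = [predict(features[start:start + chunk_size])
               for start in range(0, len(features), chunk_size)]
    return np.concatenate(outputs)
```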

alecristia commented 6 years ago

Wouldn't the solution be to separate the call from the tool? So there are 3 calls, 1 tool.

riebling commented 6 years ago

The 3 different tools are probably not yunitator vs. noisemes_sad vs. noisemes_full (the latter two differ only trivially and should count as one); rather, the third one is the 537-class classifier, TALNet.

alecristia commented 6 years ago

I think we should do some cleaning. Could we please:

Regarding the others, I have made changes to the instructions, trying to clarify:

There is one role assignment tool, which classifies spoken turns into three roles: children, female adults, male adults. It exists in two versions.

The version we call "yunitator" takes the raw recording as input. To call this one, do

$ vagrant ssh -c "launchers/yunitator.sh data/"

It returns one rttm per sound file, with an estimate of where there are vocalizations by children, female adults, and male adults.
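For illustration, the rttm lines have roughly this shape (the file name and class labels below are placeholders, not the actual label strings the model emits):

```
SPEAKER myfile 1 12.30 0.85 <NA> <NA> CHI <NA> <NA>
SPEAKER myfile 1 14.10 1.20 <NA> <NA> FEM <NA> <NA>
```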

For more information on the underlying model, see the Yunitator section under Specific instructions.

What still remains to be done is REFACTORING the code: keep all the bash files for the calls, but don't have a bunch of Python scripts, one per bash file. Instead, each bash script would call a single Python script that takes the model name as a parameter, roughly as sketched below.
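A rough sketch of what that single Python entry point could look like; the script layout, model table, and class labels are illustrative, not the actual repository contents:

```python
import argparse

# Hypothetical mapping from model name to model-specific settings; the real
# parameters (means, variances, projection matrices, label set) would be
# loaded from the corresponding model files.
MODELS = {
    "yunitator": {"classes": ["children", "female adults", "male adults"]},
    "noisemes_sad": {"classes": ["speech", "non-speech"]},
    "noisemes_full": {"classes": None},  # return the full noiseme matrix
}

def main():
    parser = argparse.ArgumentParser(
        description="Single classification entry point (sketch).")
    parser.add_argument("model", choices=sorted(MODELS))
    parser.add_argument("audio_file")
    args = parser.parse_args()

    settings = MODELS[args.model]
    # 1. run the shared openSMILE feature extraction
    # 2. load the model named by args.model and classify the features in chunks
    # 3. post-process into the output the launcher expects (e.g. an rttm file)
    print("would run %r on %s (classes: %s)"
          % (args.model, args.audio_file, settings["classes"]))

if __name__ == "__main__":
    main()
```

Each bash launcher would then differ only in the model name it forwards to this script.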

alecristia commented 6 years ago

@jaden-w I'm making some changes and had questions regarding how you were organizing this -- could you message me on slack, so I can make sure not to impede your work?

jaden-w commented 6 years ago

I apologize, I didn't see your comment here. I'll send a message on slack, and we can talk about it during the call today if you would like.

alecristia commented 5 years ago

completed by @jaden-w