Closed fmetze closed 5 years ago
Good news: they use the same HTK features (CONFIG_FILE=/vagrant/MED_2s_100ms_htk.conf) and actually run the same openSMILE binary (OPENSMILE=~/openSMILE-2.1.0/bin/linux_x64_standalone_static/SMILExtract) to compute the features.
The code base is quite different, but not impossible to refactor, given the overlap below.
Similarities: at the lowest level, both have code that does:
pca = lambda x: ((x[:,mask] - mu) / sigma).dot(V) * w + b
vs
pca = lambda feat: ((feat[:, mask] - mu) / sigma).dot(V) * w + b
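The two lambdas above are the same transform with a different argument name: mask the feature columns, standardize with per-feature mean and standard deviation, project with a PCA matrix, then apply a per-component affine rescale. A minimal self-contained sketch (shapes and values are illustrative, not the repos' actual parameters):

```python
import numpy as np

# Hypothetical dimensions: D raw features, K kept after masking, P PCA dims.
rng = np.random.default_rng(0)
D, K, P = 10, 6, 3
mask = np.arange(K)            # indices of the feature columns to keep
mu = rng.normal(size=K)        # per-feature mean (standardization)
sigma = rng.uniform(1, 2, K)   # per-feature standard deviation
V = rng.normal(size=(K, P))    # PCA projection matrix
w = rng.normal(size=P)         # per-component scale
b = rng.normal(size=P)         # per-component offset

# The shared transform: mask -> standardize -> project -> affine rescale.
pca = lambda feat: ((feat[:, mask] - mu) / sigma).dot(V) * w + b

X = rng.normal(size=(5, D))    # 5 frames of D raw features
print(pca(X).shape)            # (5, 3)
```

Since only the parameter arrays differ between the repos, this is exactly the kind of function a merged code base could share, with mu/sigma/V/w/b loaded per model.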
Differences:
class_names = ['SIL', 'CHI', 'MAL', 'FEM']
vs. getting classes from a file:
classes = []
with open("noisemeclasses_sum.txt", 'r') as classfile:  # 'r', not 'rb': rstrip('\n') needs str, not bytes
    for line in classfile:
        classes.append(line.rstrip('\n'))
These are three different tools and should be kept separate:
- "yunitator": classify the output of a VAD into children or adults
- "noiseme_sad": use the noisemes classifier (on raw input) as a SAD/VAD, so returning just "speech" versus "all others"
- "noiseme_full": use the noisemes classifier (on raw input) and return the full noiseme matrix
From the user perspective, they are different, and should (continue to) be accessible using three different shell scripts. From the software engineering aspect, they are almost the same, and should use the same code base (python script). They all run out of memory for the same reason, so once we know how to fix it, we should create a single "fixed" version, which can then get called with different parameters (using three different driver scripts, if we want). Otherwise we'll have to always check three different repositories.
Wouldn't the solution be to separate the call from the tool? So there are 3 calls, 1 tool.
The 3 different tools are probably not yunitator vs. noisemes_sad vs. noisemes_full (the latter two differ trivially and should count as one); rather, the third tool is the 537-class classifier TALNet.
I think we should do some cleaning. Could we please:
Regarding the others, I have made changes to the instructions, trying to clarify:
There is one role assignment tool, which classifies spoken turns into three roles: children, female adults, male adults. It exists in two versions.
The version we call "yunitator" takes the raw recording as input. To call this one, do
$ vagrant ssh -c "launchers/yunitator.sh data/"
It returns one rttm per sound file, with an estimation of where there are vocalizations by children, female adults, and male adults.
For more information on the model underlying them, see the Yunitator section in the Specific instructions section.
What still remains to be done is REFACTORING the code: keep all the bash files for the calls, but don't keep a separate python script for each bash file. Instead, each bash script would call a single python script that takes the model name as a parameter.
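A hedged sketch of what that single python entry point could look like (file and option names are illustrative assumptions, not the repo's actual layout):

```python
#!/usr/bin/env python
# Hypothetical sketch: one python entry point that takes the model name
# as a parameter, so each bash launcher becomes a one-line call.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="single classifier entry point")
    p.add_argument("model",
                   choices=["yunitator", "noiseme_sad", "noiseme_full"],
                   help="which model (weights, mean, variances) to load")
    p.add_argument("audio_dir", help="directory of sound files to process")
    return p

# Example: what launchers/yunitator.sh would effectively invoke.
args = build_parser().parse_args(["yunitator", "data/"])
print(args.model, args.audio_dir)  # yunitator data/
```

Each launcher script then reduces to something like `python classify.py yunitator "$1"` (hypothetical file name), and the low-memory fix lands in one place.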
@jaden-w I'm making some changes and had questions regarding how you were organizing this -- could you message me on slack, so I can make sure not to impede your work?
I apologize, I didn't see your comment here. I'll send a message on slack, and we can talk about it during the call today if you would like.
completed by @jaden-w
Investigate how the "yunitator", "noiseme_sad", and "noiseme_full" scripts can be merged into one single script, which would then get the low-memory-use bugfix. Downstream tools call these three tools directly, so they will need to be adjusted too once the three have been merged into one. Overall, though, it will be cleaner to have a single script that is provided with:
- a file name to process,
- a model to load (including mean, variances, etc.),
- maybe the way to pre-process the data (unless this is done outside of the classification tool), and
- how to post-process the output.
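The per-tool parameters listed above could be bundled into one structure that the merged script looks up by model name. A minimal sketch; all field names, paths, and the placeholder postprocess are illustrative assumptions, not the repos' actual API:

```python
# Hypothetical per-tool spec bundling model, normalization stats, classes,
# and post-processing, so the merged script differs only in which spec it loads.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ToolSpec:
    model_path: str                         # weights to load (illustrative path)
    mean_path: str                          # feature means for standardization
    var_path: str                           # feature variances
    classes: List[str]                      # output class names
    postprocess: Optional[Callable] = None  # e.g. collapse noisemes to speech/non-speech

SPECS = {
    "noiseme_sad": ToolSpec(
        model_path="models/noiseme.h5",
        mean_path="models/mean.txt",
        var_path="models/var.txt",
        classes=["speech", "non-speech"],
        postprocess=lambda probs: probs,    # placeholder for the SAD mapping
    ),
}

print(SPECS["noiseme_sad"].classes)  # ['speech', 'non-speech']
```

Downstream tools would then only need to know the spec name, which keeps the three launcher scripts as thin wrappers.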