srvk / DiViMe

ACLEW Diarization Virtual Machine
Apache License 2.0

high_volubility.py: chunk sizes #125

Open · GladB opened this issue 5 years ago

GladB commented 5 years ago

It is possible to modify the size of the chunks for the first, second, and third extraction steps with the --chunk_sizes argument; however, the new_onset_*_minutes functions used to compute the onsets to extract those chunks only exist for 2 minutes (second extraction step) and 5 minutes (third extraction step).

In new_onsets_two_minutes, the new onset seems to be later than the given onset, meaning the extracted chunk, which was supposed to be centered on the smaller chunk, starts in the middle of the smaller chunk. Is that on purpose?

In new_onsets_five_minutes, the new onset is 2 minutes before the current onset, no matter the length asked for (it could be that the --chunk_sizes argument was [10.0, 120.0, 120.0], and then none of the chunks would contain the data based on which they were ranked).

I don't know which behavior was expected, but the second one at least seems to conflict with what the script is supposed to output.
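A minimal sketch of how the first observation could arise, assuming the script dispatches from chunk size to one of the two hardcoded recentering functions (the dispatch itself is hypothetical, not the actual high_volubility.py code):

```python
def new_onsets_two_minutes(previous_onsets):
    ...  # recenters the previous chunks inside 2-minute chunks

def new_onsets_five_minutes(previous_onsets):
    ...  # recenters the previous chunks inside 5-minute chunks

# Hypothetical dispatch: --chunk_sizes accepts any value, but only
# 120 and 300 seconds have a matching new_onsets_*_minutes function.
def onset_function_for(chunk_size_seconds):
    table = {120.0: new_onsets_two_minutes, 300.0: new_onsets_five_minutes}
    if chunk_size_seconds not in table:
        raise NotImplementedError(
            "no new_onsets_*_minutes function for %s-second chunks"
            % chunk_size_seconds
        )
    return table[chunk_size_seconds]
```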

fmetze commented 5 years ago

@alecristia - you will know best how this is supposed to work?

alecristia commented 5 years ago

No, sorry - and the second one looks like a bug, so I'm tagging Marvin

MarvinLvn commented 5 years ago

> It is possible to modify the size of the chunks for the first, second and third extraction steps with the --chunk_sizes argument, however the new_onset_*_minutes functions used to compute the onsets to extract those chunks only exist for 2 (second extraction step) and 5 (third extraction step) minutes.

Yep, these functions need the previous list of onsets. Therefore, the first list must be computed differently, with the select_onsets function (see the sketch after the walkthrough below).

> In new_onsets_two_minutes, the new onset seems to be later than the given onset, meaning the extracted chunk, which was supposed to be centered on the smaller chunk, starts in the middle of the smaller chunk. Is that on purpose?

With the following parameters:

a) a wav file of 3000 seconds
b) --chunk_sizes 10.0 120.0 300.0
c) --nb_chunks 2
d) --step 600

1) We compute the onsets of the 10-second chunks (each of them separated by 600 sec); these onsets are: [145.0, 745.0, 1345.0, 1945.0, 2545.0]

2) We keep the nb_chunks = 2 of them that contain the most speech, and we compute the onsets of the new chunks (sorted by amount of speech): [690.0, 90.0, 1290.0, 2490.0]

745 became 690, 145 became 90, etc.

3) We keep the nb_chunks of them that contain the most speech, and we compute the onsets of the new chunks (the ones that will be returned by the script):

[600.0, 0.0]

690 became 600, 90 became 0, etc.

Going back to the first list of onsets ([145.0, 745.0, 1345.0, 1945.0, 2545.0]), we see that we chose:

1) the second one, whose chunk started at 745.0 (centered at 750), and
2) the first one, whose chunk started at 145.0 (centered at 150).
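For concreteness, the numbers in this walkthrough can be reproduced with two small helpers: first-step onsets spaced --step seconds apart, then each later step recentering the previous chunk inside a wider one sharing the same midpoint, clamped at the start of the file. This is a reconstruction from the example above, not the actual high_volubility.py code; in particular, the 145-second offset is read off the example, and the real select_onsets presumably uses the detected speech to place the candidates.

```python
def first_step_onsets(duration, step, offset):
    # Candidate onsets spaced `step` seconds apart, starting at `offset`.
    # Hypothetical stand-in for select_onsets.
    return [float(t) for t in range(int(offset), int(duration), int(step))]

def recenter(onsets, old_size, new_size):
    # The new chunk of length `new_size` shares its midpoint with the
    # old chunk of length `old_size`, clamped at the start of the file.
    return [max(0.0, t + old_size / 2 - new_size / 2) for t in onsets]

step1 = first_step_onsets(3000, 600, 145)  # [145.0, 745.0, 1345.0, 1945.0, 2545.0]
best = [745.0, 145.0]                      # the two 10-sec chunks with the most speech
step2 = recenter(best, 10.0, 120.0)        # [690.0, 90.0]
step3 = recenter(step2, 120.0, 300.0)      # [600.0, 0.0]
```

Under this reading, each new chunk keeps the midpoint of the previous one: 745 (centered at 750) becomes 690 and then 600, and 145 (centered at 150) becomes 90 and then 0.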

> In new_onsets_five_minutes, the new onset is 2 minutes before the current onset, no matter the length asked for (it could be that the --chunk_sizes argument was [10.0, 120.0, 120.0], and then none of the chunks would contain the data based on which they were ranked).
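To make the quoted concern concrete, here is the arithmetic for a hypothetical --chunk_sizes 10.0 120.0 120.0 run, assuming the shift really is hardcoded at 2 minutes (a worked illustration, not the script's code):

```python
# A second-step chunk ranked by its speech content: [690, 810)
onset, size = 690.0, 120.0

# Hardcoded "2 minutes before the current onset":
new_onset = onset - 120.0
new_chunk = (new_onset, new_onset + size)  # (570.0, 690.0)

# [570, 690) ends exactly where [690, 810) begins, so the returned
# chunk would share no audio with the chunk whose speech ranked it.
assert new_chunk[1] <= onset
```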

The bugs you are describing might have been fixed by this commit