Can the segmenter run in parallel/Async?

elieobeid7 commented 1 week ago

I'm using inaSpeechSegmenter version 0.7.8 to detect speakers, like this:

from inaSpeechSegmenter import Segmenter
seg = Segmenter()
data = seg(file)

Currently, I process many files in a queue sequentially. I've been asked whether I could make the processing run in parallel, and I'm not sure about that, mainly because I don't know what would happen if I put the code above in an async function, a thread pool, or something similar.

When I tried doing that with OpenAI Whisper, the quality of the results was very poor, so I assume the same thing would happen with inaSpeechSegmenter. What do you think?

DavidDoukhan commented 1 week ago

I'm not sure I fully understand your question, but here are a few undocumented tricks that may help it go faster. If you are using a GPU, you can use the batch_size argument of the constructor, whose default value is 32, i.e.: g = Segmenter(batch_size=1024). Then, for processing lists of files, I would suggest using the batch_process method instead of the default __call__: ret = g.batch_process(lsrc, ldst, skipifexist=True, nbtry=3), with:

lsrc: the list of files/URLs to process
ldst: the list of corresponding output CSV files
skipifexist: if True, a file is not processed when its output file already exists
nbtry: remote URL downloads sometimes fail, so the program will try up to nbtry times
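As a minimal sketch of the two tips above: the helper names build_batch_lists and run_batch are hypothetical, and naming each output CSV after the input file's stem is only an assumption made for illustration. The Segmenter(batch_size=...) constructor argument and the batch_process(lsrc, ldst, skipifexist, nbtry) call are used as described in this comment; their exact behaviour should be checked against the installed version.

```python
from pathlib import Path


def build_batch_lists(audio_paths, out_dir):
    """Return (lsrc, ldst): input paths and one matching CSV path per input.

    Hypothetical helper: the output naming scheme (<stem>.csv) is an
    assumption, and inputs sharing a stem would map to the same CSV.
    """
    out_dir = Path(out_dir)
    lsrc = [str(p) for p in audio_paths]
    ldst = [str(out_dir / (Path(p).stem + ".csv")) for p in audio_paths]
    return lsrc, ldst


def run_batch(audio_paths, out_dir, batch_size=1024):
    """Process all files with a single Segmenter instance."""
    # Imported lazily so the list-building helper can be used and tested
    # without loading the library.
    from inaSpeechSegmenter import Segmenter

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    lsrc, ldst = build_batch_lists(audio_paths, out_dir)
    # batch_size defaults to 32; a larger value may help on a GPU.
    seg = Segmenter(batch_size=batch_size)
    # Arguments as described above: skip files whose output already exists,
    # and retry failed remote downloads up to nbtry times.
    return seg.batch_process(lsrc, ldst, skipifexist=True, nbtry=3)
```

Calling run_batch(["a.wav", "b.wav"], "out") would then be expected to write out/a.csv and out/b.csv, with the model loaded only once for the whole list.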

I usually launch one process per GPU on multi-GPU machines.
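One possible way to arrange "one process per GPU" is sketched below. This is an assumption rather than the maintainers' own setup: it pins each worker to a device by setting the CUDA_VISIBLE_DEVICES environment variable before the library is imported, and it splits the file lists round-robin. The names split_round_robin, worker and run_on_gpus are hypothetical.

```python
import os
from multiprocessing import get_context


def split_round_robin(items, n):
    """Split items into n lists, dealing them out in turn."""
    return [items[i::n] for i in range(n)]


def worker(gpu_id, lsrc, ldst):
    """Run one Segmenter in its own process, restricted to one GPU."""
    # Must be set before the deep-learning backend is imported, so that
    # this process only sees the chosen device (assumed CUDA setup).
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    from inaSpeechSegmenter import Segmenter

    seg = Segmenter()
    seg.batch_process(lsrc, ldst, skipifexist=True, nbtry=3)


def run_on_gpus(lsrc, ldst, n_gpus):
    """Start one worker process per GPU and wait for all of them."""
    # "spawn" gives each worker a fresh interpreter with no inherited
    # backend state from the parent process.
    ctx = get_context("spawn")
    src_chunks = split_round_robin(lsrc, n_gpus)
    dst_chunks = split_round_robin(ldst, n_gpus)
    procs = []
    for gpu_id, (src, dst) in enumerate(zip(src_chunks, dst_chunks)):
        if not src:
            continue  # fewer files than GPUs
        p = ctx.Process(target=worker, args=(gpu_id, src, dst))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

Because the "spawn" start method re-imports the main module in each worker, run_on_gpus should be called from under an `if __name__ == "__main__":` guard when used in a script. Separate processes are used here instead of threads so that each model instance lives in its own interpreter.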

If you want to run the program in parallel, you may have a look at the programs in the scripts directory: ina_speech_segmenter_pyro_server.py and ina_speech_segmenter_pyro_client.py

Kind regards,

elieobeid7 commented 4 days ago

OK, thank you so much. I'll try your suggestions.

ina-foss / inaSpeechSegmenter

Can the segmenter run in parallel/Async? #84