juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License

Process slowdown with concurrent processes #159

Closed zanjabil2502 closed 1 year ago

zanjabil2502 commented 1 year ago

I have a 300 s audio file. When I use a single process, the result looks like this: Took 0.183 (+/-0.023) seconds/chunk -- ran 59 times (with a step of 5 seconds)

But with two concurrent processes and the same configuration, the results look like this: Took 3.367 (+/-2.031) seconds/chunk -- ran 59 times Took 3.627 (+/-1.946) seconds/chunk -- ran 59 times

The program runs on a GPU. Why does the process slow down?

juanmc2005 commented 1 year ago

Hi @zanjabil2502, could you provide more information about your setup? How are you running your single-process and multi-process experiments? Is it diart.benchmark or have you implemented your own script? The slowdown could be coming from sharing the GPU between the two processes.

zanjabil2502 commented 1 year ago

I found the solution. Before, I was sharing a single OnlineSpeakerDiarization instance as model_pipeline across two or more processes. Now I load a separate OnlineSpeakerDiarization in every thread, and the process no longer slows down with this scheme.

zanjabil2502 commented 1 year ago
from threading import Thread

from diart import OnlineSpeakerDiarization
from diart.sources import FileAudioSource
from diart.inference import RealTimeInference

def diarization(pathaudio):
    # Load one pipeline per thread so no instance is shared between threads
    model_pipeline = OnlineSpeakerDiarization()
    audio = FileAudioSource(pathaudio, model_pipeline.config.sample_rate)
    inference = RealTimeInference(model_pipeline, audio, show_progress=False, do_plot=False)
    prediction = inference()

for c in client:
    process = Thread(target=diarization, args=(pathaudio,))
    process.start()

This is my new scheme.

zanjabil2502 commented 1 year ago

Sorry, I have a problem again. When I use the new scheme above with 5 threads, it's fine and the process is still fast, but with 10 threads the process becomes much slower. The results look like this:

When 5 Threads:
Took 0.667 (+/-0.186) seconds/chunk -- ran 60 times
Took 0.671 (+/-0.181) seconds/chunk -- ran 60 times
Took 0.653 (+/-0.179) seconds/chunk -- ran 60 times
Took 0.623 (+/-0.221) seconds/chunk -- ran 60 times
Took 0.646 (+/-0.264) seconds/chunk -- ran 60 times
When 10 Threads:
Took 1.174 (+/-0.439) seconds/chunk -- ran 60 times
Took 1.173 (+/-0.380) seconds/chunk -- ran 60 times
Took 1.216 (+/-0.397) seconds/chunk -- ran 60 times
Took 1.197 (+/-0.322) seconds/chunk -- ran 60 times
Took 1.245 (+/-0.306) seconds/chunk -- ran 60 times
Took 1.217 (+/-0.316) seconds/chunk -- ran 60 times
Took 1.206 (+/-0.318) seconds/chunk -- ran 60 times
Took 1.177 (+/-0.363) seconds/chunk -- ran 60 times
Took 1.257 (+/-0.399) seconds/chunk -- ran 60 times
Took 1.191 (+/-0.395) seconds/chunk -- ran 60 times

I use a 5 s step and 300 s of audio for testing. What would be the solution in this case? Should I load the models on more than 2 GPUs?

juanmc2005 commented 1 year ago

@zanjabil2502 when you run more systems in multithreading, your threads may start competing for the same resources (e.g. RAM/VRAM, since you instantiate new models in each thread). Python's GIL may also be interfering here.
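To illustrate the GIL point with a toy example (not diart-specific): pure-Python CPU-bound work does not run in parallel across threads, because only one thread can execute Python bytecode at a time. Timings will vary by machine, but two threads typically take about as long as running the work serially:

```python
import threading
import time

def busy(n):
    # Pure-Python CPU-bound loop; it holds the GIL while running
    s = 0
    for i in range(n):
        s += i * i
    return s

def timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

N = 2_000_000
serial = timed(lambda: (busy(N), busy(N)))

def threaded():
    ts = [threading.Thread(target=busy, args=(N,)) for _ in range(2)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

parallel = timed(threaded)
# With the GIL, `parallel` is typically close to `serial`, not half of it
print(f"serial={serial:.2f}s threaded={parallel:.2f}s")
```

GPU-bound model inference releases the GIL during the actual kernel execution, but the Python-side feature extraction, clustering, and bookkeeping in each pipeline are still serialized across threads.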

I suggest you try multiprocessing instead. This is what I found to be most effective, and it's how I implemented the parallelization of Benchmark. It should remove the GIL issue, though not the competition for resources.
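The multiprocessing pattern can be sketched like this. It's a minimal, self-contained toy: `worker` is a placeholder for a real per-process diarization function (which would build its own OnlineSpeakerDiarization and run RealTimeInference on the given path), and the file names are made up:

```python
from multiprocessing import Process, Queue

def worker(path, results):
    # Placeholder for the per-process diarization call; each OS process
    # has its own interpreter, so there is no GIL contention between them
    results.put(f"done:{path}")

def run_in_processes(paths):
    results = Queue()
    procs = [Process(target=worker, args=(p, results)) for p in paths]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Queue order is nondeterministic across processes, so sort for display
    return sorted(results.get() for _ in paths)

if __name__ == "__main__":
    print(run_in_processes(["a.wav", "b.wav"]))
```

Note that each process still loads its own copy of the models, so this does not reduce the RAM/VRAM footprint; it only removes the interpreter-level serialization.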