juanmc2005 / diart

A Python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License
903 stars, 76 forks

Adapt "step" automagically #206

Open hbredin opened 8 months ago

hbredin commented 8 months ago

step controls the minimum algorithmic latency of the speaker diarization pipeline.

When targeting real-time processing, one needs to make sure that the processing latency (i.e. the time it takes to process one step) is smaller than this algorithmic latency.

Said differently: the lower bound on the algorithmic latency is the processing latency, which, in turn, depends on the computing power of the machine the pipeline runs on (e.g. a GPU is usually faster than a CPU).

It would be nice to provide an API that automatically estimates this lower bound: run a few steps when the pipeline is instantiated, measure how long they take, and set step to the processing latency plus a small safety margin.
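To make the idea concrete, here is a minimal sketch of such an estimator. It is not part of diart: `estimate_step`, its parameters, and the `process_one_step` callable (standing in for "run the pipeline on one chunk") are all hypothetical names chosen for illustration. It takes the slowest of a few timed runs as a conservative latency and adds a relative margin.

```python
import time
from typing import Callable


def estimate_step(
    process_one_step: Callable[[], object],
    n_warmup: int = 2,
    n_runs: int = 5,
    safety_margin: float = 0.1,
) -> float:
    """Suggest a step (in seconds) from measured processing latency.

    `process_one_step` is a hypothetical stand-in for whatever runs the
    pipeline on a single chunk of audio.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    if safety_margin < 0:
        raise ValueError("safety_margin must be non-negative")

    # Warm-up runs are not timed: the first calls often pay one-off costs
    # (lazy initialization, caches, GPU kernel compilation).
    for _ in range(n_warmup):
        process_one_step()

    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        process_one_step()
        timings.append(time.perf_counter() - start)

    # Use the slowest run as a conservative estimate, then add the margin.
    latency = max(timings)
    return latency * (1.0 + safety_margin)


if __name__ == "__main__":
    # Toy workload that takes roughly 10 ms per "step".
    suggested = estimate_step(lambda: time.sleep(0.01))
    print(f"suggested step: {suggested:.3f} s")
```

Taking the maximum rather than the mean is a deliberate choice in this sketch: a step that is only met on average would still fall behind whenever a slow chunk comes along.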

juanmc2005 commented 8 months ago

I like it. That way we can automatically set the lowest possible latency. This could be implemented as --step auto, but also exposed somewhere in the Python API.
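As a rough sketch of the command-line side, an argument could accept either a positive number of seconds or the literal "auto". This uses plain argparse and is not diart's actual CLI code: `parse_step`, `build_parser`, and the 0.5 default are assumptions made for the example.

```python
import argparse
from typing import Union


def parse_step(value: str) -> Union[float, str]:
    """Parse a --step value: a positive float (seconds) or the word 'auto'."""
    if value.strip().lower() == "auto":
        return "auto"
    try:
        step = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected a positive number or 'auto', got {value!r}"
        )
    # `not step > 0` also rejects NaN, which a plain `step <= 0` would let through.
    if not step > 0:
        raise argparse.ArgumentTypeError(f"step must be positive, got {value!r}")
    return step


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for illustration only.
    parser = argparse.ArgumentParser(description="Sketch of a --step auto option")
    parser.add_argument(
        "--step",
        type=parse_step,
        default=0.5,  # placeholder default for this sketch
        help="step in seconds, or 'auto' to estimate it from measured latency",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--step", "auto"])
    print(args.step)
```

The calling code would then check whether the parsed value is the string "auto" and, if so, run a latency measurement before building the pipeline.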

juanmc2005 commented 8 months ago

Another idea: implement this as diart.profile recording.wav, which would also run a quick grid search on that file to suggest hyper-parameter values without a costly tuning run.

This would be useful for people who don't have much data but do have a "typical" conversation that the system will encounter. diart would then quickly suggest a config to get started.
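The grid-search part of that idea could look roughly like the sketch below. Nothing here is an existing diart API: `quick_grid_search` and `evaluate` are hypothetical, the parameter names in the example grid are only placeholders, and the toy error function stands in for "run the pipeline on the recording with this config and return its error".

```python
from itertools import product
from typing import Any, Callable, Dict, Mapping, Sequence, Tuple


def quick_grid_search(
    grid: Mapping[str, Sequence[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Try every combination in `grid` and return the config with the lowest error.

    `evaluate` is a hypothetical callable that scores one config on the
    reference recording (lower is better).
    """
    names = list(grid)
    best_config, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    if best_config is None:
        raise ValueError("grid is empty or no config produced a finite error")
    return best_config, best_error


if __name__ == "__main__":
    # Toy objective with a known minimum, used only to exercise the search.
    def toy_error(config: Dict[str, Any]) -> float:
        return (config["tau_active"] - 0.5) ** 2 + (config["rho_update"] - 0.3) ** 2

    grid = {"tau_active": [0.4, 0.5, 0.6], "rho_update": [0.1, 0.3]}
    print(quick_grid_search(grid, toy_error))
```

With a single short recording and a coarse grid, this stays cheap because the number of pipeline runs is just the product of the grid sizes.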