juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License
1.1k stars, 90 forks

Save the audio segments that the model predicts as speech #246

Open m15kh opened 1 month ago

m15kh commented 1 month ago

"Hi! @juanmc2005 I want to save the segments that the model predicts as containing speech. The model detects segments in real-time where someone is talking, and I specifically want to save those audio segments where the model indicates 'yes' for a spoken label. Please save these detected sounds in WAV format."

m15kh commented 1 month ago

@juanmc2005 can you help me with this issue?

juanmc2005 commented 2 weeks ago

Hi @m15kh,

The SpeakerDiarization pipeline does provide the waveform aligned to the current diarization output (see here).

The StreamingInference class gives you a way to run some code whenever a new "output-audio" pair is available. You can do this with the attach_hooks() method (see here) by passing a function that is executed each time a new tuple[Annotation, SlidingWindowFeature] is produced. After that, it's a matter of cropping the audio according to the speech regions in the annotation.
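
To make the cropping step concrete, here is a minimal sketch of such a hook. It is not code from diart itself: the helper names (speech_regions, crop_samples, write_wav, make_hook) are made up for illustration. It also assumes a 16 kHz sample rate, that waveform.data has shape (samples, channels), and that waveform.sliding_window.start is the start time of the current chunk in seconds. Please check these against your own setup.

```python
import struct
import wave
from pathlib import Path

SAMPLE_RATE = 16000  # assumption: sample rate of the audio fed to the pipeline


def speech_regions(annotation):
    """Return merged (start, end) speech regions in seconds."""
    # get_timeline().support() merges overlapping speaker segments,
    # so a region means "at least one speaker is talking".
    return [(seg.start, seg.end) for seg in annotation.get_timeline().support()]


def crop_samples(samples, chunk_start, regions, sample_rate=SAMPLE_RATE):
    """Slice mono samples for each region; chunk_start is the chunk offset in seconds."""
    crops = []
    for start, end in regions:
        lo = max(0, round((start - chunk_start) * sample_rate))
        hi = min(len(samples), round((end - chunk_start) * sample_rate))
        if hi > lo:  # skip regions that fall outside this chunk
            crops.append((start, end, samples[lo:hi]))
    return crops


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def make_hook(output_dir):
    """Build a hook that accepts the (Annotation, SlidingWindowFeature) tuple."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    def save_speech(result):
        annotation, waveform = result
        # Assumption: data is (samples, channels); keep the first channel only.
        samples = [row[0] for row in waveform.data]
        # Assumption: the sliding window start is the chunk start time.
        chunk_start = waveform.sliding_window.start
        regions = speech_regions(annotation)
        for start, end, crop in crop_samples(samples, chunk_start, regions):
            write_wav(output_dir / f"speech_{start:.2f}_{end:.2f}.wav", crop)

    return save_speech
```

You would then pass make_hook("speech_segments") to attach_hooks() before running the inference. Note that this writes one file per speech region per chunk, so a long utterance that spans several chunks ends up in several files; you would need to concatenate consecutive crops if you want one file per utterance.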