Open m15kh opened 1 month ago
@juanmc2005 can you help me with this issue?
Hi @m15kh,
The SpeakerDiarization pipeline does provide the waveform aligned to the current diarization output (see here). The StreamingInference class lets you execute custom code whenever a new output-audio pair is available. You can do this with the attach_hooks() method (see here) by passing a function that gets called with each new tuple[Annotation, SlidingWindowFeature]. From there, it's a matter of cropping the audio according to the speech turns in the annotation.
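As a rough sketch of the cropping-and-saving step: the helper below takes a mono float waveform plus a list of (start, end) speech ranges in seconds and writes each range as a 16-bit PCM WAV file using only the standard library's wave module. The function name `save_speech_segments` and the commented hook wiring are illustrative, not part of diart's API; in a real hook you would pull the segments out of the Annotation (e.g. via itertracks) and mind the time offset between the stream and the current audio chunk.

```python
import wave
import numpy as np

def save_speech_segments(waveform, sample_rate, segments, prefix="speech"):
    """Crop (start, end) second ranges out of a mono float waveform in
    [-1, 1] and write each one as a 16-bit PCM WAV file. Returns the paths."""
    paths = []
    for i, (start, end) in enumerate(segments):
        lo, hi = int(start * sample_rate), int(end * sample_rate)
        chunk = waveform[lo:hi]
        # Convert float samples to 16-bit integer PCM
        pcm = (np.clip(chunk, -1.0, 1.0) * 32767).astype(np.int16)
        path = f"{prefix}_{i:03d}.wav"
        with wave.open(path, "wb") as f:
            f.setnchannels(1)      # mono
            f.setsampwidth(2)      # 16-bit
            f.setframerate(sample_rate)
            f.writeframes(pcm.tobytes())
        paths.append(path)
    return paths

# Hypothetical hook, assuming diart's attach_hooks() passes a
# tuple[Annotation, SlidingWindowFeature] to the callback:
#
# def on_prediction(outputs):
#     annotation, audio = outputs
#     # Segment times are relative to the stream; subtract the chunk's
#     # start time before indexing into this chunk's samples.
#     offset = audio.sliding_window.start
#     segments = [(seg.start - offset, seg.end - offset)
#                 for seg, _, _ in annotation.itertracks(yield_label=True)]
#     save_speech_segments(audio.data[:, 0], sample_rate, segments)
#
# inference.attach_hooks(on_prediction)
```

Note that consecutive chunks may overlap, so deduplicating or merging segments across calls is left to the caller.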
"Hi @juanmc2005! I want to save the segments that the model predicts as containing speech. The model detects, in real time, the segments where someone is talking, and I specifically want to save those audio segments that the model labels as speech. Please save these detected sounds in WAV format."