Audio data lost before VAD `isActive` state change

mochi-neko / voice-activity-detection-unity

A voice activity detection (VAD) library for Unity.

MIT License

44 stars, 6 forks

Issue #3 (open), opened by WilliamCheen 4 months ago

WilliamCheen commented 4 months ago

Hello, thank you very much for this library, it has been very helpful. We have been using it for a while now, but some users have reported that in very short conversations the beginning of the recording can be lost. My understanding is that the `AudioData` received before the VAD switches to `isActive = true` is discarded. Is there any way to get the complete `AudioData`, including the short portion before `isActive = true`?
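A common general-purpose workaround for this symptom, independent of this library, is a pre-roll buffer: keep the most recent fraction of a second of audio while the detector is inactive, and prepend it to the recording when `isActive` becomes true. The sketch below assumes you can see each incoming chunk of float samples; `PreRollBuffer` is a hypothetical helper, not part of voice-activity-detection-unity.

```csharp
using System;

// Hypothetical helper (not part of voice-activity-detection-unity):
// a fixed-size ring buffer that retains the most recent `capacity` samples.
public sealed class PreRollBuffer
{
    private readonly float[] ring;
    private int start; // index of the oldest stored sample
    private int count; // number of valid samples currently stored

    public PreRollBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        ring = new float[capacity];
    }

    // Appends samples; once the buffer is full, the oldest samples are overwritten.
    public void Write(ReadOnlySpan<float> samples)
    {
        foreach (var sample in samples)
        {
            var end = (start + count) % ring.Length;
            ring[end] = sample;
            if (count < ring.Length) count++;
            else start = (start + 1) % ring.Length; // dropped the oldest sample
        }
    }

    // Returns the retained samples oldest-first and empties the buffer.
    public float[] Drain()
    {
        var result = new float[count];
        for (var i = 0; i < count; i++)
            result[i] = ring[(start + i) % ring.Length];
        start = 0;
        count = 0;
        return result;
    }
}
```

For example, a capacity of `samplingRate * channels / 2` keeps roughly half a second. Writing every chunk while inactive and calling `Drain()` on the inactive-to-active transition yields the audio that preceded activation, in order, ready to be written ahead of the active audio.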

mochi-neko commented 4 months ago

Thank you for opening this issue.

If you are using `QueueingVoiceActivityDetector` as the VAD logic, please try replacing it with `CumulativeVoiceActivityDetector` at initialization.

`CumulativeVoiceActivityDetector` collects audio data from before `isActive = true` more reliably and its logic is more stable, at the cost of higher memory usage.
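How much extra memory this costs depends on how much audio is held. As a rough estimate, assuming the detector keeps raw 32-bit float samples for the buffered period (an assumption; the thread does not say how `CumulativeVoiceActivityDetector` stores its data), memory grows linearly with sampling rate, channel count, and buffered duration. `BufferMemory` is a hypothetical helper for this arithmetic:

```csharp
using System;

public static class BufferMemory
{
    // Bytes needed to hold `seconds` of 32-bit float PCM audio.
    // Assumption: samples are stored as float (4 bytes each), one per channel per frame.
    public static long BytesForDuration(int samplingRate, int channels, double seconds)
    {
        if (samplingRate <= 0 || channels <= 0 || seconds < 0)
            throw new ArgumentOutOfRangeException();
        long samples = (long)Math.Ceiling(samplingRate * channels * seconds);
        return samples * sizeof(float);
    }
}
```

For example, 2 seconds of 48 kHz mono audio is 96,000 samples, or 384,000 bytes (375 KiB).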

WilliamCheen commented 4 months ago

Wow, thank you very much for your reply! I'll try the method you mentioned later. Thanks a lot!

WilliamCheen commented 3 months ago

Sorry to bother you again. I've looked at `CumulativeVoiceActivityDetector`, and although each parameter has a comment, I still don't quite understand what the parameters mean, for example `activeChargeTimeRate` and `maxChargeTimeSeconds`. Could you explain each parameter in more detail, or explain how `CumulativeVoiceActivityDetector` works? If you know of any blog posts or articles that describe how it works, please let me know so I can read them as well. Again, thank you very much for this library!
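The thread does not describe the internals, so the following is only a hypothetical model of what a "charge"-based (cumulative) detector can look like, not the library's implementation. Frames judged to contain voice add charge, silence drains it, and the detector stays active while any charge remains, so short pauses do not end an utterance. The names `ChargeVadModel`, `chargeRate`, and `maxChargeSeconds` are illustrative; they may or may not correspond to `activeChargeTimeRate` and `maxChargeTimeSeconds`, so check the library source before relying on this reading.

```csharp
using System;

// Hypothetical model for illustration only; NOT CumulativeVoiceActivityDetector.
public sealed class ChargeVadModel
{
    private readonly double chargeRate;       // charge gained per second of voice (hypothetical)
    private readonly double maxChargeSeconds; // upper bound on stored charge (hypothetical)
    private double charge;                    // current charge, measured in seconds of "hang time"

    public bool IsActive => charge > 0.0;

    public ChargeVadModel(double chargeRate, double maxChargeSeconds)
    {
        if (chargeRate <= 0.0) throw new ArgumentOutOfRangeException(nameof(chargeRate));
        if (maxChargeSeconds <= 0.0) throw new ArgumentOutOfRangeException(nameof(maxChargeSeconds));
        this.chargeRate = chargeRate;
        this.maxChargeSeconds = maxChargeSeconds;
    }

    // Advances the model by one frame lasting `frameSeconds`. `isVoice` is a
    // per-frame decision, e.g. from a volume threshold, supplied by the caller.
    public bool Update(bool isVoice, double frameSeconds)
    {
        if (isVoice)
            charge = Math.Min(maxChargeSeconds, charge + frameSeconds * chargeRate);
        else
            charge = Math.Max(0.0, charge - frameSeconds);
        return IsActive;
    }
}
```

In this model, a larger charge rate makes the detector "hold on" longer after a short burst of speech, and the cap bounds how long it can stay active after speech stops.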

dnnkeeper commented 2 months ago

@mochi-neko @WilliamCheen It seems I managed to preserve more of the voice from `activationQueue` by combining all queued segments into one large segment before calling `buffer.BufferAsync`. The voice is there now, but with some minor distortions.

```csharp
// Concatenate the segments buffered while inactive (just before activation)
// and write them as one segment. Cancellation is checked before dequeuing,
// so a dequeued segment is never left undisposed.
var combinedData = new List<float>();
while (!cancellationToken.IsCancellationRequested && activationQueue.TryDequeue(out var queued))
{
    combinedData.AddRange(queued.Buffer);
    queued.Dispose();
}
var combinedSegment = new VoiceSegment(combinedData.ToArray(), this.source.SamplingRate, this.source.Channels);
try
{
    await this.buffer.BufferAsync(combinedSegment, cancellationToken);
}
finally
{
    combinedSegment.Dispose(); // also runs if BufferAsync throws or is cancelled
}
```
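One possible, unverified cause of the remaining distortion: if `VoiceSegment.Buffer` is an array rented from a pool such as `ArrayPool<float>.Shared`, it can be longer than the number of samples actually written, so copying the whole array appends stale samples between segments. Whether `VoiceSegment` works this way has not been checked here. The hypothetical helper `SegmentConcat` below copies only an explicit valid length from each buffer:

```csharp
using System;
using System.Collections.Generic;

public static class SegmentConcat
{
    // Concatenates the first `Length` samples of each buffer into one new array.
    // Explicit lengths matter for pooled arrays, which may be larger than the data they hold.
    public static float[] Concat(IReadOnlyList<(float[] Buffer, int Length)> parts)
    {
        var total = 0;
        foreach (var (buffer, length) in parts)
        {
            if (length < 0 || length > buffer.Length)
                throw new ArgumentOutOfRangeException(nameof(parts));
            total += length;
        }

        var result = new float[total];
        var offset = 0;
        foreach (var (buffer, length) in parts)
        {
            Array.Copy(buffer, 0, result, offset, length);
            offset += length;
        }
        return result;
    }
}
```

Preallocating the result from the summed lengths also avoids the repeated growth of a `List<float>`.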