opentok / opentok-windows-sdk-samples

Sample applications illustrating best practices using OpenTok Windows SDK
MIT License

Voice detection #50

Status: Closed (closed by wesoos 1 year ago)

wesoos commented 1 year ago

Hello, is it possible to detect that there has been no voice speaking for "x" amount of time?

sergioturgil-vonage commented 1 year ago


EDIT: Forget about this answer. It does provide a solution, but it's extremely convoluted. The simplest answer is what @rktalusani suggested: use the AudioLevel event, which is available for both publisher and subscriber.


Yes, with some coding. The approach depends on whether you mean subscribers or the publisher.

For subscribers it's much easier. You just need to subscribe to the AudioData event. This event is raised every 10 ms and contains a batch of audio samples (each sample is a 2-byte signed integer). You would need to check that all samples are close enough to zero; some fine-tuning is needed to decide what "close enough" means so that a certain level of background noise is ignored.
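The "close enough to zero" check described above can be sketched in a few lines. This is a language-neutral illustration in Python, not OpenTok SDK code: the function name `is_silent_batch`, the default threshold of 500, and the little-endian byte order are all assumptions made for the example. The same loop would go inside the AudioData handler in a C# client.

```python
import struct


def is_silent_batch(pcm_bytes: bytes, threshold: int = 500) -> bool:
    """Return True when every 16-bit sample is within +/- threshold of zero.

    threshold is a hypothetical starting value; tune it against real
    background noise.
    """
    count = len(pcm_bytes) // 2  # 2 bytes per sample, as stated in the thread
    # "<h" means little-endian signed 16-bit; the byte order is an assumption.
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    return all(abs(sample) <= threshold for sample in samples)
```

A single loud sample makes the whole batch count as non-silent, so a stricter or looser policy (for example, averaging the absolute values) may suit noisy environments better.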

For the publisher it's a bit more complicated, since we do not provide that event. In that case you would need to provide a custom audio device, so you would first need to familiarize yourself with creating one. In the capturer subsystem, inside the callback that writes the captured samples to WebRTC, you would run the same check described for subscribers before actually writing the samples.

There's also a second, better alternative for the publisher, which is not available yet but will be very soon: the insertable streams API (or media processing API, if you prefer). This API allows configuring an IAudioTransformer with a transform callback that is called for each batch of samples passed to WebRTC (from a custom audio device or from the default audio module, it doesn't matter). Normally you would use this to perform transformations such as noise cancellation, but you can also use it to run the check described above without actually transforming anything.

rktalusani commented 1 year ago

They can also use this, right? https://tokbox.com/developer/sdks/windows/reference/class_open_tok_1_1_publisher.html#ac0388cd417ccd21b50764898397b0b9e

sergioturgil-vonage commented 1 year ago

Yeah, you're absolutely right. That would be the simplest solution. I can't believe I missed that one and gave the most convoluted solution possible. In fact, I'm going to edit my answer to redirect people to yours, which is far better. Thx! (and apologies)

wesoos commented 1 year ago

Are you guys referring to the AudioStatsUpdated event?

sergioturgil-vonage commented 1 year ago

No, there's an AudioLevel event both for publisher and subscriber. You can subscribe to it and I believe it's raised periodically with a float that indicates the audio level at that moment. It's a much simpler solution than what I proposed. I don't know why I didn't think of it first.

wesoos commented 1 year ago

So AudioLevel is not the level of the MIC or speaker?

sergioturgil-vonage commented 1 year ago

It's the level of audio measured in the publisher or subscriber audio stream.

It's not really the level of audio captured from the mic or rendered in the speaker, but I believe that for most applications it will suffice, right?

If not, let me know your exact use case and we can think of a solution.

wesoos commented 1 year ago

Thanks. Essentially we want to detect if a participant is still on the call so we can end the call if they “forgot” to hang up, if that makes sense.

I was also thinking to use the AudioNetworkStats and look at the BytesSent property…

sergioturgil-vonage commented 1 year ago

No, without compression the bytes sent remain the same, even if all of them represent samples of 0 amplitude.
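A quick arithmetic check of that point, as a sketch only: the 2-byte samples and 10 ms batches come from earlier in this thread, while the 16 kHz mono format and the helper name `batch_size_bytes` are assumptions for illustration. It covers raw, uncompressed samples only.

```python
import struct

BYTES_PER_SAMPLE = 2  # 2-byte signed samples, as described earlier in the thread
BATCH_MS = 10         # batch duration, as described earlier in the thread


def batch_size_bytes(sample_rate_hz: int = 16_000, channels: int = 1) -> int:
    """Size of one uncompressed batch; the 16 kHz mono default is an assumption."""
    samples_per_channel = sample_rate_hz * BATCH_MS // 1000
    return samples_per_channel * BYTES_PER_SAMPLE * channels


samples_per_batch = batch_size_bytes() // BYTES_PER_SAMPLE
silent_batch = struct.pack("<%dh" % samples_per_batch, *([0] * samples_per_batch))
loud_batch = struct.pack("<%dh" % samples_per_batch, *([12_000] * samples_per_batch))
# Both batches occupy the same number of bytes, whatever the amplitude.
```

Because the size depends only on sample rate, channel count, and duration, a byte counter over uncompressed audio cannot tell silence from speech.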

I imagine you're talking about a participant who is still in the call (they haven't invoked unpublish/disconnect on their side) but is not talking at all and hasn't talked for a while.

In that case I would check that the subscriber.AudioLevel event has reported 0 (or close to 0) for X time. Since this solution is very easy to implement, I would try it first and see if it works for you.

Keep in mind that this will also be true for any participant who is currently silent or has switched off the microphone because they're just listening.
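The "0 (or close to 0) for X time" check can be sketched as a small timer that is fed each reported level. This is a language-neutral illustration in Python, not OpenTok SDK code: `SilenceTracker`, `on_audio_level`, the 0.01 threshold, and the idea that levels are floats where 0 means silence are assumptions for the example. In a C# client the same logic would live in the AudioLevel event handler.

```python
import time


class SilenceTracker:
    """Tracks how long the reported audio level has stayed near zero."""

    def __init__(self, timeout_s, threshold=0.01, clock=time.monotonic):
        self.timeout_s = timeout_s  # the "X" amount of time, in seconds
        self.threshold = threshold  # hypothetical "close to 0" cutoff
        self._clock = clock         # injectable so the logic can be tested
        self._last_voice = clock()

    def on_audio_level(self, level):
        """Feed one reported level; returns True once silence has lasted timeout_s."""
        now = self._clock()
        if level > self.threshold:
            self._last_voice = now  # voice (or noise) heard: restart the timer
        return now - self._last_voice >= self.timeout_s
```

A caller would create one tracker per subscriber, for example `SilenceTracker(timeout_s=300)`, and end or flag the call when `on_audio_level` returns True. As noted above, a muted listener looks the same as an abandoned call, so a confirmation prompt before hanging up may be safer than disconnecting outright.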

wesoos commented 7 months ago

Hello, quick question on this again: will this work if there is background noise (wind blowing, cars driving by, etc.)? Also, if I dispose a session from the C# client, will it end the entire session on the other side as well? The reason for this whole exercise is that some clients are keeping their side open and not closing out, running the participant minutes through the roof...