Add support for "ducking", i.e. reducing output levels so that other things can be better heard

jbphet commented 1 year ago

There have been several requests for being able to turn down sound (meaning non-voice sounds) so that other things (usually, but not always, voice or description) can be more easily heard. This issue is intended to be the parent issue for the implementation of the first version of that feature.

There are several open issues related to this, but they are a bit less general. These will be left open, at least initially, and updated as this feature progresses. Here is a list:

jbphet commented 1 year ago

I did some research on how ducking works generally in the audio engineering world, and the most common implementation is to use something called "sidechain compression". Here is an article on the topic, but the general idea is to turn down the volume of one signal (such as background music) when another signal (such as a voice-over) begins.

We most likely won't be able to implement true sidechain compression using Web Speech and Web Audio for a couple of reasons. One is that it doesn't currently seem to be possible for Web Audio to access the actual Web Speech audio output. Another is that the Web Audio DynamicsCompressorNode does not yet have support for sidechaining. So, we will probably fake this out in some way.

In a discussion with @jessegreenberg, we agreed that it is probably best to have the tambo output sounds duck based on what is being produced by the SpeechSynthesisAnnouncer at any given moment rather than just turning down the sound in general whenever the voicing feature is enabled. @jessegreenberg says that there already exist trustworthy emitters in SpeechSynthesisAnnouncer for when speech starts and stops being produced. That seems like it could enable us to do something similar to sidechain-compression-based ducking.
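
To make the idea concrete, here is a rough sketch of event-driven ducking in TypeScript. Everything in it is an assumption for illustration: `TinyEmitter` is a stand-in and not the real emitter class, the emitter names are guesses rather than the actual SpeechSynthesisAnnouncer API, and the 0.3 ducked level is arbitrary.

```typescript
// Hypothetical stand-in for an emitter; not the real PhET Emitter class.
class TinyEmitter {
  private readonly listeners: Array<() => void> = [];

  addListener( listener: () => void ): void {
    this.listeners.push( listener );
  }

  emit(): void {
    this.listeners.forEach( listener => listener() );
  }
}

// Tracks the level that non-voice sounds should be heading toward.
// A real implementation would ramp a gain stage toward this value.
class SpeechDucker {
  private targetLevel = 1;

  constructor( speechStarted: TinyEmitter, speechEnded: TinyEmitter, duckedLevel: number ) {
    speechStarted.addListener( () => { this.targetLevel = duckedLevel; } );
    speechEnded.addListener( () => { this.targetLevel = 1; } );
  }

  getTargetLevel(): number {
    return this.targetLevel;
  }
}

// Assumed names for the "speech started" and "speech ended" notifications.
const startSpeakingEmitter = new TinyEmitter();
const endSpeakingEmitter = new TinyEmitter();
const ducker = new SpeechDucker( startSpeakingEmitter, endSpeakingEmitter, 0.3 );
```

The point of the sketch is that the "sidechain" signal here is just the start and stop notifications, so nothing needs to analyze the speech audio itself.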

We should note that we are only talking about the interaction between tambo-based sound production and voicing here, and are not working on anything related to description as spoken by a screen reader. When a screen reader is being used, our code has no knowledge of it, so we can't really implement ducking in conjunction with that.

jbphet commented 1 year ago

@jessegreenberg and I met today and discussed a number of possibilities for this, some of which are already described in the comment immediately above. Here are some others:

Tambo already supports arbitrary categories for the sounds. This feature is generally used to do things like turn down all the user interface sounds and leave the game sounds turned up when we don't have full sound design for a sim. Should we do the ducking on a category basis?
Should we have a capability to duck individual sound generators? This would be more complex to use and to implement, but it would give users fine-grained control, and would probably be used to duck things like loops while leaving other sounds, such as the Reset All button earcon, at full volume. Is that desirable, or should we just duck everything? We just don't know.
The simplest thing to do, at least for the first version, would be to set up our common code to duck all sound generation by a fixed amount whenever voicing is active, and unduck when it is not. If we do this, developers won't have to do anything to make it work. The downside will be that developers won't have any control over it.

After this discussion, we decided that it would be worthwhile to try the simplest thing first, which would be to automatically duck sounds when the voicing is active. We reviewed the behavior of the Quadrilateral sim, and it should be able to serve as a good test vehicle for this. I will implement a feature in tambo's soundManager that will support the addition of a ducking property and will duck sounds when it is active, and I'll hook it up in the audioManager in joist. If it works reasonably well, we will demo it to the Quad design team, get feedback, and go from there.
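
As a rough illustration of the "ducking property" idea, the sketch below ducks to a fixed level whenever at least one registered boolean is true. The names (`BooleanHolder`, `DuckingRegistry`, `addDuckingHolder`) and the 0.25 level are hypothetical and are not the actual soundManager API.

```typescript
// Hypothetical minimal boolean holder; not the real axon Property class.
interface BooleanHolder {
  value: boolean;
}

class DuckingRegistry {
  private holders: BooleanHolder[] = [];

  constructor( private readonly duckedLevel: number ) {}

  addDuckingHolder( holder: BooleanHolder ): void {
    if ( this.holders.indexOf( holder ) === -1 ) {
      this.holders.push( holder );
    }
  }

  removeDuckingHolder( holder: BooleanHolder ): void {
    this.holders = this.holders.filter( h => h !== holder );
  }

  // Output is ducked while at least one registered holder is true, otherwise it is at full level.
  getOutputLevel(): number {
    return this.holders.some( h => h.value ) ? this.duckedLevel : 1;
  }
}

// Example wiring: a single holder that is true while voicing is speaking.
const registry = new DuckingRegistry( 0.25 );
const isVoicingSpeaking: BooleanHolder = { value: false };
registry.addDuckingHolder( isVoicingSpeaking );
```

With this shape, common code could register a single holder for voicing, and developers would not need to do anything per sim.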

jbphet commented 1 year ago

Over Slack, @jessegreenberg said:

Hey, just spoke with [@zepumph] - Sorry, I forgot he is hoping to take RC SHAs for friction in the next week or so and that sim has sound + voicing. Would you mind working on ducking in branches until then?

So, for now, the work on this feature should be done in branches.

zepumph commented 1 year ago

I appreciate it! Thanks for helping me keep master stable as we finish off the year.

jbphet commented 1 year ago

I've implemented an initial prototype using branches for joist and tambo. I published a dev version of quad at https://phet-dev.colorado.edu/html/quadrilateral/1.0.0-dev.70/phet/quadrilateral_en_phet.html. I'll have @jessegreenberg check it out, and he may demo it at the next quad design meeting.

I have a couple of ideas for potential improvements:

Right now, the volume changes for ducking are symmetrical, meaning that the output level goes down and up at the same rate. After playing with it for a bit, I think it would be good if it went down more quickly, and came back up more slowly.
I think we might want it to duck to a slightly lower volume, but I'd like others to experiment with it first and see what they think.
I should probably add some sort of test harness to the tambo demo for this feature, but I'll wait until we decide for sure if it's a keeper.
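
For the first idea, an asymmetric response could look something like the sketch below, which moves the level toward its target with an exponential approach and picks a shorter time constant when the level is falling than when it is rising. The function name and both time constants are assumptions for illustration and are not values from the prototype.

```typescript
// Assumed time constants, in seconds; a shorter value means a faster change.
const ATTACK_TIME_CONSTANT: number = 0.05; // used when the level is going down
const RELEASE_TIME_CONSTANT: number = 0.5; // used when the level is coming back up

// Moves `current` toward `target` over `dt` seconds with an exponential approach.
function stepLevel( current: number, target: number, dt: number ): number {
  const timeConstant = target < current ? ATTACK_TIME_CONSTANT : RELEASE_TIME_CONSTANT;
  return target + ( current - target ) * Math.exp( -dt / timeConstant );
}

// Over the same 0.1 seconds, ducking from 1 to 0.3 covers more ground than recovering from 0.3 to 1.
const levelAfterDucking = stepLevel( 1, 0.3, 0.1 );
const levelAfterRecovering = stepLevel( 0.3, 1, 0.1 );
```

In Web Audio, `AudioParam.setTargetAtTime( target, startTime, timeConstant )` applies the same kind of exponential approach to a gain value, so choosing the time constant based on direction is one plausible way to get a fast attack and a slow release.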

jbphet commented 1 year ago

The branches from joist and tambo have just been merged into master. I'll continue to get feedback on the feature and make refinements, but the basic functionality now exists on the master branch.

jbphet commented 1 year ago

@Ashton-Morris and I discussed this at today's sound design meeting, and he is up for reviewing it and providing feedback on the amount of sound reduction, the rate at which the sounds are turned down (commonly referred to as "attack" when discussing sidechain compression), and the rate at which sounds are turned back up (commonly referred to as "release"). So far I've tested it in quadrilateral, john-travoltage, and friction. There may be other sims that have both voicing and sound enabled where this could be test-driven.

Ashton-Morris commented 1 year ago

From what I can tell, the sounds and attack sound good to me. Closing.

phetsims / tambo

Add support for "ducking", i.e. reducing output levels so that other things can be better heard #172