phetsims / tambo

library containing code to support sonification of PhET simulations
MIT License

Add support for "ducking", i.e. reducing output levels so that other things can be better heard #172

Closed jbphet closed 1 year ago

jbphet commented 1 year ago

There have been several requests for being able to turn down sound (meaning non-voice sounds) so that other things (usually, but not always, voice or description) can be more easily heard. This issue is intended to be the parent issue for the implementation of the first version of that feature.

There are several open issues related to this, but they are a bit less general. These will be left open, at least initially, and updated as this feature progresses. Here is a list:

jbphet commented 1 year ago

I did some research on how ducking generally works in the audio engineering world, and the most common implementation uses something called "sidechain compression". Here is an article on the topic, but the general idea is to turn down the volume of one signal (such as background music) when another signal (such as a voice-over) begins.

We most likely won't be able to implement true sidechain compression using Web Speech and Web Audio for a couple of reasons. One is that it doesn't currently seem to be possible for Web Audio to access the actual Web Speech audio output. Another is that the Web Audio DynamicsCompressorNode does not yet have support for sidechaining. So, we will probably fake this out in some way.

In a discussion with @jessegreenberg, we agreed that it is probably best to have the tambo output sounds duck based on what is being produced by the SpeechSynthesisAnnouncer at any given moment rather than just turning down the sound in general whenever the voicing feature is enabled. @jessegreenberg says that there already exist trustworthy emitters in SpeechSynthesisAnnouncer for when speech starts and stops being produced. That seems like it could enable us to do something similar to sidechain-compression-based ducking.
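
To make the idea concrete, here is a minimal sketch of event-driven ducking, assuming the non-voice sounds pass through a single Web Audio gain stage. The class name `SpeechDucker`, its two handler methods, and the level and time-constant values are all hypothetical and are not tambo or SpeechSynthesisAnnouncer API. The only real calls are `cancelScheduledValues` and `setTargetAtTime`, which exist on the Web Audio `AudioParam`.

```typescript
// Minimal subset of the Web Audio AudioParam interface used by this sketch,
// so it can be driven by a real GainNode.gain or by a test double.
interface GainParamLike {
  cancelScheduledValues( cancelTime: number ): unknown;
  setTargetAtTime( target: number, startTime: number, timeConstant: number ): unknown;
}

// Hypothetical tuning values, chosen for illustration only.
const DUCKED_LEVEL = 0.15; // linear gain while speech is being produced
const ATTACK_TIME_CONSTANT = 0.05; // seconds, how quickly sounds are turned down
const RELEASE_TIME_CONSTANT = 0.4; // seconds, how quickly sounds come back up

// Hypothetical helper: turns a gain parameter down when speech starts and
// back up when speech ends.
class SpeechDucker {
  private readonly gainParam: GainParamLike;
  private readonly getNow: () => number;

  constructor( gainParam: GainParamLike, getNow: () => number ) {
    this.gainParam = gainParam;
    this.getNow = getNow;
  }

  // Intended to be called from a "speech started" listener.
  onSpeechStarted(): void {
    const now = this.getNow();
    this.gainParam.cancelScheduledValues( now );
    this.gainParam.setTargetAtTime( DUCKED_LEVEL, now, ATTACK_TIME_CONSTANT );
  }

  // Intended to be called from a "speech ended" listener.
  onSpeechEnded(): void {
    const now = this.getNow();
    this.gainParam.cancelScheduledValues( now );
    this.gainParam.setTargetAtTime( 1, now, RELEASE_TIME_CONSTANT );
  }
}
```

In a browser this could be constructed as `new SpeechDucker( gainNode.gain, () => audioContext.currentTime )`, with the two methods wired to whatever start and end emitters the announcer exposes.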

We should note that we are only talking about the interaction between tambo-based sound production and voicing here, and are not working on anything related to description as spoken by a screen reader. When a screen reader is being used, our code has no knowledge of it, so we can't really implement ducking in conjunction with that.

jbphet commented 1 year ago

@jessegreenberg and I met today and discussed a number of possibilities for this, some of which are already described in the comment immediately above. Here are some others:

After this discussion, we decided that it would be worthwhile to try the simplest thing first, which would be to automatically duck sounds when the voicing is active. We reviewed the behavior of the Quadrilateral sim, and it should be able to serve as a good test vehicle for this. I will implement a feature in tambo's soundManager that will support the addition of a ducking property and will duck sounds when it is active, and I'll hook it up in the audioManager in joist. If it works reasonably well, we will demo it to the Quad design team, get feedback, and go from there.
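
As a rough illustration of the "ducking property" idea, the sketch below ducks the output whenever any registered boolean property is true. Everything here is an assumption made for illustration: `SimpleBooleanProperty` is a stand-in for an observable boolean, and `DuckingController`, its `addDuckingProperty` method, and the default ducked level are hypothetical and not the actual soundManager implementation.

```typescript
type BooleanListener = ( value: boolean ) => void;

// Hypothetical stand-in for an observable boolean property.
class SimpleBooleanProperty {
  private currentValue: boolean;
  private readonly listeners: BooleanListener[] = [];

  constructor( initialValue: boolean ) {
    this.currentValue = initialValue;
  }

  get value(): boolean {
    return this.currentValue;
  }

  // Notifies listeners only when the value actually changes.
  set value( newValue: boolean ) {
    if ( newValue !== this.currentValue ) {
      this.currentValue = newValue;
      this.listeners.forEach( listener => listener( newValue ) );
    }
  }

  // Registers a listener and calls it immediately with the current value.
  link( listener: BooleanListener ): void {
    this.listeners.push( listener );
    listener( this.currentValue );
  }
}

// Hypothetical controller: output is ducked while ANY registered property is true.
class DuckingController {
  private readonly duckingProperties: SimpleBooleanProperty[] = [];
  private readonly setOutputLevel: ( level: number ) => void;
  private readonly duckedLevel: number;

  constructor( setOutputLevel: ( level: number ) => void, duckedLevel = 0.15 ) {
    this.setOutputLevel = setOutputLevel;
    this.duckedLevel = duckedLevel;
  }

  addDuckingProperty( duckingProperty: SimpleBooleanProperty ): void {
    this.duckingProperties.push( duckingProperty );
    duckingProperty.link( () => this.updateOutputLevel() );
  }

  private updateOutputLevel(): void {
    const shouldDuck = this.duckingProperties.some( property => property.value );
    this.setOutputLevel( shouldDuck ? this.duckedLevel : 1 );
  }
}
```

Combining the properties with a logical OR means that a second source of ducking (for example, another kind of announcement) would not restore the level while the first is still active.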

jbphet commented 1 year ago

Over Slack, @jessegreenberg said:

Hey, just spoke with [@zepumph] - Sorry, I forgot he is hoping to take RC SHAs for friction in the next week or so and that sim has sound + voicing. Would you mind working on ducking in branches until then?

So, for now, the work on this feature should be done in branches.

zepumph commented 1 year ago

I appreciate it! Thanks for helping me keep master stable as we finish off the year.

jbphet commented 1 year ago

I've implemented an initial prototype using branches for joist and tambo. I published a dev version of quad at https://phet-dev.colorado.edu/html/quadrilateral/1.0.0-dev.70/phet/quadrilateral_en_phet.html. I'll have @jessegreenberg check it out, and he may demo it at the next quad design meeting.

I have a couple of ideas for potential improvements:

jbphet commented 1 year ago

The branches from joist and tambo have just been merged into master. I'll continue to get feedback on the feature and make refinements, but the basic functionality now exists on the master branch.

jbphet commented 1 year ago

@Ashton-Morris and I discussed this at today's sound design meeting, and he is up for reviewing it and providing feedback on the amount of sound reduction, the rate at which the sounds are turned down (commonly referred to as "attack" when discussing sidechain compression), and the rate at which sounds are turned back up (commonly referred to as "release"). So far I've tested it in quadrilateral, john-travoltage, and friction. There may be other sims that have both voicing and sound enabled where this could be test driven.
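
For anyone tuning these three values, the sketch below shows how they interact, assuming the level changes follow the exponential approach that Web Audio's `setTargetAtTime` uses. The function names and the example numbers are hypothetical and are not the values used in tambo.

```typescript
// Converts a reduction expressed in decibels to a linear gain multiplier.
function dbToLinearGain( decibels: number ): number {
  return Math.pow( 10, decibels / 20 );
}

// Level reached after `elapsed` seconds when moving from startLevel toward
// targetLevel with the given time constant (exponential approach).
function levelAfter( startLevel: number, targetLevel: number, elapsed: number, timeConstant: number ): number {
  return targetLevel + ( startLevel - targetLevel ) * Math.exp( -elapsed / timeConstant );
}

// Example numbers for illustration only.
const duckedGain = dbToLinearGain( -20 ); // amount of sound reduction
const attackTimeConstant = 0.05; // seconds, governs the turn-down rate
const releaseTimeConstant = 0.4; // seconds, governs the turn-up rate

// Level 50 ms after speech starts, beginning from full volume.
const levelDuringAttack = levelAfter( 1, duckedGain, 0.05, attackTimeConstant );

// Level 400 ms after speech ends, beginning from the fully ducked level.
const levelDuringRelease = levelAfter( duckedGain, 1, 0.4, releaseTimeConstant );
```

With this kind of curve, the level covers about 63% of the remaining distance to its target in each time constant, so a shorter attack time constant makes the sounds drop out of the way faster and a longer release time constant makes them return more gradually.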

Ashton-Morris commented 1 year ago

From what I can tell, the sounds and attack sound good to me. Closing.