met4citizen / TalkingHead

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
MIT License

Volume for head.speakAudio()? #22

Closed. JPhilipp closed this issue 7 months ago.

JPhilipp commented 7 months ago

I'm exclusively using the non-TTS speakAudio with a transcript file (works great!). Is there any way to set the volume? Cheers!

On a side note, what I'd really want is for the background music to smoothly get a lowpass filter and lower its gain while the avatar speaks, so the avatar is easier to hear. The approach I've used for that so far is below. It sounds epic when the audio fades back in after the speech, and I'd love to wire this up to the avatar.

// Plays looping background music through a lowpass filter and a gain node,
// plus an optional speech track; when the speech ends, the filter opens up
// and the music gain ramps back to full volume.
export class AudioPlayerWithLowpassFilter {

  constructor(musicPath, speechPath) {
    this.audioContext = new AudioContext();
    this.lowpassFilter = this.audioContext.createBiquadFilter();
    this.lowpassFilter.type = 'lowpass';

    // Cutoff frequency while speech is playing, low enough to muffle the music.
    const frequencyInHertz = 600;
    this.lowpassFilter.frequency.value = frequencyInHertz;

    this.gainNode = this.audioContext.createGain();

    // Music gain while speech plays (min) and after it has ended (max).
    this.musicVolumeMin = 0.55;
    this.musicVolumeMax = 0.7;

    this.musicPath = musicPath;
    this.speechPath = speechPath;
    this.musicSource = null;
    this.speechSource = null;
    this.isPlaying = false;

    // With no speech track, play the music at a constant, quieter level.
    if (!this.speechPath) {
      this.musicVolumeMin -= 0.4;
      this.musicVolumeMax = this.musicVolumeMin;
    }

    this.gainNode.gain.value = this.musicVolumeMin;
  }

  async loadAudio(url) {
    const response = await fetch(url);
    const arrayBuffer = await response.arrayBuffer();
    return this.audioContext.decodeAudioData(arrayBuffer);
  }

  async setupSources() {
    const musicBuffer = await this.loadAudio(this.musicPath);
    this.musicSource = this.audioContext.createBufferSource();
    this.musicSource.buffer = musicBuffer;
    this.musicSource.loop = true;

    if (this.speechPath) {
      const speechBuffer = await this.loadAudio(this.speechPath);
      this.speechSource = this.audioContext.createBufferSource();
      this.speechSource.buffer = speechBuffer;
    }

    // Routing: music -> lowpass -> gain -> destination.
    this.musicSource.connect(this.lowpassFilter).connect(this.gainNode).connect(this.audioContext.destination);

    if (this.speechPath) {
      this.speechSource.connect(this.audioContext.destination);

      // When the speech finishes, open the filter and restore the music volume.
      this.speechSource.onended = () => {
        this.removeLowpassFilter();
      };
    }
  }

  removeLowpassFilter() {
    if (this.musicSource && this.lowpassFilter) {
      const currentTime = this.audioContext.currentTime;

      // Sweep the cutoff up to roughly the top of the audible range while
      // ramping the music gain back to its maximum over a few seconds.
      const startFrequency = this.lowpassFilter.frequency.value;
      const endFrequency = 20000;
      const rampDuration = 4; // seconds

      this.lowpassFilter.frequency.setValueAtTime(startFrequency, currentTime);
      this.lowpassFilter.frequency.linearRampToValueAtTime(endFrequency, currentTime + rampDuration);

      this.gainNode.gain.setValueAtTime(this.gainNode.gain.value, currentTime);
      this.gainNode.gain.linearRampToValueAtTime(this.musicVolumeMax, currentTime + rampDuration);
    }
  }

  async play() {
    if (!this.isPlaying) {
      await this.setupSources();
      this.musicSource.start();
      if (this.speechPath) { this.speechSource.start(); }
      this.isPlaying = true;
    }
  }

  stop() {
    if (this.isPlaying) {
      this.musicSource.stop();
      if (this.speechPath) { this.speechSource.stop(); }
      this.isPlaying = false;
    }
  }

  // Wraps stop() in a promise so callers can await it; stop() throws if the
  // source was never started, hence the try/catch.
  stopAudioSourcePromise(audioSource) {
    return new Promise((resolve, reject) => {
      try {
        // Only stop while the AudioContext is still running.
        if (audioSource && audioSource.context.state === 'running') {
          audioSource.stop();
        }
        resolve();
      } catch (error) {
        reject(error);
      }
    });
  }

  async stopAsync() {
    if (this.speechPath) { 
      await Promise.all([
        this.stopAudioSourcePromise(this.musicSource),
        this.stopAudioSourcePromise(this.speechSource)
      ]);
    }
    else {
      await this.stopAudioSourcePromise(this.musicSource);
    }
    this.isPlaying = false;
  }

  async setFiles(musicPath, speechPath) {
    if (this.isPlaying) {
      this.stop();
    }
    this.musicPath = musicPath;
    this.speechPath = speechPath;
  }

}
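
A minimal usage sketch for the class above (the file paths and button id are placeholders; note that browsers keep a new AudioContext suspended until a user gesture):

const player = new AudioPlayerWithLowpassFilter('./audio/music.mp3', './audio/speech.mp3');

document.querySelector('#play').addEventListener('click', async () => {
  await player.audioContext.resume(); // suspended until a user gesture
  await player.play();
});
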
met4citizen commented 7 months ago

The TalkingHead class has two audio sources, one for speech and one for background. The speech source is connected to a speech gain node, and the background source to a background gain node. Both gain nodes are then connected to a reverb node, which is connected to the destination. Gain values can be set with the setMixerGain(speech, background) method.

Background audio can be played with playBackgroundAudio(url), but (as far as I know) you can also keep your own background-music setup and only control the speech gain if needed, e.g. head.setMixerGain(0.5, null).
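
A sketch of both options (the file path and gain values are illustrative):

// Built-in mixer: play a background track and set both gains.
head.playBackgroundAudio('./audio/music.mp3');
head.setMixerGain(0.8, 0.3);

// Or keep an external music setup and only adjust the speech gain;
// passing null leaves the background gain untouched.
head.setMixerGain(0.5, null);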

JPhilipp commented 7 months ago

Cool, thanks!

JPhilipp commented 7 months ago

Is there maybe an "onFinishedAudio"-type event one can hook into, to set one's own background music volume etc.?

met4citizen commented 7 months ago

Sure. Once you have called speakAudio, you can add a new marker to the speech queue with speakMarker(onmarker). The onmarker callback gets called when the audio has finished playing. Alternatively, you can embed timed markers in the speakAudio call itself by using the markers and mtimes arrays, as shown in the mp3.html example.
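
For example, the fade-back-in from the class above could be triggered like this, assuming player is the AudioPlayerWithLowpassFilter instance (the shape of the speakAudio argument follows the mp3.html example; treat the exact field names as assumptions):

// Queue the speech audio, then a marker; the marker's callback fires
// once the queued audio has finished playing.
head.speakAudio({ audio: audioBuffer, words, wtimes, wdurations });
head.speakMarker(() => player.removeLowpassFilter());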

JPhilipp commented 7 months ago

Thanks!