magenta / ddsp-vst

Realtime DDSP Neural Synthesizer and Effect
Apache License 2.0

[Feature Request] REST API for this project? #40

Open mepc36 opened 1 year ago

mepc36 commented 1 year ago

Is your feature request related to a problem? Please describe.
I'd like programmatic access to the DDSP-VST's ability to morph one instrument into another.

Describe the solution you'd like
I want to be able to send an audio file to a REST API, along with a JSON request that includes a model identifier. The API would then transform the audio file's instrument into the instrument specified by the model identifier.

The response would be a binary file of that transformation applied to the initial audio file.
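
As a minimal sketch of the request/response shape I have in mind (the endpoint path, field names, and model identifier below are placeholders, not an existing DDSP-VST API):

    // Hypothetical client call; nothing here exists in DDSP-VST today.
    const form = new FormData();
    form.append('audio', audioFile); // e.g. a File from an <input type="file">
    form.append('request', JSON.stringify({ model: 'violin' })); // model identifier

    const response = await fetch('https://example.com/api/timbre-transfer', {
      method: 'POST',
      body: form,
    });

    // Binary response: the input audio with the target instrument's timbre applied.
    const transformedAudio = await response.blob();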

Describe alternatives you've considered
I could crack open the DDSP-VST in some ugly way, but I don't know C++.

Additional context
N/A.

mepc36 commented 1 year ago

Is there another Magenta GitHub project that I could wrap to build such an API?

mepc36 commented 1 year ago

I think using this repo...

https://github.com/magenta/ddsp

...and this notebook...

https://colab.research.google.com/github/magenta/ddsp/blob/main/ddsp/colab/demos/timbre_transfer.ipynb#scrollTo=wmSGDWM5yyjm

...could work to create an API.
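
As a rough sketch (not anything that ships with Magenta), the notebook's timbre-transfer code could sit behind a small Node/Express endpoint; the script name, endpoint path, and field names below are placeholders:

    // Hypothetical wrapper: an Express endpoint that accepts an audio upload plus a
    // model identifier and shells out to a script built from the timbre_transfer notebook.
    import express from 'express';
    import multer from 'multer';
    import { execFile } from 'child_process';
    import { promises as fs } from 'fs';

    const app = express();
    const upload = multer({ dest: '/tmp/uploads' });

    app.post('/timbre-transfer', upload.single('audio'), (req, res) => {
      const modelId = req.body.model;           // e.g. 'violin'
      const inputPath = req.file!.path;
      const outputPath = `${inputPath}.out.wav`;

      // timbre_transfer.py stands in for code extracted from the Colab notebook above.
      execFile(
        'python',
        ['timbre_transfer.py', '--input', inputPath, '--model', modelId, '--output', outputPath],
        async (err) => {
          if (err) return res.status(500).send(err.message);
          res.type('audio/wav').send(await fs.readFile(outputPath));
        },
      );
    });

    app.listen(3000);

That would give the same shape as the request described earlier: audio upload plus model identifier in, transformed binary audio out.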

mepc36 commented 1 year ago

Sorry, by "API", I really meant a web service.

Here is the code I'm using to do this timbre transfer in the browser:

    <IonButton onClick={async () => {
      // IonButton / IonIcon come from '@ionic/react'; volumeHigh comes from 'ionicons/icons'.
      const { DDSP, SPICE } = await import('@magenta/music')

      // SPICE extracts the pitch/loudness features that the DDSP model resynthesizes.
      const spice = new SPICE()
      await spice.initialize()

      // audioUrl points at the input audio and is defined elsewhere in the component, e.g.:
      // const audioUrl = 'https://s3-us-west-2.amazonaws.com/s.cdpn.io/123941/Yodel_Sound_Effect.mp3'; // yodel

      const context = new AudioContext();

      // Fetch and decode the source audio.
      const response = await window.fetch(audioUrl)
      const arrayBuffer = await response.arrayBuffer()
      const audioBuffer = await context.decodeAudioData(arrayBuffer)

      // Extract audio features, then resynthesize them with the violin checkpoint.
      const audioFeatures = await spice.getAudioFeatures(audioBuffer)
      const checkpointUrl = 'https://storage.googleapis.com/magentadata/js/checkpoints/ddsp/violin'
      const ddsp = new DDSP(checkpointUrl)
      await ddsp.initialize()
      const synthesizedBuffer = await ddsp.synthesize(audioFeatures)
      console.log('synthesizedBuffer:', synthesizedBuffer)

      // Copied from ~/node_modules/@magenta/music/esm/ddsp/buffer_utils.js, which is
      // called by the addReverb method in node_modules/@magenta/music/esm/ddsp/add_reverb.js,
      // which is called by the .synthesize method in node_modules/@magenta/music/esm/ddsp/model.js
      const arrayBufferToAudioBuffer = (audioCtx, arrayBuffer, sampleRate) => {
        const newBuffer = audioCtx.createBuffer(1, arrayBuffer.length, sampleRate);
        newBuffer.copyToChannel(arrayBuffer, 0);
        return newBuffer;
      };

      const synthesizedAudioBuffer = arrayBufferToAudioBuffer(context, synthesizedBuffer, 48000)

      // Play the resynthesized audio through the default output.
      const play = (myBuffer) => {
        const source = context.createBufferSource();
        source.buffer = myBuffer;
        source.connect(context.destination);
        source.start();
      }

      play(synthesizedAudioBuffer)
    }}>
      <IonIcon icon={volumeHigh} />
    </IonButton>

olaviinha commented 1 year ago

I would like to be able to use the trained VST models programmatically in Python, the way gen-1 DDSP autoencoder models could be used with the (now outdated) timbre transfer notebook.

The old timbre transfer demo notebook and autoencoder training notebook have been broken for months and appear to be no longer maintained. The VST training notebook works, but the trained models can only be used with the VST plugin, which produces low-quality mono audio.

I find that upsampling, restoring the high end, and producing stereo output with DDSP are much quicker and simpler to do programmatically. The VST plugin is certainly cool and all, but for me it only adds a number of extra steps to the workflow.