hyakuchiki / realtimeDDSP


Realtime DDSP in PyTorch + Export to neutone

Features

Usage

First, install the required packages.

pip install -r requirements.txt

Then, run the training script.

python train.py name=NAME_OF_OUTPUT data.raw_dir=PATH/TO/SOUNDS/DIRECTORY data.db_path=PATH/TO/TEMPORARY/DATABASE/DIRECTORY

The results, including checkpoint files (.ckpt) and TensorBoard logs, are saved under logs/{name}. data.raw_dir specifies the directory that contains the audio files. data.db_path specifies the directory for the temporary database files (see the Details/Preprocessing section below).

After training, you can export the checkpoint file to a neutone model (.nm) file.

python export.py CKPT_FILE EXPORT_NAME

The model.nm file, along with reconstruction samples, will be written to exports/EXPORT_NAME. You can load these files in the neutone plugin. Make sure to fill in the model details in DDSPModelWrapper when submitting these models to neutone.

Arguments

This project uses Hydra to manage configurations. The configuration is split into multiple files under configs/. For example, the default data configuration is specified in configs/model/slice.yaml. These settings can be overridden by editing config/config.yaml, or directly on the command line, e.g. data.raw_dir=PATH/TO/.WAV/DIRECTORY.
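For reference, a Hydra entry point composes these config files at runtime and applies any command-line overrides. The following is a minimal sketch, assuming a root config at configs/config.yaml; it is not the actual train.py.

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="configs", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Print the fully composed configuration, including command-line overrides
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()

Running this with data.raw_dir=PATH/TO/SOUNDS/DIRECTORY would show the override merged into the composed config before training starts.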

Here are some extra arguments you might want to edit:

Tips

Details

Preprocessing (data.py)

The preprocessing step first checks whether a database file has already been created under data.db_path. If a database file already exists, it is used as-is. If not, the WAV files in the directory specified by data.raw_dir are loaded and preprocessed. Each audio file is cut into 1-second segments and its pitch is detected using torchcrepe. The preprocessing results, including the sliced audio and pitch, are saved into a database file under data.db_path.
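As a rough illustration of this step (not the actual data.py code; the function name, hop size, and pitch range below are assumptions), slicing and pitch tracking with torchcrepe could look like:

import torch
import torchaudio
import torchcrepe

def slice_and_track_pitch(wav_path, sr=48000, hop=480):
    # Load, resample, and mix down to mono: (1, samples)
    audio, file_sr = torchaudio.load(wav_path)
    audio = torchaudio.functional.resample(audio, file_sr, sr).mean(0, keepdim=True)
    seg_len = sr  # 1-second segments
    segments, pitches = [], []
    for i in range(audio.shape[-1] // seg_len):
        seg = audio[:, i * seg_len:(i + 1) * seg_len]
        # CREPE pitch track at ~10 ms hops (torchcrepe resamples to 16 kHz internally)
        f0 = torchcrepe.predict(seg, sr, hop_length=hop,
                                fmin=50.0, fmax=2000.0, model="full")
        segments.append(seg)
        pitches.append(f0)
    return segments, pitches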

Model (model.py, estimator.py)

The model is a basic DDSP model: an estimator network is trained to predict the parameters of the DDSP synthesizer, and the synthesizer renders audio from those estimated parameters. The synthesizer output is compared against the original input using a multi-scale spectrogram loss. For details about the DDSP model, see Google Magenta's blog post.
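As a sketch of the loss (the exact FFT sizes and weighting used in this repo may differ), a multi-scale spectrogram loss compares STFT magnitudes of output and target at several resolutions:

import torch

def multiscale_spec_loss(output, target, fft_sizes=(2048, 1024, 512, 256, 128, 64)):
    # output, target: (batch, samples)
    loss = 0.0
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft, device=output.device)
        s_out = torch.stft(output, n_fft, hop_length=n_fft // 4,
                           window=window, return_complex=True).abs()
        s_tgt = torch.stft(target, n_fft, hop_length=n_fft // 4,
                           window=window, return_complex=True).abs()
        loss = loss + (s_out - s_tgt).abs().mean()  # linear magnitude term
        loss = loss + (torch.log(s_out + 1e-7) - torch.log(s_tgt + 1e-7)).abs().mean()  # log term
    return loss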

DDSP Synthesizer (synthesizer.py, processor.py, modules/)

The synthesizer architecture can be flexibly constructed from a yaml file. For example, the default synthesizer config configs/synth/hpnir.yaml instantiates Harmonic (harmonic synthesizer), FilteredNoise (noise synthesizer), and IRReverb (convolution reverb) modules and specifies the connections between the modules.
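To make the harmonic part concrete, here is a minimal, self-contained harmonic synthesizer in the DDSP style. This is an illustration only; the actual Harmonic module in modules/ and its control upsampling may differ.

import torch

def harmonic_synth(f0, amplitudes, sample_rate=48000, hop=480):
    # f0: (batch, frames) in Hz; amplitudes: (batch, frames, n_harmonics)
    n_harmonics = amplitudes.shape[-1]
    n_samples = f0.shape[1] * hop
    # Upsample frame-rate controls to audio rate with linear interpolation
    f0_up = torch.nn.functional.interpolate(
        f0.unsqueeze(1), size=n_samples, mode="linear", align_corners=False).squeeze(1)
    amp_up = torch.nn.functional.interpolate(
        amplitudes.transpose(1, 2), size=n_samples, mode="linear", align_corners=False).transpose(1, 2)
    harmonics = torch.arange(1, n_harmonics + 1, device=f0.device)
    # Silence harmonics above the Nyquist frequency
    amp_up = amp_up * (f0_up.unsqueeze(-1) * harmonics < sample_rate / 2)
    # Integrate instantaneous frequency to get phase, then sum the sinusoids
    phase = 2 * torch.pi * torch.cumsum(f0_up / sample_rate, dim=1)
    return (torch.sin(phase.unsqueeze(-1) * harmonics) * amp_up).sum(-1)

Given predicted f0 and per-harmonic amplitudes at frame rate, this produces hop x frames audio samples.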

The reverb module (reverb.py, IRReverb) is a convolution reverb. Its impulse response (IR) is shared across the entire dataset and is learned as a model parameter.
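A learned-IR convolution reverb can be sketched as follows (illustrative only; the actual IRReverb implementation, IR length, and initialization may differ):

import torch
import torch.nn as nn

class LearnedIRReverb(nn.Module):
    def __init__(self, ir_seconds=1.0, sample_rate=48000):
        super().__init__()
        # The impulse response is a trainable parameter shared across the dataset
        self.ir = nn.Parameter(torch.randn(int(ir_seconds * sample_rate)) * 1e-3)

    def forward(self, dry):
        # FFT-based convolution of the dry signal with the learned IR
        n = dry.shape[-1] + self.ir.shape[-1] - 1
        wet = torch.fft.irfft(torch.fft.rfft(dry, n) * torch.fft.rfft(self.ir, n), n)
        return wet[..., : dry.shape[-1]]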

Streaming (stream.py, export.py)

The original DDSP modules are not compatible with streaming synthesis, as they rely on future information for interpolation, etc. export.py converts the model into a streaming-compatible model using caches. For pitch detection, since the CREPE model only supports a single sample rate (16kHz), YIN is used instead.
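The caching idea can be illustrated with a toy module that interpolates control signals using the last frame of the previous buffer instead of looking ahead. This is a sketch of the concept, not the actual stream.py code.

import torch
import torch.nn as nn

class StreamingControlInterp(nn.Module):
    # Upsamples frame-rate controls to audio rate without future look-ahead
    def __init__(self, hop=480):
        super().__init__()
        self.hop = hop
        # Cache the last control frame of the previous buffer
        self.register_buffer("prev", torch.zeros(1, 1))

    def forward(self, controls):
        # controls: (1, frames) for the current audio buffer
        ctrl = torch.cat([self.prev, controls], dim=1)
        up = torch.nn.functional.interpolate(
            ctrl.unsqueeze(1), size=controls.shape[1] * self.hop,
            mode="linear", align_corners=True).squeeze(1)
        self.prev = controls[:, -1:].detach()
        return up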

Credits

This project is based on the original DDSP paper by Engel et al. This project also uses the Lightning-Hydra-Template (MIT License).