First, install the required packages.

```bash
pip install -r requirements.txt
```
Then, run the training script.

```bash
python train.py name=NAME_OF_OUTPUT data.raw_dir=PATH/TO/SOUNDS/DIRECTORY data.db_path=PATH/TO/TEMPORARY/DATABASE/DIRECTORY
```

The results, including checkpoint files (`.ckpt`) and tensorboard logs, are saved under `logs/{name}`. `data.raw_dir` specifies the directory containing the audio files, and `data.db_path` specifies the directory for the temporary database files (see the Details/Preprocessing section below).
After training, you can export the checkpoint file to a neutone model (`.nm`) file.

```bash
python export.py CKPT_FILE EXPORT_NAME
```

The `model.nm` file, along with reconstruction samples, will be output to `exports/EXPORT_NAME`. You can load these files in the neutone plugin. Make sure to fill in the model details in `DDSPModelWrapper` when submitting these models to neutone.
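As a rough illustration of the kind of details to fill in, here is a hedged sketch assuming `DDSPModelWrapper` subclasses the neutone SDK's `WaveformToWaveformBase`. Only the metadata methods are shown; the actual wrapper in this repository also implements the audio processing methods and may be organized differently.

```python
# Hypothetical metadata overrides for DDSPModelWrapper (illustrative only).
from typing import List
from neutone_sdk import WaveformToWaveformBase


class DDSPModelWrapper(WaveformToWaveformBase):
    def get_model_name(self) -> str:
        return "DDSP.my_instrument"  # placeholder name

    def get_model_authors(self) -> List[str]:
        return ["Your Name"]

    def get_model_short_description(self) -> str:
        return "DDSP timbre transfer trained on my own recordings."

    def get_model_long_description(self) -> str:
        return "Harmonic-plus-noise DDSP synthesizer with a learned IR reverb."

    def get_technical_description(self) -> str:
        return "DDSP model (Engel et al.) exported for streaming inference."

    def get_tags(self) -> List[str]:
        return ["DDSP", "timbre transfer"]

    def get_model_version(self) -> str:
        return "1.0.0"

    def is_experimental(self) -> bool:
        return True
```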
This project uses hydra to manage configurations. The config files are split into multiple files under `configs/`. For example, the default data configuration is specified in `configs/model/slice.yaml`. You can override these settings by editing `configs/config.yaml`, and they can also be overridden on the command line, e.g. `data.raw_dir=PATH/TO/.WAV/DIRECTORY`.
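For orientation, a hydra root config typically composes the per-group files through a `defaults` list. The sketch below is hypothetical (the actual `configs/config.yaml` may be organized differently); only the group names follow the paths mentioned in this README.

```yaml
# Hypothetical root config illustrating hydra composition (not the actual file).
defaults:
  - model: slice   # pulls in configs/model/slice.yaml
  - synth: hpnir   # pulls in configs/synth/hpnir.yaml
  - _self_

name: default_run  # overridden on the command line with name=NAME_OF_OUTPUT
```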
Here are some extra arguments you might want to edit (see the example command after this list):

- `trainer.max_steps`: maximum number of training steps.
- `steps.ckpt_nsteps`: interval (in steps) between checkpoint saves.
- `steps.ckpt`: starting training from a pretrained checkpoint (`steps.ckpt=data/pretrain_sawnoise.ckpt`) may produce better extrapolation.
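For example, a run combining these overrides might look like the following (the override names come from this README; the values are placeholders):

```bash
python train.py name=pretrained_run \
    data.raw_dir=PATH/TO/SOUNDS/DIRECTORY \
    data.db_path=PATH/TO/TEMPORARY/DATABASE/DIRECTORY \
    trainer.max_steps=100000 \
    steps.ckpt=data/pretrain_sawnoise.ckpt
```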
## Details

### Preprocessing (`data.py`)
The preprocessing step first checks whether a database file has already been created under `data.db_path`. If a database file already exists, it is used directly. If not, the wave files in the directory specified by `data.raw_dir` are loaded and preprocessed: each audio file is cut into 1-second segments and its pitch is detected using torchcrepe. The preprocessing results, including the sliced audio and pitch, are saved into a database file under `data.db_path`.
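As a minimal sketch of this pipeline (not the actual `data.py`; the sample rate, hop length, and pitch range are assumptions), the slicing and pitch detection could look roughly like this:

```python
# Illustrative preprocessing sketch: slice audio into 1-second segments and detect pitch.
import torch
import torchaudio
import torchcrepe

SR = 16000       # assumed working sample rate (CREPE operates at 16 kHz)
SEG_LEN = SR     # 1-second segments


def preprocess_file(path: str):
    audio, sr = torchaudio.load(path)            # (channels, samples)
    audio = audio.mean(dim=0, keepdim=True)      # mix down to mono
    if sr != SR:
        audio = torchaudio.functional.resample(audio, sr, SR)

    # Cut into 1-second segments, dropping the trailing remainder.
    n_segs = audio.shape[-1] // SEG_LEN
    segments = audio[..., : n_segs * SEG_LEN].reshape(n_segs, 1, SEG_LEN)

    results = []
    for seg in segments:
        # Pitch detection with torchcrepe (hop length and pitch range are assumptions).
        f0 = torchcrepe.predict(seg, SR, hop_length=160,
                                fmin=50.0, fmax=2000.0, model="full")
        results.append({"audio": seg.squeeze(0), "pitch": f0.squeeze(0)})
    # In the real code, these results are written to the database under data.db_path.
    return results
```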
### Model (`model.py`, `estimator.py`)

The model is the same as a basic DDSP model. An estimator network is trained to predict the parameters for the DDSP synthesizer, and the DDSP synthesizer outputs audio from the estimated parameters. The synthesizer output is used to calculate a multi-scale spectrogram loss against the original input. For details about the DDSP model, see Google Magenta's blog post.
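A minimal sketch of such a multi-scale spectrogram loss is shown below, assuming waveforms shaped `(batch, samples)`. The FFT sizes and the log-magnitude term are common DDSP choices, not necessarily this repository's exact settings.

```python
# Multi-scale spectrogram loss sketch: compare STFT magnitudes at several resolutions.
import torch


def multiscale_spec_loss(pred: torch.Tensor, target: torch.Tensor,
                         fft_sizes=(2048, 1024, 512, 256, 128, 64)) -> torch.Tensor:
    loss = pred.new_zeros(())
    for n_fft in fft_sizes:
        hop = n_fft // 4
        window = torch.hann_window(n_fft, device=pred.device)
        sp = torch.stft(pred, n_fft, hop, window=window, return_complex=True).abs()
        st = torch.stft(target, n_fft, hop, window=window, return_complex=True).abs()
        # Linear-magnitude term plus log-magnitude term, averaged over all bins.
        loss = loss + (sp - st).abs().mean()
        loss = loss + (torch.log(sp + 1e-6) - torch.log(st + 1e-6)).abs().mean()
    return loss
```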
### Synthesizer (`synthesizer.py`, `processor.py`, `modules/`)

The synthesizer architecture can be flexibly constructed as a yaml file. For example, the default synthesizer config `configs/synth/hpnir.yaml` instantiates Harmonic (harmonic synthesizer), FilteredNoise (noise synthesizer), and IRReverb (convolution reverb) modules and specifies the connections between these modules.
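Purely as an illustration of the idea (the module names come from this README, but the keys and connection format below are assumptions, not the actual `hpnir.yaml` schema), such a config might look like:

```yaml
# Hypothetical synthesizer config sketch in the spirit of configs/synth/hpnir.yaml.
modules:
  harmonic:
    _target_: modules.Harmonic        # harmonic synthesizer
  noise:
    _target_: modules.FilteredNoise   # filtered-noise synthesizer
  reverb:
    _target_: modules.IRReverb        # learned convolution reverb
connections:
  - [harmonic, noise, mix]            # sum the harmonic and noise branches
  - [mix, reverb]                     # apply reverb to the mixed signal
```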
The reverb module (`reverb.py`, `IRReverb`) is a convolution reverb. Its impulse response (IR) is fixed over the entire dataset and is learned as a model parameter.
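A minimal sketch of this idea, assuming an FFT-based convolution and a simple dry-plus-wet sum (the real `IRReverb` may differ in initialization, IR length, and wet/dry handling):

```python
# Convolution reverb with a single learned impulse response shared across the dataset.
import torch
import torch.nn as nn


class IRReverbSketch(nn.Module):
    def __init__(self, ir_seconds: float = 1.0, sample_rate: int = 48000):
        super().__init__()
        n = int(ir_seconds * sample_rate)
        # One IR for the whole dataset, learned as a model parameter.
        self.ir = nn.Parameter(torch.randn(n) * 1e-3)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, samples); convolve with the learned IR via the FFT.
        n = audio.shape[-1] + self.ir.shape[-1] - 1
        wet = torch.fft.irfft(torch.fft.rfft(audio, n) * torch.fft.rfft(self.ir, n), n)
        return audio + wet[..., : audio.shape[-1]]  # dry + wet, truncated to input length
```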
### Streaming (`stream.py`, `export.py`)

The original DDSP modules are not compatible with streaming synthesis, as they rely on future information for interpolation, etc. `export.py` converts the model into a streaming-compatible model using caches. For pitch detection, since the CREPE model only supports a single sample rate (16kHz), we use YIN instead.
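The caching idea can be illustrated with a toy example: a causal convolution that keeps the tail of the previous audio block, so processing the current block never needs future samples. This is a generic sketch of the technique, not the actual modules in `stream.py`.

```python
# Streaming cache sketch: carry left context between blocks instead of looking ahead.
import torch
import torch.nn as nn


class StreamingConv1d(nn.Module):
    """Causal 1-D convolution that caches its left context between audio blocks."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        assert kernel_size > 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)
        # Cached tail of the previous block (kernel_size - 1 samples).
        self.register_buffer("cache", torch.zeros(1, channels, kernel_size - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, samples) for one streaming block.
        x = torch.cat([self.cache.expand(x.shape[0], -1, -1), x], dim=-1)
        self.cache = x[..., -(self.conv.kernel_size[0] - 1):].detach()
        return self.conv(x)  # output has the same length as the input block
```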
This project is based on the original DDSP paper by Engel et al. This project also uses the Lightning-Hydra-Template (MIT License).