This repository contains the source code and pre-trained models for the work Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects by Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, and Yuki Mitsufuji.
You can now run inference with your own samples (or with audio from YouTube) on Hugging Face!
| Model | Configuration | Training Dataset |
|---|---|---|
| FXencoder (Φp.s.) | Trained with FX normalization and probability scheduling techniques | MUSDB18 |
| MixFXcloner | Mixing style converter trained with Φp.s. | MUSDB18 |
pip install -r requirements.txt
To run the inference code for mixing style transfer, place your input and reference tracks in the following directory structure:
"path_to_data_directory"/"song_name_#1"/"input_file_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name".wav
...
"path_to_data_directory"/"song_name_#n"/"input_file_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name".wav
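To sanity-check the layout above before running inference, a minimal Python sketch could gather the (input, reference) pairs per song folder. `find_song_pairs` is a hypothetical helper for illustration, not part of this repository:

```python
from pathlib import Path

def find_song_pairs(data_dir):
    """Collect (input, reference) WAV pairs from each song subdirectory."""
    pairs = []
    for song_dir in sorted(Path(data_dir).iterdir()):
        if not song_dir.is_dir():
            continue
        wavs = sorted(song_dir.glob("*.wav"))
        # Match files by the naming convention shown above (hypothetical rule).
        inputs = [w for w in wavs if "input" in w.stem]
        refs = [w for w in wavs if "reference" in w.stem]
        if inputs and refs:
            pairs.append((inputs[0], refs[0]))
    return pairs
```

Each song folder must contain both an input and a reference file, or it is skipped.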
python inference/style_transfer.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--ckpt_path_conv "path_to_checkpoint_of_MixFXcloner" \
--target_dir "path_to_directory_containing_inference_samples"
Note: The system expects stereo WAV files at a 44.1 kHz sample rate with 16-bit depth. We also recommend using audio samples that are not too loud: the system transfers styles more reliably when it can reduce the loudness of the mixture-wise inputs while maintaining the overall balance of each instrument.
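The expected format can be verified with Python's standard `wave` module; `check_format` below is an illustrative helper, not part of the repository:

```python
import wave

def check_format(path):
    """Return True if the WAV file is stereo, 44.1 kHz, 16-bit."""
    with wave.open(path, "rb") as f:
        return (f.getnchannels() == 2        # stereo
                and f.getframerate() == 44100  # 44.1 kHz sample rate
                and f.getsampwidth() == 2)     # 2 bytes per sample = 16-bit
```

Running this over your samples before inference avoids format-related failures.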
To run inference with style interpolation between two reference tracks, use the following directory structure:
"path_to_data_directory"/"song_name_#1"/"input_track_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name_2interpolate".wav
...
"path_to_data_directory"/"song_name_#n"/"input_track_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name_2interpolate".wav
python inference/style_transfer.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--ckpt_path_conv "path_to_checkpoint_of_MixFXcloner" \
--target_dir "path_to_directory_containing_inference_samples" \
--interpolation True \
--interpolate_segments "number of segments to perform interpolation"
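Conceptually, interpolation amounts to blending the FXencoder embeddings of the two reference tracks across segments. A minimal NumPy sketch of linear interpolation (an illustrative assumption; the actual implementation lives in inference/style_transfer.py):

```python
import numpy as np

def interpolate_embeddings(emb_a, emb_b, num_segments):
    """Linearly interpolate between two style embeddings over num_segments steps."""
    weights = np.linspace(0.0, 1.0, num_segments)
    # Each step is a convex combination of the two reference embeddings.
    return [(1.0 - w) * emb_a + w * emb_b for w in weights]
```

With `--interpolate_segments 3`, the mixing style would move from reference A, through their midpoint, to reference B.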
Note: Interpolating between two different reference tracks is not covered in the paper, but this example demonstrates the potential for controllable style transfer via the latent space.
This inference code extracts audio effects-related embeddings using our proposed FXencoder. It processes all the .wav files under the target directory.
python inference/feature_extraction.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--target_dir "path_to_directory_containing_inference_samples"
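One natural use of the extracted embeddings is comparing mixing styles, for example via cosine similarity. The helper below is an illustrative sketch, not part of the repository:

```python
import numpy as np

def cosine_similarity(emb_a, emb_b):
    """Cosine similarity between two FXencoder embeddings (1.0 = same direction)."""
    emb_a = np.asarray(emb_a, dtype=np.float64)
    emb_b = np.asarray(emb_b, dtype=np.float64)
    return float(np.dot(emb_a, emb_b)
                 / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
```

Higher similarity between two songs' embeddings suggests closer audio-effects characteristics in their mixes.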
All the details of our system implementation are under the folder "mixing_style_transfer".
-> mixing_style_transfer/mixing_manipulator/
-> mixing_style_transfer/networks/
-> mixing_style_transfer/networks/configs.yaml
-> mixing_style_transfer/data_loader/
Please consider citing this work if you use it.
@article{koo2022music,
title={Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects},
author={Koo, Junghyun and Martinez-Ramirez, Marco A and Liao, Wei-Hsiang and Uhlich, Stefan and Lee, Kyogu and Mitsufuji, Yuki},
journal={arXiv preprint arXiv:2211.02247},
year={2022}
}