This repository contains the source code and pre-trained models for the work Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects by Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, and Yuki Mitsufuji.
You can now run inference with your own samples (or with audio from YouTube) on Hugging Face!
| Model | Configuration | Training Dataset |
|---|---|---|
| FXencoder (Φp.s.) | Trained with FX normalization and probability scheduling techniques | MUSDB18 |
| MixFXcloner | Mixing style converter trained with Φp.s. | MUSDB18 |
pip install -r requirements.txt
To run the inference code for mixing style transfer, place your input and reference tracks in the following directory structure:
"path_to_data_directory"/"song_name_#1"/"input_file_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name".wav
...
"path_to_data_directory"/"song_name_#n"/"input_file_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name".wav
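To sanity-check the layout above before running inference, a minimal Python sketch could gather the (input, reference) pairs per song folder. `find_song_pairs` is a hypothetical helper for illustration, not part of this repository:

```python
from pathlib import Path

def find_song_pairs(data_dir):
    """Collect (input, reference) WAV pairs from each song subdirectory."""
    pairs = []
    for song_dir in sorted(Path(data_dir).iterdir()):
        if not song_dir.is_dir():
            continue
        wavs = sorted(song_dir.glob("*.wav"))
        # Match files by the naming convention shown above (hypothetical rule).
        inputs = [w for w in wavs if "input" in w.stem]
        refs = [w for w in wavs if "reference" in w.stem]
        if inputs and refs:
            pairs.append((inputs[0], refs[0]))
    return pairs
```

Each song folder must contain both an input and a reference file, or it is skipped.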
python inference/style_transfer.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--ckpt_path_conv "path_to_checkpoint_of_MixFXcloner" \
--target_dir "path_to_directory_containing_inference_samples"
Note: The system expects stereo WAV files at a 44.1 kHz sample rate with 16-bit depth. We also recommend using audio samples that are not too loud: the system transfers styles more reliably when it can reduce the loudness of the mixture-wise inputs while maintaining the overall balance of each instrument.
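The expected format can be verified with Python's standard `wave` module; `check_format` below is an illustrative helper, not part of the repository:

```python
import wave

def check_format(path):
    """Return True if the WAV file is stereo, 44.1 kHz, 16-bit."""
    with wave.open(path, "rb") as f:
        return (f.getnchannels() == 2        # stereo
                and f.getframerate() == 44100  # 44.1 kHz sample rate
                and f.getsampwidth() == 2)     # 2 bytes per sample = 16-bit
```

Running this over your samples before inference avoids format-related failures.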
To run inference with style interpolation between two reference tracks, use the following directory structure:
"path_to_data_directory"/"song_name_#1"/"input_track_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name".wav
"path_to_data_directory"/"song_name_#1"/"reference_file_name_2interpolate".wav
...
"path_to_data_directory"/"song_name_#n"/"input_track_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name".wav
"path_to_data_directory"/"song_name_#n"/"reference_file_name_2interpolate".wav
python inference/style_transfer.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--ckpt_path_conv "path_to_checkpoint_of_MixFXcloner" \
--target_dir "path_to_directory_containing_inference_samples" \
--interpolation True \
--interpolate_segments "number of segments to perform interpolation"
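Conceptually, interpolation amounts to blending the FXencoder embeddings of the two reference tracks across segments. A minimal NumPy sketch of linear interpolation (an illustrative assumption; the actual implementation lives in inference/style_transfer.py):

```python
import numpy as np

def interpolate_embeddings(emb_a, emb_b, num_segments):
    """Linearly interpolate between two style embeddings over num_segments steps."""
    weights = np.linspace(0.0, 1.0, num_segments)
    # Each step is a convex combination of the two reference embeddings.
    return [(1.0 - w) * emb_a + w * emb_b for w in weights]
```

With `--interpolate_segments 3`, the mixing style would move from reference A, through their midpoint, to reference B.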
Note: Interpolating between two different reference tracks is not covered in the paper, but this example demonstrates the potential for controllable style transfer via the latent space.
This inference code extracts audio effects-related embeddings using our proposed FXencoder. It processes all the .wav files under the target directory.
python inference/feature_extraction.py \
--ckpt_path_enc "path_to_checkpoint_of_FXencoder" \
--target_dir "path_to_directory_containing_inference_samples"
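One natural use of the extracted embeddings is comparing mixing styles, for example via cosine similarity. The helper below is an illustrative sketch, not part of the repository:

```python
import numpy as np

def cosine_similarity(emb_a, emb_b):
    """Cosine similarity between two FXencoder embeddings (1.0 = same direction)."""
    emb_a = np.asarray(emb_a, dtype=np.float64)
    emb_b = np.asarray(emb_b, dtype=np.float64)
    return float(np.dot(emb_a, emb_b)
                 / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
```

Higher similarity between two songs' embeddings suggests closer audio-effects characteristics in their mixes.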
All the details of our system implementation are under the folder "mixing_style_transfer".
-> mixing_style_transfer/mixing_manipulator/
-> mixing_style_transfer/networks/
-> mixing_style_transfer/networks/configs.yaml
-> mixing_style_transfer/data_loader/
Please consider citing this work if you use it.
@article{koo2022music,
title={Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects},
author={Koo, Junghyun and Martinez-Ramirez, Marco A and Liao, Wei-Hsiang and Uhlich, Stefan and Lee, Kyogu and Mitsufuji, Yuki},
journal={arXiv preprint arXiv:2211.02247},
year={2022}
}