Closed yiwei0730 closed 5 months ago
I'm going to write up the steps for training, and for inference after training, this weekend.
But briefly, you can install the dependencies with this:
conda create -n pflow-encodec -y python=3.10
conda activate pflow-encodec
conda install -y pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y -c conda-forge libsndfile==1.0.31
pip install -r requirements.txt
Then prepare your dataset TSV files like below. Three columns are required (audio_path, duration, text):
audio_path duration text
/path/to/audio1 duration_of_audio1 text1
/path/to/audio2 duration_of_audio2 text2
...
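For anyone starting from raw audio, here is a minimal sketch of writing such a manifest. It is not part of the repo: `wav_duration_seconds` and `write_manifest` are hypothetical helper names, and it assumes PCM WAV input, durations in seconds, tab-separated columns, and a header row as in the sample above. Check what the dump scripts actually expect before relying on it.

```python
import csv
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(rows, tsv_path):
    """Write (audio_path, text) pairs as a tab-separated manifest.

    Assumption: duration is stored in seconds and a header row is wanted.
    """
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        # Plain "\n" line endings; csv.writer defaults to "\r\n".
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["audio_path", "duration", "text"])
        for audio_path, text in rows:
            duration = wav_duration_seconds(audio_path)
            writer.writerow([str(audio_path), f"{duration:.3f}", text])
```

Usage would be something like `write_manifest([("/path/to/audio1.wav", "text1")], "train.tsv")`. For non-WAV audio you would need a different way to get the duration.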
Then run scripts/dump_latents.py and scripts/dump_durations.py. These scripts dump the EnCodec latents and the character durations.
After running dump_latents, the global mean and std will be printed out. You should use these values in your config, like here
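To make clear what a "global mean and std" means, here is a rough sketch of the idea. It is not the code in scripts/dump_latents.py: `RunningStats` and `normalize` are hypothetical names, and it assumes a single scalar mean and std over every latent value (not per-channel statistics), with each latent given as a list of frames of floats.

```python
import math


class RunningStats:
    """Accumulate one scalar mean/std over every latent value seen."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, latent):
        # `latent` is a sequence of frames, each a sequence of floats.
        for frame in latent:
            for value in frame:
                self.count += 1
                self.total += value
                self.total_sq += value * value

    def mean(self):
        return self.total / self.count

    def std(self):
        # Population std: sqrt(E[x^2] - E[x]^2), clamped against rounding error.
        variance = self.total_sq / self.count - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))


def normalize(latent, mean, std):
    """Shift and scale a latent with the global statistics."""
    return [[(value - mean) / std for value in frame] for frame in latent]
```

You would call `update` once per utterance, then put the resulting `mean()` and `std()` into the config. The sum-of-squares form is simple but can lose precision on very large datasets.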
Then configure your experiment in the configs/experiment folder. The config is based on Hydra.
You can run your experiment with python pflow_encodec/train.py experiment=<experiment name>
Thank you for your reply. I look forward to your more detailed write-up after the weekend. I would like to ask two things: where is the code for decoding with the MultiBand-Diffusion model (I can't seem to find it), and can EnCodec be swapped for a better codec, such as the recent FAcodec?
@yiwei0730 you can find the generation code in https://github.com/seastar105/pflow-encodec/blob/main/notebooks/generate.ipynb
Of course, it can be used with any continuous representation (mel, DAC, another VAE).
I've updated the README.
Hello, this is a very good project. I would like to ask if you could write up the training steps. I also want to try using it for multiple languages (mainly Mandarin and English).