training step - Githubissues

yiwei0730 commented 5 months ago

Hello, this is a very good project. Could you write up the training steps? I would also like to try it with multiple languages (mainly Mandarin and English).

seastar105 commented 5 months ago

I'm going to write up the steps for training, and for inference after training, this weekend.

Briefly, you can install the dependencies like this:

conda create -n pflow-encodec -y python=3.10
conda activate pflow-encodec
conda install -y pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y -c conda-forge libsndfile==1.0.31
pip install -r requirements.txt

Then prepare your dataset TSV files as below. Three columns are required (audio_path, duration, text):

audio_path	duration	text
/path/to/audio1	duration_of_audio1	text1
/path/to/audio2	duration_of_audio2	text2
...
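As a rough illustration of the layout above, here is a minimal sketch that writes such a manifest. It is not part of the repository: the helper names `wav_duration_seconds` and `write_manifest` are hypothetical, and it assumes the columns are tab-separated with a header row, that the audio is PCM WAV, and that duration is expressed in seconds (the thread does not state the unit).

```python
import csv
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file, in seconds (assumed unit)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(rows, tsv_path):
    """Write (audio_path, text) pairs as a tab-separated manifest.

    Only the three column names come from the thread; the header row,
    the seconds unit, and the 3-decimal rounding are assumptions.
    """
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["audio_path", "duration", "text"])
        for audio_path, text in rows:
            duration = wav_duration_seconds(audio_path)
            # Absolute paths keep the manifest usable from any working directory.
            writer.writerow([str(Path(audio_path).resolve()), f"{duration:.3f}", text])
```

For example, `write_manifest([("/path/to/audio1.wav", "text1")], "train.tsv")` would produce a header line followed by one row per utterance. Other audio formats would need a different duration helper.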

Then run scripts/dump_latents.py and scripts/dump_durations.py. These scripts dump the EnCodec latents and the character durations.

After running dump_latents, the global mean and std are printed out. You should use these values in your config, like here
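To make the idea of a global mean and std concrete, here is a minimal sketch of how such statistics could be accumulated over many latent arrays and then used for normalization. This is not the code in scripts/dump_latents.py: `RunningStats` and `normalize` are hypothetical names, and it assumes a single scalar mean and a population std over all latent values, which the thread does not confirm.

```python
import math


class RunningStats:
    """Accumulate one global mean and std over many batches of values."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, values):
        # values: a flat iterable of floats, e.g. one utterance's latents.
        for v in values:
            self.count += 1
            self.total += v
            self.total_sq += v * v

    @property
    def mean(self):
        return self.total / self.count

    @property
    def std(self):
        # Population std: sqrt(E[x^2] - E[x]^2), clamped at zero for round-off.
        var = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(var, 0.0))


def normalize(values, mean, std):
    """Shift and scale values with the global statistics (assumed usage)."""
    return [(v - mean) / std for v in values]
```

Calling `update` once per utterance and reading `mean` and `std` at the end gives the two numbers to copy into the config. For very large datasets a numerically stabler method such as Welford's algorithm may be preferable.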

Then configure your experiment in the configs/experiment folder. The configs are based on Hydra.

You can then run your experiment with python pflow_encodec/train.py experiment=<experiment name>

yiwei0730 commented 5 months ago

Thank you for your reply. I look forward to the more detailed write-up after the weekend. I have two questions: where is the code for decoding with the MultiBand-Diffusion model (I can't seem to find it), and could EnCodec be replaced with a better codec, such as the latest FAcodec?

seastar105 commented 5 months ago

@yiwei0730 you can find generation code in https://github.com/seastar105/pflow-encodec/blob/main/notebooks/generate.ipynb

Of course, it can be used with any continuous representation (mel, DAC, other VAE).

seastar105 commented 5 months ago

I've updated README.

seastar105 / pflow-encodec

training step #1