open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

[BUG]: prompt_examples/*.wav missing #114

Closed xvdp closed 7 months ago

xvdp commented 8 months ago

Describe the bug

The examples listed in egs/tts/VALLE/README.md fail:

egs/tts/VALLE/prompt_examples/7176_92135_000004_000000.wav is missing; prompt_examples contains only .txt files.
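A quick way to confirm the mismatch is to diff the transcript and audio file stems in the directory. This is a minimal sketch (the `missing_audio_prompts` helper is hypothetical, not part of Amphion) that reports which examples have a .txt transcript but no matching .wav:

```python
from pathlib import Path

def missing_audio_prompts(prompt_dir: str) -> list[str]:
    """Return example stems that have a .txt transcript but no matching .wav."""
    d = Path(prompt_dir)
    wav_stems = {p.stem for p in d.glob("*.wav")}
    return sorted(p.stem for p in d.glob("*.txt") if p.stem not in wav_stems)

# e.g. missing_audio_prompts("egs/tts/VALLE/prompt_examples")
# On the affected checkout this lists every example, since only .txt files are committed.
```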

How To Reproduce

Follow the instructions at https://huggingface.co/amphion/valle_libritts

xvdp commented 8 months ago

I recorded my own .wav for the text prompt and ran it. The output sounded "kind of like my voice", but as if inside a glass jar, with all the words mangled.

After cloning your code, installing the required dependencies, and symlinking my recording of "But even the unsuccessful dramatist has his moments." to /home/data/Language/7176_92135_000004_000000.wav,

and running this command:

 sh egs/tts/VALLE/run.sh  --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" \
--infer_expt_dir ckpts/tts/valle_libritts \
--infer_output_dir $OUT_DIR \
--infer_mode "single" \
--infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model."  \
--infer_text_prompt "But even the unsuccessful dramatist has his moments." \
--infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
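One common cause of "voice in a glass jar" output with a home-made prompt is a sample-rate or channel mismatch: EnCodec-based VALL-E setups typically expect mono audio at a fixed rate (often 24 kHz, but check the rate in args.json rather than taking my word for it). In practice you would resample with ffmpeg, librosa, or torchaudio; the toy linear-interpolation resampler below only illustrates the idea and is not production code:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler for a mono sample sequence.

    Illustrative only: real pipelines should use a proper band-limited
    resampler (ffmpeg, librosa.resample, torchaudio.functional.resample).
    """
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional position in the source
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(int(round(a + (b - a) * frac)))
    return out

# Upsampling two samples [0, 10] from 1 Hz to 2 Hz yields [0, 5, 10, 10].
```

If the prompt .wav was recorded at, say, 48 kHz stereo, downmixing to mono and resampling to the model's expected rate before inference is worth trying first.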

If I look at the log, I see a couple of lines that may point to the problem:

WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
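For what it's worth, this phonemizer warning is usually benign: it fires when the phonemized output has a different word count than the input line (espeak can merge or split words), so it may not explain the mangled audio. Reading the warning format, the fraction appears to be (mismatch count / number of input lines), which is how 2/1 becomes 200.0%. A small sketch of a parser for these lines (the `parse_mismatch` helper is my own, not part of Amphion or phonemizer):

```python
import re

# Matches phonemizer's warning, e.g.
# "WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)"
WARN_RE = re.compile(r"words count mismatch on ([\d.]+)% of the lines \((\d+)/(\d+)\)")

def parse_mismatch(line: str):
    """Return (percent, mismatch_count, total_lines) for a warning line, else None."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2)), int(m.group(3))
```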

appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
Exprimental Configuration File: ckpts/tts/valle_libritts/args.json
Text: This is a clip of generated speech with the given text from Amphion Vall-E model.
The following values were not passed to accelerate launch and had defaults used instead:
        --num_processes was set to a value of 1
        --num_machines was set to a value of 1
        --mixed_precision was set to a value of 'no'
        --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
DEBUG:matplotlib:matplotlib data path: /opt/conda/lib/python3.9/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/appuser/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
DEBUG:matplotlib:CACHEDIR=/home/weights/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /home/weights/matplotlib/fontlist-v330.json
Namespace(config='ckpts/tts/valle_libritts/args.json', dataset=None, testing_set='test', test_list_file='None', speaker_name=None, text='This is a clip of generated speech with the given text from Amphion Vall-E model.', vocoder_dir=None, acoustics_dir='ckpts/tts/valle_libritts', checkpoint_path=None, mode='single', log_level='debug', pitch_control=1.0, energy_control=1.0, duration_control=1.0, output_dir='/home/data/Language/VallE', text_prompt='But even the unsuccessful dramatist has his moments.', audio_prompt='/home/data/Language/7176_92135_000004_000000.wav', top_k=-100, temperature=1.0, continual=False, copysyn=False, ref_audio='', device='cuda', inference_step=200)
INFO:inference:========================================================
INFO:inference:|| New inference process started. ||
INFO:inference:========================================================
INFO:inference:
DEBUG:inference:Acoustic model dir: ckpts/tts/valle_libritts
DEBUG:inference:Setting random seed done in 0.28ms
DEBUG:inference:Random seed: 10086
INFO:inference:Building model...
INFO:inference:Building model done in 607.009ms
INFO:inference:Initializing accelerate...
INFO:inference:Initializing accelerate done in 242.029ms
INFO:inference:Loading checkpoint...
INFO:accelerate.accelerator:Loading states from ckpts/tts/valle_libritts/checkpoint/final_epoch-0100_step-0837900_loss-3.883116
INFO:accelerate.checkpointing:All model weights loaded successfully
INFO:accelerate.checkpointing:All optimizer states loaded successfully
INFO:accelerate.checkpointing:All scheduler states loaded successfully
INFO:accelerate.checkpointing:All dataloader sampler states loaded successfully
INFO:accelerate.checkpointing:Could not load random states
INFO:accelerate.accelerator:Loading in 0 custom states
INFO:inference:Loading checkpoint done in 537.945ms
/opt/conda/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
Saved to: /home/data/Language/VallE/single
(base) appuser@zL:~/Amphion$

Driver Version: 535.129.03, CUDA Version: 12.2, torch.__version__ '2.1.2', running inside a docker container.

lmxue commented 7 months ago

Thank you for your feedback. You can double-check your setup against the prompt examples we provided.

RMSnow commented 7 months ago

Hi @xvdp, if you have any further questions, feel free to re-open this issue. We'd be glad to follow up!