[BUG]: prompt_examples/*.wav missing

xvdp commented 8 months ago

Describe the bug

Examples listed in the egs/tts/VALLE/README.md fail

egs/tts/VALLE/prompt_examples/7176_92135_000004_000000.wav is missing. The prompt_examples directory only contains .txt files.
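A quick way to confirm which prompt files lack audio is to list every .txt file in the directory that has no .wav next to it. This is only a minimal sketch: it assumes each text prompt is meant to be paired with a .wav of the same base name, which the thread suggests but does not confirm, and `find_missing_wavs` is a hypothetical helper that is not part of Amphion.

```python
from pathlib import Path


def find_missing_wavs(prompt_dir):
    """Return the names of .txt prompts that have no .wav with the same stem.

    Assumption: prompts are stored as <name>.txt / <name>.wav pairs.
    """
    prompt_dir = Path(prompt_dir)
    return sorted(
        txt.name
        for txt in prompt_dir.glob("*.txt")
        if not txt.with_suffix(".wav").exists()
    )


if __name__ == "__main__":
    # Path taken from the report; run this from the repository root.
    for name in find_missing_wavs("egs/tts/VALLE/prompt_examples"):
        print("no matching .wav for", name)
```

If every .txt file is printed, the directory contains no audio prompts at all, which matches the behaviour described here.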

How To Reproduce

Follow the instructions at https://huggingface.co/amphion/valle_libritts

xvdp commented 8 months ago

I recorded my own .wav for the text prompt and ran it, and I got something sounding "kind of like my voice", but as if inside a glass jar, with all the words mangled.

This was after cloning your code, installing the required dependencies, making the symlink, and recording "But even the unsuccessful dramatist has his moments." to /home/data/Language/7176_92135_000004_000000.wav

and running this command:

 sh egs/tts/VALLE/run.sh  --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" \
--infer_expt_dir ckpts/tts/valle_libritts \
--infer_output_dir $OUT_DIR \
--infer_mode "single" \
--infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model."  \
--infer_text_prompt "But even the unsuccessful dramatist has his moments." \
--infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
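Since the audio prompt here is a self-made recording, it may be worth checking its basic properties before inference. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files. `describe_wav` is a hypothetical helper, and whether the checkpoint expects a particular sample rate or channel count is an assumption that nothing in this thread confirms.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav file as a dict."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }
```

Calling `describe_wav("/home/data/Language/7176_92135_000004_000000.wav")` and comparing the result with a clip from the training corpus would show whether the recording differs in sample rate, channel count, or length.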

Looking at the log, I see a couple of lines that may be the problem:

WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)

appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
Exprimental Configuration File: ckpts/tts/valle_libritts/args.json
Text: This is a clip of generated speech with the given text from Amphion Vall-E model.
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
DEBUG:matplotlib:matplotlib data path: /opt/conda/lib/python3.9/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/appuser/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
DEBUG:matplotlib:CACHEDIR=/home/weights/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /home/weights/matplotlib/fontlist-v330.json
Namespace(config='ckpts/tts/valle_libritts/args.json', dataset=None, testing_set='test', test_list_file='None', speaker_name=None, text='This is a clip of generated speech with the given text from Amphion Vall-E model.', vocoder_dir=None, acoustics_dir='ckpts/tts/valle_libritts', checkpoint_path=None, mode='single', log_level='debug', pitch_control=1.0, energy_control=1.0, duration_control=1.0, output_dir='/home/data/Language/VallE', text_prompt='But even the unsuccessful dramatist has his moments.', audio_prompt='/home/data/Language/7176_92135_000004_000000.wav', top_k=-100, temperature=1.0, continual=False, copysyn=False, ref_audio='', device='cuda', inference_step=200)
INFO:inference:========================================================
INFO:inference:|| New inference process started. ||
INFO:inference:========================================================
INFO:inference:
DEBUG:inference:Acoustic model dir: ckpts/tts/valle_libritts
DEBUG:inference:Setting random seed done in 0.28ms
DEBUG:inference:Random seed: 10086
INFO:inference:Building model...
INFO:inference:Building model done in 607.009ms
INFO:inference:Initializing accelerate...
INFO:inference:Initializing accelerate done in 242.029ms
INFO:inference:Loading checkpoint...
INFO:accelerate.accelerator:Loading states from ckpts/tts/valle_libritts/checkpoint/final_epoch-0100_step-0837900_loss-3.883116
INFO:accelerate.checkpointing:All model weights loaded successfully
INFO:accelerate.checkpointing:All optimizer states loaded successfully
INFO:accelerate.checkpointing:All scheduler states loaded successfully
INFO:accelerate.checkpointing:All dataloader sampler states loaded successfully
INFO:accelerate.checkpointing:Could not load random states
INFO:accelerate.accelerator:Loading in 0 custom states
INFO:inference:Loading checkpoint done in 537.945ms
/opt/conda/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
Saved to: /home/data/Language/VallE/single
(base) appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav

Environment: Driver Version: 535.129.03, CUDA Version: 12.2, torch.version '2.1.2', running inside a Docker container.

lmxue commented 7 months ago

Thank you for your feedback. You can double-check the prompt examples we provided.

RMSnow commented 7 months ago

Hi @xvdp, if you have any further questions, feel free to re-open this issue. We are glad to follow up!

open-mmlab / Amphion

[BUG]: prompt_examples/*.wav missing #114
