xvdp closed this issue 7 months ago.
I recorded my own .wav reading the text prompt, ran inference, and got something that sounds "kind of like my voice", but as if from inside a glass jar, with all the words mangled.
After cloning your code, installing the required dependencies, and symlinking my recording of "But even the unsuccessful dramatist has his moments." to /home/data/Language/7176_92135_000004_000000.wav, I ran this command:
sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" \
--infer_expt_dir ckpts/tts/valle_libritts \
--infer_output_dir $OUT_DIR \
--infer_mode "single" \
--infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." \
--infer_text_prompt "But even the unsuccessful dramatist has his moments." \
--infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
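Because the audio prompt here is a self-recorded .wav, one quick sanity check is to compare its format with the LibriTTS-style prompts this checkpoint (valle_libritts) was trained on; LibriTTS is distributed as 24 kHz audio. The sketch below uses only Python's standard `wave` module. The mono, 24 kHz target is an assumption: nothing in this thread confirms whether Amphion resamples or downmixes prompts itself, and `wave` can only read uncompressed PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return basic format facts about an uncompressed PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "duration_s": wf.getnframes() / rate,
        }


def check_prompt(path, expected_rate=24000):
    """Flag differences from an ASSUMED target format: mono at expected_rate Hz.

    24 kHz is LibriTTS's sample rate; whether the inference pipeline resamples
    prompts on its own is not confirmed here, so a mismatch is a hint only.
    """
    info = describe_wav(path)
    problems = []
    if info["channels"] != 1:
        problems.append(f"{info['channels']} channels (expected mono)")
    if info["sample_rate"] != expected_rate:
        problems.append(f"{info['sample_rate']} Hz (expected {expected_rate} Hz)")
    return info, problems
```

For example, `check_prompt("/home/data/Language/7176_92135_000004_000000.wav")` returns the file's format plus a list of mismatches; an empty list only means the recording matches the assumed format, not that it will clone well.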
Looking at the log, I see a couple of lines that may point to the problem:
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
Exprimental Configuration File: ckpts/tts/valle_libritts/args.json
Text: This is a clip of generated speech with the given text from Amphion Vall-E model.
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
DEBUG:matplotlib:matplotlib data path: /opt/conda/lib/python3.9/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/appuser/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
DEBUG:matplotlib:CACHEDIR=/home/weights/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /home/weights/matplotlib/fontlist-v330.json
Namespace(config='ckpts/tts/valle_libritts/args.json', dataset=None, testing_set='test', test_list_file='None', speaker_name=None, text='This is a clip of generated speech with the given text from Amphion Vall-E model.', vocoder_dir=None, acoustics_dir='ckpts/tts/valle_libritts', checkpoint_path=None, mode='single', log_level='debug', pitch_control=1.0, energy_control=1.0, duration_control=1.0, output_dir='/home/data/Language/VallE', text_prompt='But even the unsuccessful dramatist has his moments.', audio_prompt='/home/data/Language/7176_92135_000004_000000.wav', top_k=-100, temperature=1.0, continual=False, copysyn=False, ref_audio='', device='cuda', inference_step=200)
INFO:inference:========================================================
INFO:inference:|| New inference process started. ||
INFO:inference:========================================================
INFO:inference:
DEBUG:inference:Acoustic model dir: ckpts/tts/valle_libritts
DEBUG:inference:Setting random seed done in 0.28ms
DEBUG:inference:Random seed: 10086
INFO:inference:Building model...
INFO:inference:Building model done in 607.009ms
INFO:inference:Initializing accelerate...
INFO:inference:Initializing accelerate done in 242.029ms
INFO:inference:Loading checkpoint...
INFO:accelerate.accelerator:Loading states from ckpts/tts/valle_libritts/checkpoint/final_epoch-0100_step-0837900_loss-3.883116
INFO:accelerate.checkpointing:All model weights loaded successfully
INFO:accelerate.checkpointing:All optimizer states loaded successfully
INFO:accelerate.checkpointing:All scheduler states loaded successfully
INFO:accelerate.checkpointing:All dataloader sampler states loaded successfully
INFO:accelerate.checkpointing:Could not load random states
INFO:accelerate.accelerator:Loading in 0 custom states
INFO:inference:Loading checkpoint done in 537.945ms
/opt/conda/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
Saved to: /home/data/Language/VallE/single
Environment: NVIDIA driver version 535.129.03, CUDA version 12.2, torch version 2.1.2, running inside a Docker container.
Thank you for your feedback. You can double-check against the prompt examples we provided.
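Following that suggestion, one thing worth ruling out is a mismatch between `--infer_text_prompt` and the words actually spoken in the audio prompt. The hypothetical helper below compares the two after a crude normalization (lowercase, punctuation stripped, whitespace collapsed). It assumes the transcript is available as a plain-text file, like the .txt files in prompt_examples; the function names are illustrative, not part of Amphion.

```python
import re
from pathlib import Path


def normalize(text):
    """Lowercase, replace punctuation (except apostrophes) with spaces,
    and collapse whitespace so only the spoken words are compared."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def prompt_matches_transcript(text_prompt, transcript_path):
    """Hypothetical check: does the text prompt match the transcript file?

    Assumes transcript_path is a UTF-8 .txt file holding exactly the words
    spoken in the audio prompt.
    """
    transcript = Path(transcript_path).read_text(encoding="utf-8")
    return normalize(text_prompt) == normalize(transcript)
```

A `False` result suggests the text prompt and recording disagree, which is one plausible (not confirmed) source of mangled words.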
Hi @xvdp, if you have any further questions, feel free to re-open this issue. We are glad to follow up!
Describe the bug
The examples listed in egs/tts/VALLE/README.md fail: egs/tts/VALLE/prompt_examples/7176_92135_000004_000000.wav is missing, and prompt_examples only contains .txt files.
How To Reproduce
Follow the instructions at https://huggingface.co/amphion/valle_libritts.
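To list which prompt examples lack audio, a small check can pair each transcript with its audio file. This sketch assumes each example is stored as `<stem>.txt` next to `<stem>.wav` in the same directory; that layout is inferred from the 7176_92135_000004_000000.wav name in the README, not documented, and `missing_audio` is a hypothetical helper.

```python
from pathlib import Path


def missing_audio(prompt_dir, audio_ext=".wav"):
    """Return sorted stems of .txt transcripts with no matching audio file.

    ASSUMPTION: each prompt is stored as <stem>.txt plus <stem><audio_ext>
    in the same directory.
    """
    prompt_dir = Path(prompt_dir)
    transcripts = {p.stem for p in prompt_dir.glob("*.txt")}
    audio = {p.stem for p in prompt_dir.glob(f"*{audio_ext}")}
    return sorted(transcripts - audio)
```

For example, `missing_audio("egs/tts/VALLE/prompt_examples")` would list every example whose .wav is absent.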