open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

[Help]: When using the valle_libritts pretrained model, the model fails to load correctly. #169

Closed: song201216 closed this issue 4 months ago

song201216 commented 8 months ago

Problem Overview

When using the valle_libritts pretrained model, the model fails to load correctly.

When I follow the instructions in your tutorial to run inference, an error is raised while loading the model, and I don't know how to solve it. I used these arguments:

    --infer_expt_dir ckpts/tts/valle_libritts/checkpoint \
    --infer_output_dir ckpts/tts/valle_libritts/result \
    --infer_mode "single" \
    --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." \
    --infer_text_prompt "But even the unsuccessful dramatist has his moments." \
    --infer_audio_prompt egs/tts/valle_libritts/prompt_examples/7176_92135_000004_000000.wav

The error is:

    2024-03-28 14:45:44 | INFO | accelerate.accelerator | Loading states from ckpts/tts/valle_librilight_6k/checkpoint/epoch-0011_step-0435000_loss-3.656264
    Traceback (most recent call last):
      File "/app/bins/tts/inference.py", line 167, in <module>
        main()
      File "/app/bins/tts/inference.py", line 160, in main
        inferencer = build_inference(args, cfg)
      File "/app/bins/tts/inference.py", line 27, in build_inference
        inference = inference_class(args, cfg)
      File "/app/models/tts/valle/valle_inference.py", line 24, in __init__
        TTSInference.__init__(self, args, cfg)
      File "/app/models/tts/base/tts_inferece.py", line 108, in __init__
        self._load_model(
      File "/app/models/tts/base/tts_inferece.py", line 175, in _load_model
        self.accelerator.load_state(str(checkpoint_path))
      File "/opt/conda/envs/amphion/lib/python3.9/site-packages/accelerate/accelerator.py", line 2962, in load_state
        load_accelerator_state(
      File "/opt/conda/envs/amphion/lib/python3.9/site-packages/accelerate/checkpointing.py", line 181, in load_accelerator_state
        models[i].load_state_dict(torch.load(input_model_file, map_location=map_location), **load_model_func_kwargs)
      File "/opt/conda/envs/amphion/lib/python3.9/site-packages/torch/serialization.py", line 815, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/opt/conda/envs/amphion/lib/python3.9/site-packages/torch/serialization.py", line 1033, in _legacy_load
        magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: invalid load key, 'v'.
    Traceback (most recent call last):
      File "/opt/conda/envs/amphion/bin/accelerate", line 8, in <module>
        sys.exit(main())
      File "/opt/conda/envs/amphion/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
        args.func(args)
      File "/opt/conda/envs/amphion/lib/python3.9/site-packages/accelerate/commands/launch.py", line 994, in launch_command
        simple_launcher(args)
      File "/opt/conda/envs/amphion/lib/python3.9/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/opt/conda/envs/amphion/bin/python', '/app/bins/tts/inference.py', '--config', 'ckpts/tts/valle_librilight_6k/args.json', '--log_level', 'debug', '--acoustics_dir', 'ckpts/tts/valle_librilight_6k', '--output_dir', 'ckpts/tts/valle_librilight_6k/result', '--mode', 'single', '--text', 'This is a clip of generated speech with the given text from Amphion Vall-E model.', '--text_prompt', 'But even the unsuccessful dramatist has his moments.', '--audio_prompt', 'egs/tts/valle_librilight_6k/prompt_examples/7176_92135_000004_000000.wav', '--test_list_file', 'None']' returned non-zero exit status 1.
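The last frames of the traceback show that `torch.load` falls back to its legacy loader, which immediately calls `pickle.load` on the checkpoint file. `invalid load key, 'v'.` means the first byte of that file is `v`, which is not a valid pickle opcode. One common way this happens (an assumption here, not confirmed in this thread) is that the checkpoint was cloned without Git LFS, so a small text "pointer" file starting with `version https://git-lfs.github.com/spec/v1` sits where the real weights should be. The sketch below reproduces the same error using only the standard library; the pointer contents and the `try_unpickle` helper are illustrative.

```python
import io
import pickle

# Illustrative Git LFS pointer file (hypothetical oid and size values).
LFS_POINTER = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:0000000000000000000000000000000000000000000000000000000000000000\n"
    b"size 123456789\n"
)


def try_unpickle(data: bytes) -> str:
    """Unpickle raw bytes, as torch's legacy loader does first, and return
    the UnpicklingError message, or "ok" if unpickling succeeds."""
    try:
        pickle.load(io.BytesIO(data))
    except pickle.UnpicklingError as exc:
        return str(exc)
    return "ok"


if __name__ == "__main__":
    # The first byte b"v" is not a pickle opcode, so loading fails at once.
    print(try_unpickle(LFS_POINTER))  # prints: invalid load key, 'v'.
```

A genuine pickled object loads without error, so the failure comes from the file's contents, not from the loading code.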

Thank you very much for your help. I will cite your paper in my research.

lmxue commented 8 months ago

The error message you encountered, specifically the _pickle.UnpicklingError: invalid load key, 'v'., suggests that there's an issue with the checkpoint file you're trying to load. Given the information and the context of your problem, here are a few steps you can take to try to resolve this issue:
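If the checkpoint turns out to be an un-fetched Git LFS pointer (an assumption; other kinds of corrupted or incomplete downloads produce similar errors), a quick pre-flight check is to scan the checkpoint directory for files that begin with the LFS pointer header. `find_lfs_pointers` is a hypothetical helper, not part of Amphion, and the default path simply mirrors the one in the log above. If any stubs are found, running `git lfs pull` inside the cloned model repository, or downloading the files again, should replace them with the real weights.

```python
import sys
from pathlib import Path
from typing import List

# First line of every Git LFS pointer file, per the Git LFS pointer spec.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(checkpoint_dir: str) -> List[Path]:
    """Return files under checkpoint_dir (searched recursively) that are
    Git LFS pointer stubs rather than real binary weights."""
    pointers = []
    for path in sorted(Path(checkpoint_dir).rglob("*")):
        if not path.is_file():
            continue
        # Read only as many bytes as the header; real weights are large.
        with path.open("rb") as f:
            head = f.read(len(LFS_HEADER))
        if head == LFS_HEADER:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "ckpts/tts/valle_librilight_6k/checkpoint"
    stubs = find_lfs_pointers(target)
    for stub in stubs:
        print(f"LFS pointer, not real weights: {stub}")
    if not stubs:
        print("No LFS pointer files found.")
```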

song201216 commented 8 months ago

Thank you for your valuable advice. I was able to resolve the issue by reloading the model. However, a new problem has come up: the audio generated during inference contains noise. Could you suggest any inference parameters I can adjust to reduce this noise, or any other recommendations? I look forward to your response.

lmxue commented 8 months ago

Hi @song201216, to better understand and address the issue, could you please share the command you used for inference, the speech prompt, and the resulting generated speech? This information will greatly assist us in diagnosing the problem more effectively.