modelscope / FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis, music generation, and so on.
https://funcodec.github.io/
MIT License

LauraTTS: _pickle.UnpicklingError: invalid load key, 'v'. #28

Open HDUysz opened 5 months ago

HDUysz commented 5 months ago

Environment

Issue Description

I believe I have correctly installed the required PyTorch version as per the README instructions, and have also executed pip install --editable ./ to install the necessary requirements. However, while trying to run the "Use LauraTTS to synthesize speech" example with the following command:

bash demo.sh --stage 1 --model_name ${model_name} --output_dir results --text "nothing was to be done but to put about, and return in disappointment towards the north."

I encountered the following error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/lg/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/lg/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/autodl-tmp/FunCodec/funcodec/bin/text2audio_inference.py", line 561, in <module>
    main()
  File "/root/autodl-tmp/FunCodec/funcodec/bin/text2audio_inference.py", line 557, in main
    inference(**kwargs)
  File "/root/autodl-tmp/FunCodec/funcodec/bin/text2audio_inference.py", line 381, in inference
    inference_pipeline = inference_func(
  File "/root/autodl-tmp/FunCodec/funcodec/bin/text2audio_inference.py", line 287, in inference_func
    my_model = Text2Audio.from_pretrained(
  File "/root/autodl-tmp/FunCodec/funcodec/bin/text2audio_inference.py", line 227, in from_pretrained
    return Text2Audio(**kwargs)
  File "/root/autodl-tmp/FunCodec/funcodec/bin/text2audio_inference.py", line 53, in __init__
    model, model_args = Text2AudioGenTask.build_model_from_file(
  File "/root/autodl-tmp/FunCodec/funcodec/tasks/abs_task.py", line 1941, in build_model_from_file
    src_state = torch.load(model_file, map_location=device)
  File "/root/miniconda3/envs/lg/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/envs/lg/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

Question:

Is there an issue with how I am using the program? I am eager to experience your project and would greatly appreciate your guidance or suggestions for resolving this issue.

ZhihaoDU commented 5 months ago

Have you followed the README step by step? You should first set model_name="speech_synthesizer-laura-en-libritts-16k-codec_nq2-pytorch" and then run the command: bash demo.sh --stage 1 --model_name ${model_name} --output_dir results --text "nothing was to be done but to put about, and return in disappointment towards the north."

This script downloads the models from ModelScope or Huggingface into exp/speech_synthesizer-laura-en-libritts-16k-codec_nq2-pytorch and exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch. Do these directories exist?

HDUysz commented 5 months ago

> Have you followed the README step by step? You should first set: model_name="speech_synthesizer-laura-en-libritts-16k-codec_nq2-pytorch" and then run the command: bash demo.sh --stage 1 --model_name ${model_name} --output_dir results --text "nothing was to be done but to put about, and return in disappointment towards the north."
>
> This script will download models from ModelScope or Huggingface into exp/speech_synthesizer-laura-en-libritts-16k-codec_nq2-pytorch and exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch. Are these directories existed?

Yeah, I did follow the README.md step by step. I've set the model_name and, as you mentioned, I have those two directories; there is also a model.pth file in the folder. I took screenshots of the folder contents, and I believe they match what you described.

HDUysz commented 5 months ago


Then I ran the following command: bash demo.sh --stage 2 --model_name ${model_name} --output_dir results --text "nothing was to be done but to put about, and return in disappointment towards the north." --prompt_text "one of these is context" --prompt_audio "demo/8230_279154_000013_000003.wav". It threw the error I initially described. Although the command line indicated that the results were saved in the results folder, I couldn't find that folder in the project, so I'm guessing something went wrong somewhere.

ZhihaoDU commented 5 months ago

Sorry for the late response. From your traceback, the error occurs when loading the model checkpoint, so the model was probably not downloaded properly. Have you installed git-lfs? We use Git LFS to manage the model checkpoint files. Please check the size of your model file; it should match the size shown on the ModelScope model page.
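The error message fits this diagnosis: when a repo is cloned without git-lfs, each large file is replaced by a small text pointer beginning with "version https://git-lfs...", and the leading byte 'v' is exactly the "invalid load key, 'v'" that pickle reports. A quick way to check (the checkpoint path below follows this thread; adjust it to your setup):

```shell
# If the checkpoint was fetched without git-lfs, model.pth is a tiny text
# pointer file starting with "version https://git-lfs...", which makes
# torch.load fail with: _pickle.UnpicklingError: invalid load key, 'v'.
ckpt="exp/speech_synthesizer-laura-en-libritts-16k-codec_nq2-pytorch/model.pth"
if head -c 7 "$ckpt" 2>/dev/null | grep -q '^version'; then
    echo "$ckpt is a Git LFS pointer, not a real checkpoint."
    echo "Install git-lfs and re-download, e.g. run 'git lfs install' and"
    echo "'git lfs pull' inside the cloned model repository."
fi
```

If the check prints nothing, the file at least starts with binary data and the problem lies elsewhere (e.g. a truncated download).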

ZhihaoDU commented 5 months ago


Model speech_synthesizer-laura-en-libritts-16k-codec_nq2-pytorch should have a size of 387.44 MB, and model audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch should have a size of 265.94 MB.
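A quick way to compare the on-disk checkpoints against those numbers (paths as used earlier in the thread):

```shell
# Print human-readable checkpoint sizes; they should roughly match the
# sizes on the ModelScope model pages (~387 MB and ~266 MB).
# A file of only ~100-200 bytes is a Git LFS pointer, not a checkpoint.
for d in exp/speech_synthesizer-laura-en-libritts-16k-codec_nq2-pytorch \
         exp/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch; do
    ls -lh "$d/model.pth" 2>/dev/null || echo "missing: $d/model.pth"
done
```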