rishabh004-ai opened 7 months ago
I tried downloading the checkpoint and loading it directly with

```python
torch.load(model_path, map_location="cpu")
```

and it loads correctly. Could you try this to check whether your downloaded `.pt` file is intact?
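If the download saved an error page instead of the actual checkpoint, `torch.load` will fail with an unpickling error. A quick byte-level sanity check can tell the two apart before loading (a minimal sketch; the helper name is illustrative, not part of the repo):

```python
# A legacy torch checkpoint is a pickle stream, so it starts with binary
# pickle opcodes (typically b'\x80').  A failed download that saved an
# XML/HTML error page instead starts with '<' or a UTF-8 BOM (b'\xef\xbb\xbf').
def looks_like_error_page(path):
    with open(path, "rb") as f:
        head = f.read(16)
    return head.startswith(b"\xef\xbb\xbf") or head.lstrip().startswith(b"<")
```

If this returns `True` for your `final.pt`, the file on disk is a saved error response, not the checkpoint.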
Hi, thanks for the prompt response. I checked, and the model download link is not working; it is not reachable with wget either. Could you please provide an alternate link?
Hi, the link has been updated. Are you using the new link or the old one? The new link works for me.
Hi, thanks for responding. I tried the new link, but the download still fails with this response:

```xml
<?xml version="1.0" encoding="utf-8"?><Error><Code>PublicAccessNotPermitted</Code><Message>Public access is not permitted on this storage account. RequestId:45b0ef67-801e-0083-46f9-97df30000000
```
We have uploaded the checkpoint to Hugging Face. You can download it from https://huggingface.co/v-sjhu/WavLLM. Thanks!
@rishabh004-ai Were you able to run inference successfully?
@XiaoshanHsj Would you consider releasing the inference code under `transformers` instead of fairseq?
> We have uploaded the checkpoint to huggingface. You could download it from https://huggingface.co/v-sjhu/WavLLM Thanks

The Hugging Face link does not seem to work; could you help re-upload it? Thanks! Also, may I ask what `$model_path` and `$data_name` you are using, @XiaoshanHsj?
@YepJin I made it work:

```shell
bash examples/wavllm/scripts/inference_sft.sh your_path_to/final.pt asr
```

I also had to change the content in `asr.csv`; otherwise it leads to a not-found error.
| id | audio | n_frames | prompt | tgt_text | with_speech |
|----|-------|----------|--------|----------|-------------|
| 0 | examples/wavllm/test_data/audio/asr.flac | 166960 | Based on the attached audio, generate a comprehensive text transcription of the spoken content. | he hoped there would be stew for dinner turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour fattened sauce | True |
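For reference, regenerating the manifest programmatically avoids whitespace issues from copy-pasting. A minimal sketch, assuming the file is tab-separated with the six columns above (the filename `asr.csv` and the row values come from this thread):

```python
import csv

FIELDS = ["id", "audio", "n_frames", "prompt", "tgt_text", "with_speech"]

row = {
    "id": "0",
    "audio": "examples/wavllm/test_data/audio/asr.flac",
    "n_frames": "166960",
    "prompt": "Based on the attached audio, generate a comprehensive "
              "text transcription of the spoken content.",
    "tgt_text": "he hoped there would be stew for dinner turnips and carrots "
                "and bruised potatoes and fat mutton pieces to be ladled out "
                "in thick peppered flour fattened sauce",
    "with_speech": "True",
}

# Write a tab-separated manifest with a header row, matching the
# layout of the example above.
with open("asr.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
    writer.writeheader()
    writer.writerow(row)
```

Adjust the `audio` path so it points at a file that actually exists relative to where you run the inference script.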
@BinWang28 Thanks for your reply. We currently have no plans to move the code base to `transformers`. In a future version, e.g. one based on LLAMA-3, we may use `transformers` for our model.
I have installed all the libraries, but whenever I run

```shell
bash examples/wavllm/scripts/inference_sft.sh $model_path $data_name
```

the code throws `_pickle.UnpicklingError: invalid load key, '\xef'`. The error originates from `models, saved_cfg = checkpoint_utils.load_model_ensemble(` in `SpeechT5/WavLLM/fairseq/examples/wavllm/inference/generate.py`:

```
  File "/workspace/SpeechT5/WavLLM/fairseq/examples/wavllm/inference/generate.py", line 454, in <module>
    cli_main()
  File "/workspace/SpeechT5/WavLLM/fairseq/examples/wavllm/inference/generate.py", line 450, in cli_main
    main(args)
  File "/workspace/SpeechT5/WavLLM/fairseq/examples/wavllm/inference/generate.py", line 50, in main
    return _main(cfg, h)
  File "/workspace/SpeechT5/WavLLM/fairseq/examples/wavllm/inference/generate.py", line 122, in _main
    models, saved_cfg = checkpoint_utils.load_model_ensemble(
  File "/workspace/SpeechT5/WavLLM/fairseq/fairseq/checkpoint_utils.py", line 363, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/workspace/SpeechT5/WavLLM/fairseq/fairseq/checkpoint_utils.py", line 421, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/workspace/SpeechT5/WavLLM/fairseq/fairseq/checkpoint_utils.py", line 315, in load_checkpoint_to_cpu
    state = torch.load(f, map_location=torch.device("cpu"))
  File "/root/miniconda3/envs/wavllm/lib/python3.10/site-packages/torch/serialization.py", line 1040, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/envs/wavllm/lib/python3.10/site-packages/torch/serialization.py", line 1258, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xef'.
```
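For what it's worth, `'\xef'` is the first byte of a UTF-8 byte-order mark, so this error is consistent with the downloaded `final.pt` actually being the XML error page quoted earlier in the thread rather than a pickle checkpoint. A small reproduction of the symptom (not the project's code, just a demonstration):

```python
import io
import pickle

# An XML error page saved to disk typically starts with a UTF-8 BOM.
error_page = "\ufeff<?xml version=\"1.0\" encoding=\"utf-8\"?>".encode("utf-8")
print(error_page[:3])  # b'\xef\xbb\xbf' -- the first byte is 0xef

# Unpickling such a file reproduces the exact error from the traceback.
try:
    pickle.load(io.BytesIO(error_page))
except pickle.UnpicklingError as e:
    print(e)
```

So re-downloading the checkpoint (e.g. from the Hugging Face link above) and confirming the file size is plausible should resolve this.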