SoftologyPro opened this issue 1 week ago
Same question actually. The instructions could be a bit more clear on this important step.
Some information here: https://github.com/ictnlp/LLaMA-Omni/issues/10#issuecomment-2354396927
I would recommend that the LLaMA-Omni developers create a small getmodels.py file at the top level containing:
# getmodels.py
# Download the Llama-3.1-8B-Omni model (warning: requires ~18 GB of disk space)
from omni_speech.model.language_model.omni_speech_llama import OmniSpeechLlamaForCausalLM
model_name = 'ICTNLP/Llama-3.1-8B-Omni'
llama_model = OmniSpeechLlamaForCausalLM.from_pretrained(model_name)
# Download the Whisper large-v3 model (warning: requires ~2.88 GB)
import whisper
whisper_model = whisper.load_model("large-v3", download_root="models/speech_encoder/")
and instruct users to run it with:
python getmodels.py
Update: There is already a pull request to do this: https://github.com/ictnlp/LLaMA-Omni/pull/12
Downloading Llama that way pulls down much more than 5 GB and then fails with another error:
model.safetensors.index.json: 100%|████████████████████████████| 69.1k/69.1k [00:00<00:00, 291kB/s]
model-00001-of-00004.safetensors: 100%|████████████████████████| 4.98G/4.98G [08:09<00:00, 10.2MB/s]
model-00002-of-00004.safetensors: 100%|████████████████████████| 5.00G/5.00G [08:20<00:00, 9.99MB/s]
model-00003-of-00004.safetensors: 100%|████████████████████████| 4.92G/4.92G [08:19<00:00, 9.84MB/s]
model-00004-of-00004.safetensors: 100%|████████████████████████| 3.34G/3.34G [05:24<00:00, 10.3MB/s]
Downloading shards: 100%|██████████████████████████████████████| 4/4 [30:17<00:00, 454.45s/it]
Traceback (most recent call last):
  File "D:\Tests\LLaMA-Omni\get_models.py", line 4, in <module>
    model = OmniSpeechLlamaForCausalLM.from_pretrained(model_name)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\venv\lib\site-packages\transformers\modeling_utils.py", line 3798, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\language_model\omni_speech_llama.py", line 46, in __init__
    self.model = OmniSpeechLlamaModel(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\language_model\omni_speech_llama.py", line 38, in __init__
    super(OmniSpeechLlamaModel, self).__init__(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\omni_speech_arch.py", line 32, in __init__
    self.speech_encoder = build_speech_encoder(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\speech_encoder\builder.py", line 7, in build_speech_encoder
    return WhisperWrappedEncoder.load(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\speech_encoder\speech_encoder.py", line 26, in load
    encoder = whisper.load_model(name=model_config.speech_encoder, device='cpu').encoder
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\venv\lib\site-packages\whisper\__init__.py", line 139, in load_model
    raise RuntimeError(
RuntimeError: Model models/speech_encoder/large-v3.pt not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']
And when the Gradio UI is started the top dropdown is still empty.
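For what it's worth, the "much more than 5 GB" observation matches the shard sizes in the download log above; summed, they account for the ~18 GB figure:

```python
# Shard sizes in GB as reported by the four download progress bars above;
# their sum explains why the full download is ~18 GB rather than 5 GB.
shards_gb = [4.98, 5.00, 4.92, 3.34]
print(f"{sum(shards_gb):.2f} GB")  # prints "18.24 GB"
```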
Hopefully we can get an answer about where https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni should be cloned to (or whether it even needs to be cloned in full, since perhaps only a subset of the files is needed).
Can a dev please respond showing which filenames need to be inside which folders for the Gradio UI to find them and populate the dropdown at the top of the UI?
I found a simple solution for placing the Whisper model:
# in the base directory of the repo
mkdir -p models/speech_encoder
cd models/speech_encoder
wget https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
That makes no difference here. If the UI works for you (i.e. the top dropdown has entries in it, and clicking a wav and then Send works), then please list all the models you downloaded and where you put them to get this working.
The main issue is the instruction "Download the Llama-3.1-8B-Omni model from 🤗 Hugging Face.", which needs to clarify what exactly to download and where to put it.
When I start the Gradio UI the top dropdown is empty, and when I click it I get an error.
Your instructions say "Download the Llama-3.1-8B-Omni model from 🤗 Hugging Face." so I ran
git clone https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
under the models directory. Is this correct? Should the model locations be as follows?
Any idea why the dropdown is empty? Thanks for any help.