ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
https://arxiv.org/abs/2409.06666
Apache License 2.0

Where do the models go? #18

Open SoftologyPro opened 1 week ago

SoftologyPro commented 1 week ago

When I start the Gradio UI, the top dropdown is empty, and when I click it I get this error:

2024-09-19 18:09:56 | INFO | gradio_web_server | Models: []
2024-09-19 18:09:56 | ERROR | stderr | D:\Tests\LLaMA-Omni\LLaMA-Omni\venv\lib\site-packages\gradio\components\dropdown.py:188: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include:  or set allow_custom_value=True.

Your instructions say "Download the Llama-3.1-8B-Omni model from 🤗Huggingface.", so I ran `git clone https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni` under the models directory. Is this correct?

Should the models locations be as follows?

LLaMA-Omni\models\Llama-3.1-8B-Omni\
LLaMA-Omni\models\speech_encoder\
LLaMA-Omni\vocoder\

Any idea why the dropdown is empty? Thanks for any help.

manu-sapiens commented 1 week ago

Same question actually. The instructions could be a bit clearer on this important step.

manu-sapiens commented 1 week ago

Some information here: https://github.com/ictnlp/LLaMA-Omni/issues/10#issuecomment-2354396927

manu-sapiens commented 1 week ago

I would recommend that the LLaMA-Omni developers create a small getmodels.py file at the top level containing:

# getmodels.py
from omni_speech.model.language_model.omni_speech_llama import OmniSpeechLlamaForCausalLM
import whisper

# Download the Llama-3.1-8B-Omni model (warning: requires ~18 GB of disk space)
model_name = 'ICTNLP/Llama-3.1-8B-Omni'
llama_model = OmniSpeechLlamaForCausalLM.from_pretrained(model_name)

# Download the Whisper large-v3 speech encoder (warning: requires ~2.88 GB)
speech_encoder = whisper.load_model("large-v3", download_root="models/speech_encoder/")

and instruct users to run it with `python getmodels.py`.

Update: There is already a pull request to do this: https://github.com/ictnlp/LLaMA-Omni/pull/12

SoftologyPro commented 1 week ago

Downloading Llama that way is much more than 5 GB and leads to another error:

model.safetensors.index.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 69.1k/69.1k [00:00<00:00, 291kB/s]
model-00001-of-00004.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.98G/4.98G [08:09<00:00, 10.2MB/s]
model-00002-of-00004.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5.00G/5.00G [08:20<00:00, 9.99MB/s]
model-00003-of-00004.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.92G/4.92G [08:19<00:00, 9.84MB/s]
model-00004-of-00004.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3.34G/3.34G [05:24<00:00, 10.3MB/s]
Downloading shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4/4 [30:17<00:00, 454.45s/it]
Traceback (most recent call last):
  File "D:\Tests\LLaMA-Omni\get_models.py", line 4, in <module>
    model = OmniSpeechLlamaForCausalLM.from_pretrained(model_name)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\venv\lib\site-packages\transformers\modeling_utils.py", line 3798, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\language_model\omni_speech_llama.py", line 46, in __init__
    self.model = OmniSpeechLlamaModel(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\language_model\omni_speech_llama.py", line 38, in __init__
    super(OmniSpeechLlamaModel, self).__init__(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\omni_speech_arch.py", line 32, in __init__
    self.speech_encoder = build_speech_encoder(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\speech_encoder\builder.py", line 7, in build_speech_encoder
    return WhisperWrappedEncoder.load(config)
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\omni_speech\model\speech_encoder\speech_encoder.py", line 26, in load
    encoder = whisper.load_model(name=model_config.speech_encoder, device='cpu').encoder
  File "D:\Tests\LLaMA-Omni\LLaMA-Omni\venv\lib\site-packages\whisper\__init__.py", line 139, in load_model
    raise RuntimeError(
RuntimeError: Model models/speech_encoder/large-v3.pt not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']

And when the Gradio UI is started, the top dropdown is still empty.

Hopefully we can get an answer on where https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni should be cloned to (or even whether it needs to be cloned at all; maybe only a subset of the files is needed).
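For context on the traceback above: `whisper.load_model()` accepts either one of the listed model names or a filesystem path to an existing `.pt` checkpoint, so passing `models/speech_encoder/large-v3.pt` fails unless that file has already been downloaded. Roughly (a simplified paraphrase, not the exact openai-whisper source):

```python
# Simplified sketch of how whisper.load_model() resolves its `name` argument.
import os

AVAILABLE_MODELS = ["tiny.en", "tiny", "base.en", "base", "small.en", "small",
                    "medium.en", "medium", "large-v1", "large-v2", "large-v3", "large"]

def resolve_checkpoint(name: str) -> str:
    """Mimic whisper's name resolution: known name, local path, or error."""
    if name in AVAILABLE_MODELS:
        return f"download:{name}"  # whisper would fetch the official checkpoint
    if os.path.isfile(name):
        return name  # treat it as a path to a local .pt file
    raise RuntimeError(
        f"Model {name} not found; available models = {AVAILABLE_MODELS}"
    )
```

So the repo's config evidently expects `large-v3.pt` to already exist at `models/speech_encoder/large-v3.pt`.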

SoftologyPro commented 1 week ago

Can a dev please respond showing which filenames need to be inside which folders for the Gradio UI to find them and populate the dropdown at the top of the UI?

Btlmd commented 4 days ago

I found a simple way to place the model:

# in the base directory of the repo
mkdir -p models/speech_encoder
cd models/speech_encoder
wget https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
SoftologyPro commented 4 days ago

I found a simple way to place the model:

# in the base directory of the repo
mkdir -p models/speech_encoder
cd models/speech_encoder
wget https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

That makes no difference here. If the UI works for you (i.e. the top dropdown has something in it, and clicking a WAV then clicking Send works), then please list all the models you downloaded and where you placed them to get this working.

The main issue is the instruction "Download the Llama-3.1-8B-Omni model from 🤗Huggingface.", which needs to clarify what exactly to download and where to put it.