Support for Llama 3.1 model

sbmandava commented 3 months ago

Are there instructions specific to creating a bmodel from ONNX for Llama 3.1 (not Llama 3)?

Running this errors out:

python export_onnx.py --model_path ../../../../Meta-Llama-3.1-8B-Instruct/ --seq_length 1024

Convert block & block_cache 0%| | 0/32 [00:00<?, ?it/s]

The attention layers in this model are transitioning from computing the RoPE embeddings internally through position_ids (2D tensor with the indexes of the tokens), to using externally computed position_embeddings (Tuple of tensors, containing cos and sin). In v4.45 position_ids will be removed and position_embeddings will be mandatory.
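For readers unfamiliar with the terms in that warning, here is a rough sketch of what a (cos, sin) pair of position embeddings looks like when computed from position ids. It is plain Python for illustration only, not the transformers implementation: the function name `rope_cos_sin` and the default `base` are assumptions, and it ignores any frequency scaling that Llama 3.1 may apply on top of standard RoPE.

```python
import math


def rope_cos_sin(position_ids, head_dim, base=10000.0):
    """Illustrative RoPE tables: one row of cos/sin values per position.

    `base` is a placeholder default, not a value taken from any model config.
    """
    half = head_dim // 2
    # One inverse frequency per pair of channels.
    inv_freq = [base ** (-(2 * i) / head_dim) for i in range(half)]

    cos, sin = [], []
    for pos in position_ids:
        angles = [pos * f for f in inv_freq]
        # Repeat the angles so each row spans the full head dimension.
        angles = angles + angles
        cos.append([math.cos(a) for a in angles])
        sin.append([math.sin(a) for a in angles])
    return cos, sin
```

Calling `rope_cos_sin([0, 1, 2], head_dim=8)` returns two lists of three rows with eight values each. The warning says that a tuple of this kind is what the attention layers will expect to receive from outside, instead of the raw position ids.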

chuxiaoyi2023 commented 3 months ago

We are working on Llama 3.1 support. Please be patient, thanks~

sbmandava commented 3 months ago

LLM-TPU/models/Llama3_1/compile/export_onnx.py does not exist (according to the documentation, it should).

I ran pip install --upgrade transformers to move to version 4.44.0.

Copying the one from Llama3 and running it gives this error:

The attention layers in this model are transitioning from computing the RoPE embeddings internally through position_ids (2D tensor with the indexes of the tokens), to using externally computed position_embeddings (Tuple of tensors, containing cos and sin). In v4.45 position_ids will be removed and position_embeddings will be mandatory.

AttributeError: 'tuple' object has no attribute 'update'
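Since the message above ties the change to a specific transformers release, a small version check can help when reporting or reproducing this. The sketch below uses only the standard library. The helper names are made up for illustration, and the (4, 45) cutoff is taken from the warning text, not verified against the transformers changelog.

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_or_past_cutoff(version_text, cutoff=(4, 45)):
    """True if the version is at or beyond the release named in the warning."""
    return parse_version(version_text)[:2] >= cutoff


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

With the version mentioned here, `at_or_past_cutoff("4.44.0")` is False, so the position_ids message is still a deprecation notice and not a hard failure. The AttributeError on the last line is a separate problem that this check does not diagnose.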

In the interim, can you make the bmodel available?

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int4_1dev_seq512.bmodel

Currently it says file not found.

chuxiaoyi2023 commented 4 days ago

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq512.bmodel

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq1024.bmodel

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq2048.bmodel

python3 -m dfss --url=open@sophgo.com:/ext_model_information/LLM/LLM-TPU/llama3.1-8b_int8_1dev_seq4096.bmodel

These are now available.
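To fetch all four files in one go, the commands above can be generated in a loop. This is a minimal sketch, assuming the `dfss` package is already installed. `BASE_URL`, `SEQ_LENGTHS`, and the function names are made up for illustration, and only the four int8 file names listed above are known to exist.

```python
import subprocess
import sys

# Remote directory and sequence lengths copied from the commands above.
BASE_URL = "open@sophgo.com:/ext_model_information/LLM/LLM-TPU"
SEQ_LENGTHS = (512, 1024, 2048, 4096)


def bmodel_name(seq_length):
    """File name pattern used by the four int8 bmodels listed above."""
    return f"llama3.1-8b_int8_1dev_seq{seq_length}.bmodel"


def build_download_commands(seq_lengths=SEQ_LENGTHS):
    """Build one `python -m dfss --url=...` command per sequence length."""
    return [
        [sys.executable, "-m", "dfss", f"--url={BASE_URL}/{bmodel_name(n)}"]
        for n in seq_lengths
    ]


def download_all(seq_lengths=SEQ_LENGTHS):
    """Run the downloads one after another, stopping at the first failure."""
    for cmd in build_download_commands(seq_lengths):
        subprocess.run(cmd, check=True)
```

Calling `download_all()` fetches every listed file, and `download_all((1024,))` fetches only the seq1024 model. Using `sys.executable` in place of a hard-coded `python3` keeps the download in the same interpreter that has `dfss` installed.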

sophgo / LLM-TPU

Support for Llama 3.1 model #36