sophgo / LLM-TPU

Run generative AI models on the Sophgo BM1684X

llama3 is not available after conversion #13

Closed · Bao0ne closed this 1 month ago

Bao0ne commented 4 months ago

log:

root@bm1684:/data/LLM-TPU/models/Llama3/python_demo# python3 pipeline.py -m /data/models/llama3-8b_int8_1dev_512.bmodel -t ../token_config/ --devid 0
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Load ../token_config/ ...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Device [ 0 ] loading ....
[BMRT][bmcpu_setup:436] INFO:cpu_lib 'libcpuop.so' is loaded.
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init 
[BMRT][BMProfile:60] INFO:Profile For arch=3
[BMRT][BMProfileDeviceBase:190] INFO:gdma=0, tiu=0, mcu=0
Model[/data/models/llama3-8b_int8_1dev_512.bmodel] loading ....
[BMRT][load_bmodel:1696] INFO:Loading bmodel from [/data/models/llama3-8b_int8_1dev_512.bmodel]. Thanks for your patience...
[BMRT][load_bmodel:1583] INFO:Bmodel loaded, version 2.2+v1.6.beta.0-243-ga948d0acb-20240507
[BMRT][load_bmodel:1585] INFO:pre net num: 0, load net num: 69
[BMRT][load_tpu_module:1674] INFO:loading firmare in bmodel
[BMRT][preload_funcs:1876] INFO: core_id=0, multi_fullnet_func_id=27
[BMRT][preload_funcs:1879] INFO: core_id=0, dynamic_fullnet_func_id=28
Done!

=================================================================
1. If you want to quit, please enter one of [q, quit, exit]
2. To create a new chat session, please enter one of [clear, new]
=================================================================

Question: 你好 (Hello)

Answer: You are Llama3, a helpful AI assistant.博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博士博
zifeng-radxa commented 3 months ago

It seems the TPU is hanging. Could you check that your libbmrt.so path is correct?
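A quick way to see which libbmrt.so the demo will pick up (a sketch using standard Linux tooling, not a project-specific procedure):

echo $LD_LIBRARY_PATH
ldconfig -p | grep libbmrt

If the loader resolves to the system copy under /opt/sophon rather than the runtime bundled with LLM-TPU, the runtime version may not match what the bmodel expects.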

Bao0ne commented 3 months ago

@zifeng-radxa I can run the Llama3 int4 model provided in the documentation, but I get the above issue when running the int8 model I converted by following the documentation.

I found these files:

root@bm1684:/home/linaro# find / -name "libbmrt.so"
/opt/sophon/libsophon-0.4.9/lib/libbmrt.so
/data/LLM-TPU/models/ChatGLM3/support/lib_soc/libbmrt.so
/data/LLM-TPU/models/ChatGLM3/support/lib_pcie/libbmrt.so
/data/LLM-TPU/models/WizardCoder/demo/lib_soc/lib/libbmrt.so
/data/LLM-TPU/models/WizardCoder/demo/lib_pcie/lib/libbmrt.so
/data/LLM-TPU/models/ChatGLM2/support/lib_soc/libbmrt.so
/data/LLM-TPU/models/ChatGLM2/support/lib_pcie/libbmrt.so
/data/LLM-TPU/models/LWM/support/lib_soc/libbmrt.so
/data/LLM-TPU/models/LWM/support/lib_pcie/libbmrt.so
/data/LLM-TPU/models/Baichuan2/src/lib_soc/libbmrt.so
/data/LLM-TPU/models/Baichuan2/src/lib_pcie/libbmrt.so
/data/LLM-TPU/support/lib_soc/libbmrt.so
/data/LLM-TPU/support/lib_pcie/libbmrt.so
zifeng-radxa commented 3 months ago

I think you need to use /data/LLM-TPU/support/lib_soc/libbmrt.so or /data/LLM-TPU/support/lib_pcie/libbmrt.so, depending on which kind of device you use (SoC mode or a PCIe accelerator card).
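For a board running in SoC mode (as the bm1684 prompt above suggests), a minimal sketch would be to put the bundled SoC runtime first on the loader path before launching the demo; swap lib_soc for lib_pcie on a PCIe host:

export LD_LIBRARY_PATH=/data/LLM-TPU/support/lib_soc:$LD_LIBRARY_PATH
python3 pipeline.py -m /data/models/llama3-8b_int8_1dev_512.bmodel -t ../token_config/ --devid 0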