Nikoliazzz opened 2 months ago
Duplicate of https://github.com/secretflow/spu/issues/704
Thanks, it works! However, I am running into another issue now. Do you know how to solve this one?
@Ye-D mind taking a look?
This example is only tested with the LLaMA-7B from EasyLM, not the one from the transformers library.
Hello, how should I handle this problem? Is it caused by importing the wrong library?
It might be a version mismatch; try the method from this answer: https://github.com/secretflow/spu/issues/782#issuecomment-2249651029
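To rule out a version mismatch, here is a minimal sketch (my own, standard library only) that prints the installed versions of the relevant packages so they can be compared against the example's pinned requirements:

```python
# Print installed versions of the packages the flax_llama7b example depends on,
# so a mismatch with the pinned requirements is easy to spot.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("transformers", "flax", "jax", "jaxlib", "spu"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```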
Hi @seeronline
Due to security concerns, please do not post random download links without an explanation.
I tried #782 and replaced LLaMAConfigurator with LLaMAConfig, but got the following error:

```
python flax_llama7b.py --model_path /home/lenovo/Documents/.vscode/llama_7b2 --config ./3pc.json
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Traceback (most recent call last):
  File "/home/lenovo/Documents/.vscode/.venv/lib/python3.8/site-packages/transformers/modeling_flax_utils.py", line 830, in from_pretrained
    state = from_bytes(cls, state_f.read())
  File "/home/lenovo/Documents/.vscode/.venv/lib/python3.8/site-packages/flax/serialization.py", line 425, in from_bytes
    state_dict = msgpack_restore(encoded_bytes)
  File "/home/lenovo/Documents/.vscode/.venv/lib/python3.8/site-packages/flax/serialization.py", line 407, in msgpack_restore
    state_dict = msgpack.unpackb(
  File "msgpack/_unpacker.pyx", line 201, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lenovo/Documents/.vscode/.venv/lib/python3.8/site-packages/transformers/modeling_flax_utils.py", line 834, in from_pretrained
    if f.read().startswith("version"):
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 0: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "flax_llama7b.py", line 58, in
```
@Ye-D could you help take a look at this? Thanks!
I noticed that it errors out when I use checkpoint_dir:

```
python convert_hf_to_easylm.py \
    --checkpoint_dir /home/lenovo/Documents/.vscode/llama_7b \
    --output_file /home/lenovo/Documents/.vscode/llama_7b_easylm/flax-llama7b-EasyLM3.msgpack \
    --model_size 7b \
    --streaming false
FATAL Flags parsing error: Unknown command line flag 'checkpoint_dir'
Pass --helpshort or --helpfull to see help on flags.
```

So I switched to --hf_model instead:

```
(.venv) (base) lenovo@lenovo-07:~/Documents/.vscode/spu/examples/python/ml/flax_llama7b/EasyLM/models/llama$ python convert_hf_to_easylm.py \
    --hf_model /home/lenovo/Documents/.vscode/llama_7b \
    --output_file /home/lenovo/Documents/.vscode/llama_7b_easylm/flax-llama7b-EasyLM3.msgpack \
    --llama.base_model llama_7b \
    --streaming false
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.33s/it]
Start convert weight to easylm format...
Convert weight to easylm format finished...
Start to save...
Save finished!!! take time: 188.28942155838013
save path: /home/lenovo/Documents/.vscode/llama_7b_easylm/flax-llama7b-EasyLM3.msgpack
```
After using the EasyLM here (https://github.com/young-geng/EasyLM/tree/08_31_2023) to convert the HF model to an EasyLM one, the model could be loaded successfully. However, the program cannot run to completion. Is it because of the version of jaxlib?
(py311xie) (base) lenovo@lenovo-07:~/Documents/.vscode/spu/examples/python/ml/flax_llama7b$ python flax_llama7b.py --model_path /home/lenovo/Documents/.vscode/llama_7b --config ./3pc.json
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
Some of the weights of FlaxLLaMAForCausalLM were initialized in float16 precision from the model checkpoint at /home/lenovo/Documents/.vscode/llama_7b:
[('lm_head', 'kernel'), ('transformer', 'h', '0', 'attention', 'wk', 'kernel'), ('transformer', 'h', '0', 'attention', 'wo', 'kernel'), ('transformer', 'h', '0', 'attention', 'wq', 'kernel'), ('transformer', 'h', '0', 'attention', 'wv', 'kernel'), ('transformer', 'h', '0', 'attention_norm', 'kernel'), ('transformer', 'h', '0', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '0', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '0', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '0', 'ffn_norm', 'kernel'), ('transformer', 'h', '1', 'attention', 'wk', 'kernel'), ('transformer', 'h', '1', 'attention', 'wo', 'kernel'), ('transformer', 'h', '1', 'attention', 'wq', 'kernel'), ('transformer', 'h', '1', 'attention', 'wv', 'kernel'), ('transformer', 'h', '1', 'attention_norm', 'kernel'), ('transformer', 'h', '1', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '1', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '1', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '1', 'ffn_norm', 'kernel'), ('transformer', 'h', '10', 'attention', 'wk', 'kernel'), ('transformer', 'h', '10', 'attention', 'wo', 'kernel'), ('transformer', 'h', '10', 'attention', 'wq', 'kernel'), ('transformer', 'h', '10', 'attention', 'wv', 'kernel'), ('transformer', 'h', '10', 'attention_norm', 'kernel'), ('transformer', 'h', '10', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '10', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '10', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '10', 'ffn_norm', 'kernel'), ('transformer', 'h', '11', 'attention', 'wk', 'kernel'), ('transformer', 'h', '11', 'attention', 'wo', 'kernel'), ('transformer', 'h', '11', 'attention', 'wq', 'kernel'), ('transformer', 'h', '11', 'attention', 'wv', 'kernel'), ('transformer', 'h', '11', 'attention_norm', 'kernel'), ('transformer', 'h', '11', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '11', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '11', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '11', 'ffn_norm', 'kernel'), ('transformer', 'h', '12', 'attention', 'wk', 'kernel'), ('transformer', 'h', '12', 'attention', 'wo', 'kernel'), ('transformer', 'h', '12', 'attention', 'wq', 'kernel'), ('transformer', 'h', '12', 'attention', 'wv', 'kernel'), ('transformer', 'h', '12', 'attention_norm', 'kernel'), ('transformer', 'h', '12', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '12', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '12', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '12', 'ffn_norm', 'kernel'), ('transformer', 'h', '13', 'attention', 'wk', 'kernel'), ('transformer', 'h', '13', 'attention', 'wo', 'kernel'), ('transformer', 'h', '13', 'attention', 'wq', 'kernel'), ('transformer', 'h', '13', 'attention', 'wv', 'kernel'), ('transformer', 'h', '13', 'attention_norm', 'kernel'), ('transformer', 'h', '13', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '13', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '13', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '13', 'ffn_norm', 'kernel'), ('transformer', 'h', '14', 'attention', 'wk', 'kernel'), ('transformer', 'h', '14', 'attention', 'wo', 'kernel'), ('transformer', 'h', '14', 'attention', 'wq', 'kernel'), ('transformer', 'h', '14', 'attention', 'wv', 'kernel'), ('transformer', 'h', '14', 'attention_norm', 'kernel'), ('transformer', 'h', '14', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '14', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '14', 'feed_forward', 'w3', 'kernel'), 
('transformer', 'h', '14', 'ffn_norm', 'kernel'), ('transformer', 'h', '15', 'attention', 'wk', 'kernel'), ('transformer', 'h', '15', 'attention', 'wo', 'kernel'), ('transformer', 'h', '15', 'attention', 'wq', 'kernel'), ('transformer', 'h', '15', 'attention', 'wv', 'kernel'), ('transformer', 'h', '15', 'attention_norm', 'kernel'), ('transformer', 'h', '15', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '15', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '15', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '15', 'ffn_norm', 'kernel'), ('transformer', 'h', '16', 'attention', 'wk', 'kernel'), ('transformer', 'h', '16', 'attention', 'wo', 'kernel'), ('transformer', 'h', '16', 'attention', 'wq', 'kernel'), ('transformer', 'h', '16', 'attention', 'wv', 'kernel'), ('transformer', 'h', '16', 'attention_norm', 'kernel'), ('transformer', 'h', '16', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '16', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '16', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '16', 'ffn_norm', 'kernel'), ('transformer', 'h', '17', 'attention', 'wk', 'kernel'), ('transformer', 'h', '17', 'attention', 'wo', 'kernel'), ('transformer', 'h', '17', 'attention', 'wq', 'kernel'), ('transformer', 'h', '17', 'attention', 'wv', 'kernel'), ('transformer', 'h', '17', 'attention_norm', 'kernel'), ('transformer', 'h', '17', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '17', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '17', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '17', 'ffn_norm', 'kernel'), ('transformer', 'h', '18', 'attention', 'wk', 'kernel'), ('transformer', 'h', '18', 'attention', 'wo', 'kernel'), ('transformer', 'h', '18', 'attention', 'wq', 'kernel'), ('transformer', 'h', '18', 'attention', 'wv', 'kernel'), ('transformer', 'h', '18', 'attention_norm', 'kernel'), ('transformer', 'h', '18', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '18', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '18', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '18', 'ffn_norm', 'kernel'), ('transformer', 'h', '19', 'attention', 'wk', 'kernel'), ('transformer', 'h', '19', 'attention', 'wo', 'kernel'), ('transformer', 'h', '19', 'attention', 'wq', 'kernel'), ('transformer', 'h', '19', 'attention', 'wv', 'kernel'), ('transformer', 'h', '19', 'attention_norm', 'kernel'), ('transformer', 'h', '19', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '19', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '19', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '19', 'ffn_norm', 'kernel'), ('transformer', 'h', '2', 'attention', 'wk', 'kernel'), ('transformer', 'h', '2', 'attention', 'wo', 'kernel'), ('transformer', 'h', '2', 'attention', 'wq', 'kernel'), ('transformer', 'h', '2', 'attention', 'wv', 'kernel'), ('transformer', 'h', '2', 'attention_norm', 'kernel'), ('transformer', 'h', '2', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '2', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '2', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '2', 'ffn_norm', 'kernel'), ('transformer', 'h', '20', 'attention', 'wk', 'kernel'), ('transformer', 'h', '20', 'attention', 'wo', 'kernel'), ('transformer', 'h', '20', 'attention', 'wq', 'kernel'), ('transformer', 'h', '20', 'attention', 'wv', 'kernel'), ('transformer', 'h', '20', 'attention_norm', 'kernel'), ('transformer', 'h', '20', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '20', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', 
'20', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '20', 'ffn_norm', 'kernel'), ('transformer', 'h', '21', 'attention', 'wk', 'kernel'), ('transformer', 'h', '21', 'attention', 'wo', 'kernel'), ('transformer', 'h', '21', 'attention', 'wq', 'kernel'), ('transformer', 'h', '21', 'attention', 'wv', 'kernel'), ('transformer', 'h', '21', 'attention_norm', 'kernel'), ('transformer', 'h', '21', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '21', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '21', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '21', 'ffn_norm', 'kernel'), ('transformer', 'h', '22', 'attention', 'wk', 'kernel'), ('transformer', 'h', '22', 'attention', 'wo', 'kernel'), ('transformer', 'h', '22', 'attention', 'wq', 'kernel'), ('transformer', 'h', '22', 'attention', 'wv', 'kernel'), ('transformer', 'h', '22', 'attention_norm', 'kernel'), ('transformer', 'h', '22', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '22', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '22', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '22', 'ffn_norm', 'kernel'), ('transformer', 'h', '23', 'attention', 'wk', 'kernel'), ('transformer', 'h', '23', 'attention', 'wo', 'kernel'), ('transformer', 'h', '23', 'attention', 'wq', 'kernel'), ('transformer', 'h', '23', 'attention', 'wv', 'kernel'), ('transformer', 'h', '23', 'attention_norm', 'kernel'), ('transformer', 'h', '23', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '23', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '23', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '23', 'ffn_norm', 'kernel'), ('transformer', 'h', '24', 'attention', 'wk', 'kernel'), ('transformer', 'h', '24', 'attention', 'wo', 'kernel'), ('transformer', 'h', '24', 'attention', 'wq', 'kernel'), ('transformer', 'h', '24', 'attention', 'wv', 'kernel'), ('transformer', 'h', '24', 'attention_norm', 'kernel'), ('transformer', 'h', '24', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '24', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '24', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '24', 'ffn_norm', 'kernel'), ('transformer', 'h', '25', 'attention', 'wk', 'kernel'), ('transformer', 'h', '25', 'attention', 'wo', 'kernel'), ('transformer', 'h', '25', 'attention', 'wq', 'kernel'), ('transformer', 'h', '25', 'attention', 'wv', 'kernel'), ('transformer', 'h', '25', 'attention_norm', 'kernel'), ('transformer', 'h', '25', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '25', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '25', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '25', 'ffn_norm', 'kernel'), ('transformer', 'h', '26', 'attention', 'wk', 'kernel'), ('transformer', 'h', '26', 'attention', 'wo', 'kernel'), ('transformer', 'h', '26', 'attention', 'wq', 'kernel'), ('transformer', 'h', '26', 'attention', 'wv', 'kernel'), ('transformer', 'h', '26', 'attention_norm', 'kernel'), ('transformer', 'h', '26', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '26', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '26', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '26', 'ffn_norm', 'kernel'), ('transformer', 'h', '27', 'attention', 'wk', 'kernel'), ('transformer', 'h', '27', 'attention', 'wo', 'kernel'), ('transformer', 'h', '27', 'attention', 'wq', 'kernel'), ('transformer', 'h', '27', 'attention', 'wv', 'kernel'), ('transformer', 'h', '27', 'attention_norm', 'kernel'), ('transformer', 'h', '27', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '27', 
'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '27', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '27', 'ffn_norm', 'kernel'), ('transformer', 'h', '28', 'attention', 'wk', 'kernel'), ('transformer', 'h', '28', 'attention', 'wo', 'kernel'), ('transformer', 'h', '28', 'attention', 'wq', 'kernel'), ('transformer', 'h', '28', 'attention', 'wv', 'kernel'), ('transformer', 'h', '28', 'attention_norm', 'kernel'), ('transformer', 'h', '28', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '28', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '28', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '28', 'ffn_norm', 'kernel'), ('transformer', 'h', '29', 'attention', 'wk', 'kernel'), ('transformer', 'h', '29', 'attention', 'wo', 'kernel'), ('transformer', 'h', '29', 'attention', 'wq', 'kernel'), ('transformer', 'h', '29', 'attention', 'wv', 'kernel'), ('transformer', 'h', '29', 'attention_norm', 'kernel'), ('transformer', 'h', '29', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '29', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '29', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '29', 'ffn_norm', 'kernel'), ('transformer', 'h', '3', 'attention', 'wk', 'kernel'), ('transformer', 'h', '3', 'attention', 'wo', 'kernel'), ('transformer', 'h', '3', 'attention', 'wq', 'kernel'), ('transformer', 'h', '3', 'attention', 'wv', 'kernel'), ('transformer', 'h', '3', 'attention_norm', 'kernel'), ('transformer', 'h', '3', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '3', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '3', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '3', 'ffn_norm', 'kernel'), ('transformer', 'h', '30', 'attention', 'wk', 'kernel'), ('transformer', 'h', '30', 'attention', 'wo', 'kernel'), ('transformer', 'h', '30', 'attention', 'wq', 'kernel'), ('transformer', 'h', '30', 'attention', 'wv', 'kernel'), ('transformer', 'h', '30', 'attention_norm', 'kernel'), ('transformer', 'h', '30', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '30', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '30', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '30', 'ffn_norm', 'kernel'), ('transformer', 'h', '31', 'attention', 'wk', 'kernel'), ('transformer', 'h', '31', 'attention', 'wo', 'kernel'), ('transformer', 'h', '31', 'attention', 'wq', 'kernel'), ('transformer', 'h', '31', 'attention', 'wv', 'kernel'), ('transformer', 'h', '31', 'attention_norm', 'kernel'), ('transformer', 'h', '31', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '31', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '31', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '31', 'ffn_norm', 'kernel'), ('transformer', 'h', '4', 'attention', 'wk', 'kernel'), ('transformer', 'h', '4', 'attention', 'wo', 'kernel'), ('transformer', 'h', '4', 'attention', 'wq', 'kernel'), ('transformer', 'h', '4', 'attention', 'wv', 'kernel'), ('transformer', 'h', '4', 'attention_norm', 'kernel'), ('transformer', 'h', '4', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '4', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '4', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '4', 'ffn_norm', 'kernel'), ('transformer', 'h', '5', 'attention', 'wk', 'kernel'), ('transformer', 'h', '5', 'attention', 'wo', 'kernel'), ('transformer', 'h', '5', 'attention', 'wq', 'kernel'), ('transformer', 'h', '5', 'attention', 'wv', 'kernel'), ('transformer', 'h', '5', 'attention_norm', 'kernel'), ('transformer', 'h', '5', 'feed_forward', 'w1', 'kernel'), 
('transformer', 'h', '5', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '5', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '5', 'ffn_norm', 'kernel'), ('transformer', 'h', '6', 'attention', 'wk', 'kernel'), ('transformer', 'h', '6', 'attention', 'wo', 'kernel'), ('transformer', 'h', '6', 'attention', 'wq', 'kernel'), ('transformer', 'h', '6', 'attention', 'wv', 'kernel'), ('transformer', 'h', '6', 'attention_norm', 'kernel'), ('transformer', 'h', '6', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '6', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '6', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '6', 'ffn_norm', 'kernel'), ('transformer', 'h', '7', 'attention', 'wk', 'kernel'), ('transformer', 'h', '7', 'attention', 'wo', 'kernel'), ('transformer', 'h', '7', 'attention', 'wq', 'kernel'), ('transformer', 'h', '7', 'attention', 'wv', 'kernel'), ('transformer', 'h', '7', 'attention_norm', 'kernel'), ('transformer', 'h', '7', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '7', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '7', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '7', 'ffn_norm', 'kernel'), ('transformer', 'h', '8', 'attention', 'wk', 'kernel'), ('transformer', 'h', '8', 'attention', 'wo', 'kernel'), ('transformer', 'h', '8', 'attention', 'wq', 'kernel'), ('transformer', 'h', '8', 'attention', 'wv', 'kernel'), ('transformer', 'h', '8', 'attention_norm', 'kernel'), ('transformer', 'h', '8', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '8', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '8', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '8', 'ffn_norm', 'kernel'), ('transformer', 'h', '9', 'attention', 'wk', 'kernel'), ('transformer', 'h', '9', 'attention', 'wo', 'kernel'), ('transformer', 'h', '9', 'attention', 'wq', 'kernel'), ('transformer', 'h', '9', 'attention', 'wv', 'kernel'), ('transformer', 'h', '9', 'attention_norm', 'kernel'), ('transformer', 'h', '9', 'feed_forward', 'w1', 'kernel'), ('transformer', 'h', '9', 'feed_forward', 'w2', 'kernel'), ('transformer', 'h', '9', 'feed_forward', 'w3', 'kernel'), ('transformer', 'h', '9', 'ffn_norm', 'kernel'), ('transformer', 'ln_f', 'kernel'), ('transformer', 'wte', 'embedding')]
You should probably UPCAST the model weights to float32 if this was not intended. See [`~FlaxPreTrainedModel.to_fp32`] for further information on how to do this.
Run on CPU
Q: What is the largest animal?
A: The
Run on SPU
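On the jaxlib question: the "Falling back to cpu" warning above just means this jaxlib build has no CUDA support, so both the CPU and SPU runs execute JAX on the CPU. A minimal sketch (standard JAX calls, not from the example) to confirm which backend is in use:

```python
# Report the backend and devices JAX actually selected; with a CPU-only
# jaxlib this shows CPU devices even if an NVIDIA GPU is present.
import jax

print(jax.default_backend())  # 'cpu' for a non-CUDA jaxlib
print(jax.devices())          # e.g. [CpuDevice(id=0)]
```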
Here are the logs from running nodectl.py:
(py311xie) (base) lenovo@lenovo-07:~/Documents/.vscode/spu/examples/python/utils$ python nodectl.py --config ../ml/flax_llama7b/3pc.json up
[2024-08-27 11:33:28,012] [ForkServerProcess-1] Starting grpc server at 127.0.0.1:61920
[2024-08-27 11:33:28,016] [ForkServerProcess-4] Starting grpc server at 127.0.0.1:61923
[2024-08-27 11:33:28,092] [ForkServerProcess-3] Starting grpc server at 127.0.0.1:61922
[2024-08-27 11:33:28,095] [ForkServerProcess-2] Starting grpc server at 127.0.0.1:61921
[2024-08-27 11:33:28,105] [ForkServerProcess-5] Starting grpc server at 127.0.0.1:61924
[2024-08-27 11:34:07,711] [ForkServerProcess-1] Run : builtin_spu_init at node:0
[2024-08-27 11:34:07,711] [ForkServerProcess-2] Run : builtin_spu_init at node:1
[2024-08-27 11:34:07,712] [ForkServerProcess-3] Run : builtin_spu_init at node:2
I0827 11:34:07.728576 987098 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1181] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=61930.
W0827 11:34:07.728617 987098 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1187] Builtin services are disabled according to ServerOptions.has_builtin_services
I0827 11:34:07.729441 987098 0 external/com_github_brpc_brpc/src/butil/iobuf_profiler.cpp:67] g_iobuf_profiler_sample_rate=100
I0827 11:34:07.730816 987102 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1181] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=61932.
W0827 11:34:07.730864 987102 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1187] Builtin services are disabled according to ServerOptions.has_builtin_services
I0827 11:34:07.730928 987099 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1181] Server[yacl::link::transport::internal::ReceiverServiceImpl] is serving on port=61931.
W0827 11:34:07.730967 987099 0 external/com_github_brpc_brpc/src/brpc/server.cpp:1187] Builtin services are disabled according to ServerOptions.has_builtin_services
I0827 11:34:07.731202 987114 4294967297 external/com_github_brpc_brpc/src/butil/iobuf_profiler.cpp:67] g_iobuf_profiler_sample_rate=100
I0827 11:34:07.731810 987102 0 external/com_github_brpc_brpc/src/butil/iobuf_profiler.cpp:67] g_iobuf_profiler_sample_rate=100
[2024-08-27 11:34:07,738] [ForkServerProcess-1] spu-runtime (SPU) initialized
[2024-08-27 11:34:07,738] [ForkServerProcess-3] spu-runtime (SPU) initialized
[2024-08-27 11:34:07,738] [ForkServerProcess-2] spu-runtime (SPU) initialized
An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
[2024-08-27 11:39:02,654] [ForkServerProcess-5] Run :
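For context on what these node processes are for: the driver script attaches to them over the endpoints listed in 3pc.json. Below is a sketch of that connection step, following the pattern used across the SPU examples (flax_llama7b.py does the equivalent internally; treat the exact calls as an approximation):

```python
# Attach a driver to the node processes started by `nodectl.py up`,
# using the same 3pc.json; this mirrors the setup in the SPU examples.
import json

import spu.utils.distributed as ppd

with open("../ml/flax_llama7b/3pc.json", "r") as f:
    conf = json.load(f)

# Connect to the already-running nodes; the SPU device defined in the
# config is backed by the three MPC parties.
ppd.init(conf["nodes"], conf["devices"])
```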
Issue Type
Build/Install
Modules Involved
SPU runtime, SPU compiler
Have you reproduced the bug with SPU HEAD?
Yes
Have you searched existing issues?
Yes
SPU Version
0.9.2b0
OS Platform and Distribution
Linux
Python Version
3.11
Compiler Version
gcc
Current Behavior?
I was trying to reproduce the "Flax Llama-7B Example with Puma" in examples/python/ml/flax_llama7b. However, I failed to load the flax-llama7b-EasyLM model.
Standalone code to reproduce the issue
Relevant log output
No response