===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
/home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/ubuntu/miniconda3/envs/LLaMA did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 112
/home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):
  File "/data/Cornucopia-LLaMA-Fin-Chinese/infer.py", line 143, in <module>
    main()
  File "/data/Cornucopia-LLaMA-Fin-Chinese/infer.py", line 46, in main
    tokenizer = LlamaTokenizer.from_pretrained(args.base_model)
  File "/home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1846, in from_pretrained
    return cls._from_pretrained(
  File "/home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2009, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 128, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/ubuntu/miniconda3/envs/LLaMA/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
(
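The traceback ends in sentencepiece failing at `ParseFromArray`, which means the bytes it read from the vocab file did not parse as a serialized model. The log does not say why. One thing worth ruling out (an assumption, not something the trace confirms) is that the `tokenizer.model` file under `args.base_model` is missing, empty, or a Git LFS pointer stub rather than the real binary. A minimal sketch of that check follows; the helper name `inspect_tokenizer_file` and the example path are hypothetical and not part of `infer.py`.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def inspect_tokenizer_file(path):
    """Return a coarse diagnosis for a candidate sentencepiece model file.

    Hypothetical helper: it only looks at the raw bytes and does not
    attempt to parse the file as a sentencepiece model.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    data = p.read_bytes()
    if len(data) == 0:
        return "empty"
    if data.startswith(LFS_POINTER_PREFIX):
        # The repository was likely cloned without fetching LFS objects.
        return "git-lfs pointer"
    # Not obviously broken; it may still be truncated or a different format.
    return "other"


# Example usage (the path is a placeholder for <args.base_model>/tokenizer.model):
# print(inspect_tokenizer_file("/path/to/base_model/tokenizer.model"))
```

A result of `"other"` does not prove the file is valid; it only rules out the three cheapest explanations before comparing file size or checksum against the published checkpoint.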