bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda117.so
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
warn(msg)
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda117.so...
You are loading your model in 8bit or 4bit but no linear modules were found in your model. this can happen for some architectures such as gpt2 that uses Conv1D instead of Linear layers. Please double check your model architecture, or submit an issue on github if you think this is a bug.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.05s/it]
trainable params: 974,848 || all params: 6,244,558,848 || trainable%: 0.01561115883009451
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-d5629da83678d2e9/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16.56it/s]
Traceback (most recent call last):
File "train_qlora.py", line 203, in <module>
train(args)
File "train_qlora.py", line 181, in train
train_dataset = get_datset(global_args.train_data_path, tokenizer, global_args)
File "train_qlora.py", line 82, in get_datset
dataset = data['train'].map(lambda example: tokenize_func(example, tokenizer, global_args),
File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 578, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 543, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 3073, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 3427, in _map_single
example = apply_function_on_filtered_inputs(example, i, offset=offset)
File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 3330, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "train_qlora.py", line 82, in <lambda>
dataset = data['train'].map(lambda example: tokenize_func(example, tokenizer, global_args),
File "train_qlora.py", line 73, in tokenize_func
question_length = input_ids.index(tokenizer.bos_token_id)
ValueError: None is not in list
What is causing this error after I changed the model to chatGLM2-6B?
python3 train_qlora.py --train_args_json chatGLM_6B_QLoRA.json --model_name_or_path THUDM/chatglm2-6b --train_data_path data/train.jsonl --eval_data_path data/dev.jsonl --lora_rank 4 --lora_dropout 0.05 --compute_dtype fp32
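For reference, the ValueError comes from `tokenize_func` calling `input_ids.index(tokenizer.bos_token_id)`. A likely explanation is that the ChatGLM2-6B tokenizer reports `bos_token_id` as `None` (unlike the tokenizer the script was written for), and `list.index(None)` raises exactly the error shown in the traceback. The sketch below reproduces the failure with a hypothetical `FakeTokenizer` stand-in and shows one defensive rewrite; the names `FakeTokenizer` and `safe_question_length` are illustrative, not from the script:

```python
# Hypothetical stand-in for a tokenizer whose BOS token is undefined,
# as appears to be the case for chatGLM2-6B here.
class FakeTokenizer:
    bos_token_id = None

input_ids = [64790, 64792, 30910]  # example token ids
tok = FakeTokenizer()

# list.index(None) raises ValueError -- the exact error in the traceback.
try:
    question_length = input_ids.index(tok.bos_token_id)
except ValueError as e:
    print(e)  # -> None is not in list

# Defensive variant: fall back when no BOS id is defined or present.
def safe_question_length(input_ids, bos_id):
    if bos_id is not None and bos_id in input_ids:
        return input_ids.index(bos_id)
    return 0  # assumption: no BOS marker means no question prefix to skip

print(safe_question_length(input_ids, None))  # -> 0
```

Whether `0` is the right fallback depends on how the training data is formatted; the real fix may be to locate the question/answer boundary with whatever special tokens the chatGLM2-6B tokenizer actually defines, rather than assuming a BOS token exists.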