Y-Bay opened this issue 1 year ago
Could you try a card with more VRAM? As far as I remember the 2080 Ti has 11 GB. In my own tests it runs fine on a 3060 (12 GB) and a 3090 (24 GB), and during training even a batch size of 8 doesn't fill the memory.
It also OOMs on a 4090 with 24 GB of VRAM.
@xslower Are you sure you are on the latest version of the model code? Also, how long are the samples in your dataset, and what batch size are you using?
It dies at the model-loading stage, before a single batch runs. If I load the -int4 model directly, I get an error that the weights cannot compute gradients; if I load the original model, VRAM blows up right away. I'm on the latest code.
D:\Env\Python39\python.exe E:\code\gpt\chatGLM-6B-QLoRA\train_qlora.py
bin D:\Env\Python39\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
Loading checkpoint shards: 100%|██████████| 7/7 [00:08<00:00, 1.25s/it]
Traceback (most recent call last):
File "E:\code\gpt\chatGLM-6B-QLoRA\train_qlora.py", line 209, in
@xslower Don't load the int4 checkpoint directly; load the fp model and let it be quantized to int4 during loading. Do your dependency versions match the ones in the README?
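In code, that advice amounts to something like the sketch below (the exact q_config values used by train_qlora.py may differ; the model path is a placeholder):

```python
# Sketch of "load the full-precision checkpoint and quantize to int4 while loading".
# The concrete q_config in train_qlora.py may use different options; these are typical QLoRA settings.
import torch
from transformers import AutoModel, BitsAndBytesConfig

q_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize to 4-bit during from_pretrained
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the LoRA forward/backward
)

model = AutoModel.from_pretrained(
    "path/to/chatglm-6b",                  # the full fp checkpoint, NOT chatglm-6b-int4
    quantization_config=q_config,
    device_map="auto",
    trust_remote_code=True,
)
```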
I checked specifically: everything is >= the versions you listed; bitsandbytes, for example, is 0.40.2. I suspect bitsandbytes is the problem. The fp32 version of a 6B model already needs more than 24 GB of VRAM by itself, so if no quantization is applied while loading it is bound to blow up. For plain inference you either load the int4 checkpoint or use the half() version.
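For reference, the two inference-only loading styles mentioned here usually look like this; the model ids and memory figures are approximate and not taken from this repo:

```python
# Two common ways to fit ChatGLM-6B for plain inference (not QLoRA training).
from transformers import AutoModel

# Pre-quantized int4 checkpoint: roughly 6 GB of VRAM
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()

# Full checkpoint cast to fp16 with .half(): roughly 13 GB of VRAM
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```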
With transformers==4.31.0 I get the line below:
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
The model then takes 12 GB and throws the error above. Downgrading to transformers==4.30.2 works fine.
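A quick version check before training can catch this kind of mismatch early. A small sketch, assuming a typical QLoRA environment (the package list is an assumption):

```python
# Print and pin the versions this thread reports as sensitive.
from importlib.metadata import version

for pkg in ("transformers", "bitsandbytes", "peft", "accelerate"):
    print(pkg, version(pkg))

# The thread reports 4.31.0 triggering the "no linear modules were found" message
# with ChatGLM, while 4.30.2 works.
assert version("transformers") == "4.30.2", "pip install transformers==4.30.2"
```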
2023-09-11 09:47:27.394959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('//172.28.0.1'), PosixPath('8013')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https'), PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-18wy1blmurcx8 --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function
errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.c
Compiling gcc -O3 -fPIC -std=c99 /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.c -shared -o /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.so
Using quantization cache
Applying quantization to glm layers
Is this also a VRAM error?
Here is my fine-tuning command!
@xslower Don't load the int4 checkpoint directly; load the fp model and let it be quantized to int4 during loading. Do your dependency versions match the ones in the README?
There must be something wrong somewhere in this repo's code. Normally, once the repo is cloned, the environment is set up, and the paths are fixed, fine-tuning should just run, but it errors out both on Colab and on my own desktop.
Here is my fine-tuning command!
--model_name_or_path /content/chatGLM-6B-QLoRA/chatglm-6b: it won't run with chatglm-6b-int4; load the original fp model and let the int4 quantization happen during loading.
Then make sure that: 1) the chatglm model assets (including the official remote_scripts and other resources) are all up to date, and 2) the versions of the various libraries are consistent.
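For the first point, one way to refresh a local ChatGLM directory is sketched below using huggingface_hub; the repo id and local path are examples, not something this project prescribes:

```python
# Re-download the model weights together with the remote-code scripts
# (modeling_chatglm.py, tokenization_chatglm.py, quantization.py, ...).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="THUDM/chatglm2-6b",
    local_dir="/data/chatglm2-6b",        # the directory later passed to --model_name_or_path
    local_dir_use_symlinks=False,
)
```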
(Quoting the same Colab bitsandbytes log as above.) Is this also a VRAM error?
There is no actual error in that block, it's all warnings... or you didn't copy the whole output and missed the error part.
GPU hardware: 4x 2080 Ti, 12 GB of VRAM per card, run pinned to a single card. Running the chatglm-6b fine-tune:
CUDA_VISIBLE_DEVICES=0 python train_qlora.py \
    --train_args_json chatGLM_6B_QLoRA.json \
    --model_name_or_path /data/chatglm-6b \
    --train_data_path data/train.jsonl \
    --eval_data_path data/eval.jsonl \
    --lora_rank 4 \
    --lora_dropout 0.05 \
    --compute_dtype fp32
It runs normally and saves the results under ./saved_files. But running the chatglm2-6b fine-tune (the chatglm2-6b files are confirmed to be the latest version):
CUDA_VISIBLE_DEVICES=0 python train_qlora.py \
    --train_args_json chatGLM_6B_QLoRA.json \
    --model_name_or_path /data/chatglm2-6b \
    --train_data_path data/train.jsonl \
    --eval_data_path data/eval.jsonl \
    --lora_rank 4 \
    --lora_dropout 0.05 \
    --compute_dtype fp32
fails with the following error:
===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/softwares/anaconda3/envs/langchain did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/data/FT_LLM/chatGLM-6B-QLoRA/train_qlora.py:214 in <module> │
│ │
│ 211 │
│ 212 if __name__ == "__main__": │
│ 213 │ args = parse_args() │
│ ❱ 214 │ train(args) │
│ 215 │
│ 216 │
│ │
│ /data/data/FT_LLM/chatGLM-6B-QLoRA/train_qlora.py:153 in train │
│ │
│ 150 │ # "output_layer": "cpu", │
│ 151 │ # } │
│ 152 │ │
│ ❱ 153 │ model = AutoModel.from_pretrained(global_args.model_name_or_path, │
│ 154 │ │ │ │ │ │ │ │ │ quantization_config=q_config, │
│ 155 │ │ │ │ │ │ │ │ │ device_map='auto', │
│ 156 │ │ │ │ │ │ │ │ │ trust_remote_code=True) │
│ │
│ /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/transformers/models/ │
│ auto/auto_factory.py:488 in from_pretrained │
│ │
│ 485 │ │ │ │ model_class.register_for_auto_class(cls.__name__) │
│ 486 │ │ │ else: │
│ 487 │ │ │ │ cls.register(config.__class__, model_class, exist_ok=True) │
│ ❱ 488 │ │ │ return model_class.from_pretrained( │
│ 489 │ │ │ │ pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, │
│ 490 │ │ │ ) │
│ 491 │ │ elif type(config) in cls._model_mapping.keys(): │
│ │
│ /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/transformers/modelin │
│ g_utils.py:2842 in from_pretrained │
│ │
│ 2839 │ │ │ │ │ key: device_map[key] for key in device_map.keys() if key not in modu │
│ 2840 │ │ │ │ } │
│ 2841 │ │ │ │ if "cpu" in device_map_without_lm_head.values() or "disk" in device_map │
│ ❱ 2842 │ │ │ │ │ raise ValueError( │
│ 2843 │ │ │ │ │ │ """ │
│ 2844 │ │ │ │ │ │ Some modules are dispatched on the CPU or the disk. Make sure yo │
│ 2845 │ │ │ │ │ │ the quantized model. If you want to dispatch the model on the CP │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
`device_map` to `from_pretrained`. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
If I comment out the `device_map='auto'` in train_qlora.py's call `model = AutoModel.from_pretrained(global_args.model_name_or_path, quantization_config=q_config, device_map='auto', trust_remote_code=True)`, I get an OOM error instead:
===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/softwares/anaconda3/envs/langchain did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
Loading checkpoint shards: 71%|██████████████████████████████████████████████████████████████▊ | 5/7 [00:14<00:05, 2.91s/it]
OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB (GPU 0; 10.75 GiB total capacity; 10.08 GiB already allocated; 142.50 MiB free; 10.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is there a corresponding fix for chatglm2?
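Not a fix confirmed in this thread, but for context: the ValueError appears because device_map='auto' decided to place some chatglm2 modules on the CPU, and a quantized load refuses that. A minimal sketch of forcing the whole model onto one GPU instead (the q_config shown is an assumption; on an 11 GB card this may simply turn into a plain OOM):

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Assumed 4-bit config; train_qlora.py builds its own q_config, which may differ.
q_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float32,
)

model = AutoModel.from_pretrained(
    "/data/chatglm2-6b",                  # same path as in the failing command above
    quantization_config=q_config,
    device_map={"": 0},                   # pin every module to GPU 0; no CPU/disk offload
    trust_remote_code=True,
)
```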