shuxueslpi / chatGLM-6B-QLoRA

Efficient 4-bit QLoRA fine-tuning of ChatGLM-6B/ChatGLM2-6B using the peft library, plus merging the LoRA model into the base model and 4-bit quantization.

ChatGLM fine-tunes fine with QLoRA, but ChatGLM2 reports a GPU memory OOM #29

Open Y-Bay opened 1 year ago

Y-Bay commented 1 year ago

GPU hardware: four 2080 Ti cards, 12 GB each, with a single card selected for the run. Fine-tuning chatglm-6b with

```
CUDA_VISIBLE_DEVICES=0 python train_qlora.py \
    --train_args_json chatGLM_6B_QLoRA.json \
    --model_name_or_path /data/chatglm-6b \
    --train_data_path data/train.jsonl \
    --eval_data_path data/eval.jsonl \
    --lora_rank 4 \
    --lora_dropout 0.05 \
    --compute_dtype fp32
```

runs normally and saves its output under ./saved_files. But fine-tuning chatglm2-6b (the chatglm2-6b files are confirmed to be the latest version) with

```
CUDA_VISIBLE_DEVICES=0 python train_qlora.py \
    --train_args_json chatGLM_6B_QLoRA.json \
    --model_name_or_path /data/chatglm2-6b \
    --train_data_path data/train.jsonl \
    --eval_data_path data/eval.jsonl \
    --lora_rank 4 \
    --lora_dropout 0.05 \
    --compute_dtype fp32
```

fails with:

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/softwares/anaconda3/envs/langchain did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Traceback (most recent call last):
  File "/data/data/FT_LLM/chatGLM-6B-QLoRA/train_qlora.py", line 214, in <module>
    train(args)
  File "/data/data/FT_LLM/chatGLM-6B-QLoRA/train_qlora.py", line 153, in train
    model = AutoModel.from_pretrained(global_args.model_name_or_path,
  File "/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
    return model_class.from_pretrained(
  File "/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2842, in from_pretrained
    raise ValueError(
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
```
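For reference, the ValueError at the end asks for two things: an offload flag and an explicit device_map. Below is a hedged sketch of that call. The BitsAndBytesConfig fields are the real transformers API, but the device_map keys are assumptions modeled on the commented-out map visible in the traceback (train_qlora.py lines 150-151); inspect model.named_modules() on your checkpoint for the true prefixes, and note that even with offload an 11 GB card may still be tight:

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

q_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float32,
    llm_int8_enable_fp32_cpu_offload=True,  # keep CPU-dispatched modules in fp32
)

# Assumption: module names mirror the commented-out map in train_qlora.py,
# pinning the output layer to CPU and everything else to GPU 0.
device_map = {
    'transformer': 0,
    'output_layer': 'cpu',
}

model = AutoModel.from_pretrained(
    '/data/chatglm2-6b',
    quantization_config=q_config,
    device_map=device_map,
    trust_remote_code=True,
)
```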
Commenting out the device_map='auto' argument in train_qlora.py's

```
model = AutoModel.from_pretrained(global_args.model_name_or_path,
                                  quantization_config=q_config,
                                  device_map='auto',
                                  trust_remote_code=True)
```

produces an OOM error instead:

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/softwares/anaconda3/envs/langchain did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/softwares/anaconda3/envs/langchain/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
Loading checkpoint shards:  71%|████████████▊     | 5/7 [00:14<00:05, 2.91s/it]
OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB (GPU 0; 10.75 GiB total capacity; 10.08 GiB already allocated; 142.50 MiB free; 10.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Is there a corresponding fix for chatglm2?
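(As an aside, the last line of that OOM message points at the PyTorch allocator setting. A hedged example with an illustrative value follows; it only mitigates fragmentation and will not make a model fit that is simply too large for the card:)

```
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 CUDA_VISIBLE_DEVICES=0 python train_qlora.py \
    --train_args_json chatGLM_6B_QLoRA.json \
    --model_name_or_path /data/chatglm2-6b \
    --train_data_path data/train.jsonl \
    --eval_data_path data/eval.jsonl \
    --lora_rank 4 \
    --lora_dropout 0.05 \
    --compute_dtype fp32
```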

shuxueslpi commented 1 year ago

Could you try a card with more memory? I recall the 2080 Ti has 11 GB. In my own tests this worked on both a 3060 (12 GB) and a 3090 (24 GB), and during training even a batch size of 8 didn't fill the memory.

xslower commented 1 year ago

It also OOMs on a 4090 with 24 GB.

shuxueslpi commented 1 year ago

@xslower Are you sure the model code is the latest version? Also, how long are the samples in your dataset, and what batch size are you using?

xslower commented 1 year ago

> @xslower Are you sure the model code is the latest version? Also, how long are the samples in your dataset, and what batch size are you using?

It dies at the model-loading stage; it never gets to running a batch. If I load the -int4 model directly, I get an error that the weights can't compute gradients. Loading the original model blows GPU memory immediately. Latest code.

```
D:\Env\Python39\python.exe E:\code\gpt\chatGLM-6B-QLoRA\train_qlora.py
bin D:\Env\Python39\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
Loading checkpoint shards: 100%|██████████| 7/7 [00:08<00:00, 1.25s/it]
Traceback (most recent call last):
  File "E:\code\gpt\chatGLM-6B-QLoRA\train_qlora.py", line 209, in <module>
    train(args)
  File "E:\code\gpt\chatGLM-6B-QLoRA\train_qlora.py", line 153, in train
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
  File "D:\Env\Python39\lib\site-packages\peft\utils\other.py", line 81, in prepare_model_for_kbit_training
    param.data = param.data.to(torch.float32)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 428.00 MiB (GPU 0; 23.99 GiB total capacity; 21.69 GiB already allocated; 0 bytes free; 22.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

shuxueslpi commented 1 year ago

@xslower Don't load the int4 model directly; load the fp model and do the int4 quantization during loading. Do your dependency versions match the ones in the README?

xslower commented 1 year ago

> @xslower Don't load the int4 model directly; load the fp model and do the int4 quantization during loading. Do your dependency versions match the ones in the README?

I checked specifically: everything is >= the packages you listed; e.g. bitsandbytes is 0.40.2. I suspect the problem is in bitsandbytes. The 32-bit version of a 6B model needs more than 24 GB of memory on its own, so if no quantization happens during loading it is bound to blow up. For ordinary inference you either load the int4 version or use half().
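A quick way to check that suspicion (a diagnostic sketch; it assumes `model` has just been returned by `from_pretrained` with `load_in_4bit=True`): if no 4-bit linear layers are present, nothing was actually quantized, and `prepare_model_for_kbit_training` will then upcast the entire model to fp32, which is exactly the `param.data.to(torch.float32)` line in the traceback above.

```python
import bitsandbytes as bnb

# Count the linear layers that were actually replaced by 4-bit versions.
linear4bit = [name for name, module in model.named_modules()
              if isinstance(module, bnb.nn.Linear4bit)]
print(f"{len(linear4bit)} Linear4bit modules found")
# 0 here matches the "no linear modules were found" warning: the model is
# still held at full precision and will not survive the fp32 upcast.
```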

gdhy9064 commented 1 year ago

With transformers==4.31.0 I get the warning below:

> You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.

The model then occupies 12 GB and fails with the error above. Downgrading to transformers==4.30.2 makes it work.
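For anyone hitting the same thing, the version pin that worked here is simply:

```
pip install transformers==4.30.2
```

(The warning itself says what went wrong: the bitsandbytes integration found no linear modules to replace, so nothing was quantized.)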

Hzzhang-nlp commented 1 year ago

```
2023-09-11 09:47:27.394959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('//172.28.0.1'), PosixPath('8013')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https'), PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-18wy1blmurcx8 --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward. Either way, this might cause trouble in the future: If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.c
Compiling gcc -O3 -fPIC -std=c99 /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.c -shared -o /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/quantization_kernels.so
Using quantization cache
Applying quantization to glm layers
```

Is this also a GPU memory error?

Hzzhang-nlp commented 1 year ago

Here is my fine-tuning setup: [image]

Hzzhang-nlp commented 1 year ago

> @xslower Don't load the int4 model directly; load the fp model and do the int4 quantization during loading. Do your dependency versions match the ones in the README?

There must be a bug somewhere in this repo's code. Normally, once the repo is cloned, the environment is installed, and the paths are fixed, fine-tuning should just run, but it errors out both on Colab and on my own desktop.

shuxueslpi commented 1 year ago

@Hzzhang-nlp

> Here is my fine-tuning setup: [image]

--model_name_or_path /content/chatGLM-6B-QLoRA/chatglm-6b — it won't run with chatglm-6b-int4. Load the original fp model and do the int4 quantization during loading.

Then make sure that: 1. the chatglm model assets (including the official remote_scripts etc.) are all the latest versions; 2. the versions of all libraries match the README.
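Putting that advice into code, a minimal sketch of the intended load path (the config values mirror the repo's defaults as far as they can be read from the tracebacks above, and the path comes from the command in this thread; treat both as assumptions to adapt):

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# 4-bit NF4 quantization applied while the fp checkpoint is being loaded,
# so the full-precision weights never need to sit in GPU memory all at once.
q_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float32,
)

model = AutoModel.from_pretrained(
    '/content/chatGLM-6B-QLoRA/chatglm-6b',  # the original fp model, NOT a *-int4 checkpoint
    quantization_config=q_config,
    device_map='auto',
    trust_remote_code=True,
)
```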

shuxueslpi commented 1 year ago

> [quotes the full TF-TRT / bitsandbytes / CUDA SETUP log from the comment above] Is this also a GPU memory error?

There are no errors in that snippet at all; it's all warnings… or else you didn't copy the whole thing and missed the error part.