shuxueslpi / chatGLM-6B-QLoRA

Uses the peft library to implement efficient 4-bit QLoRA fine-tuning of chatGLM-6B/chatGLM2-6B, including merging the LoRA model into the base model and 4-bit quantization.

Fine-tuning chatglm-6b uses more than 20 GB of memory, what is the cause? #17

Open sxm7078 opened 1 year ago

shuxueslpi commented 1 year ago

Is it the first-generation model? What is your batch size?

valkryhx commented 1 year ago

Change the script argument from fp32 to fp16 and GPU memory drops to 12.4 GB; then also enable quantize = int4 for SFT and it only uses 6.5 GB.
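For reference, the fp16-compute plus int4-quantization combination suggested here can be expressed through the standard bitsandbytes integration in transformers. This is a minimal sketch of that setup, not this repo's exact training script; the flag names in the script may differ:

```python
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with fp16 compute, as suggested above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # fp16 instead of fp32
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```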

sxm7078 commented 1 year ago

Change the script argument from fp32 to fp16 and GPU memory drops to 12.4 GB; then also enable quantize = int4 for SFT and it only uses 6.5 GB.

It is already fp16. The memory usage keeps fluctuating, sometimes reaching 30 GB.

shuxueslpi commented 1 year ago

@sxm7078 Did you modify the code? The code loads the model in int4 by default, so GPU memory usage should be very small. When the second-generation model first came out, it did not implement activation checkpointing, which caused very high GPU memory usage; that has since been fixed, so pulling the latest model files should be OK.
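The activation checkpointing mentioned here corresponds to gradient checkpointing in the transformers API. A minimal sketch of enabling it before training, assuming `model` is the loaded ChatGLM model:

```python
# Trade compute for memory: recompute activations during the backward pass
# instead of storing them all. These are standard transformers model methods.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()  # inputs must require grads for checkpointing to work with PEFT
model.config.use_cache = False      # the KV cache is incompatible with gradient checkpointing
```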

sxm7078 commented 1 year ago

@sxm7078 Did you modify the code? The code loads the model in int4 by default, so GPU memory usage should be very small. When the second-generation model first came out, it did not implement activation checkpointing, which caused very high GPU memory usage; that has since been fixed, so pulling the latest model files should be OK.

I am fine-tuning the chatglm-6b model, with lora_rank changed to 8 and compute_dtype changed to fp16. For transformers==4.30.2 and accelerate==0.20.3, should I install these exact versions, or the dev versions?
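For context, a lora_rank of 8 maps onto peft's LoraConfig roughly as below. This is an illustrative sketch, not this repo's script; the target_modules value is an assumption based on ChatGLM's fused attention projection, and the alpha/dropout values are placeholders:

```python
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# Cast norms/embeddings for stable k-bit training, then attach LoRA adapters.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                 # lora_rank = 8, as described above
    lora_alpha=32,                       # illustrative value
    lora_dropout=0.05,                   # illustrative value
    target_modules=["query_key_value"],  # assumption: ChatGLM's fused QKV layer
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```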

shuxueslpi commented 1 year ago

@sxm7078 Use transformers==4.30.2 and accelerate==0.20.3 now; the dev versions are no longer needed. But I don't think your problem is caused by a dev version.

oyster-lab commented 1 year ago

Same here. I used the author's default parameter configuration on a 4090 and it immediately reported CUDA out of memory. From the error message, the abnormal GPU memory usage seems to occur at the param.data = param.data.to(torch.float32) step inside the prepare_model_for_kbit_training function:

  File "/home/lc/anaconda3/envs/pytorch/lib/python3.11/site-packages/peft/utils/other.py", line 81, in prepare_model_for_kbit_training
    param.data = param.data.to(torch.float32)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 428.00 MiB (GPU 0; 23.65 GiB total capacity; 21.69 GiB already allocated; 96.00 MiB free; 22.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
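As background: prepare_model_for_kbit_training upcasts the non-quantized parameters (layer norms, embeddings, output head) to float32 for training stability, which is the allocation this traceback points at. The error message's own suggestion, setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF, can be tried as in the sketch below (the 128 MiB split size is an illustrative value); whether it helps depends on whether the failure is fragmentation or genuinely exhausted memory:

```python
import os

# Must be set before CUDA is initialized, per the PyTorch memory-management
# docs referenced in the error message.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
from peft import prepare_model_for_kbit_training

# ...load the 4-bit model as before, then:
# model = prepare_model_for_kbit_training(model)
```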

misener7 commented 1 year ago

Same here. I used the author's default parameter configuration on a 4090 and it immediately reported CUDA out of memory; the abnormal GPU memory usage seems to occur at the param.data = param.data.to(torch.float32) step inside prepare_model_for_kbit_training.

Has this problem been solved?