merge_peft_adapter.py
When merging the LoRA adapter for Baichuan-13B, I got:
```
Traceback (most recent call last):
  File "merge_peft_adapter.py", line 110, in <module>
    main()
  File "merge_peft_adapter.py", line 93, in main
    lora_model = PeftModel.from_pretrained(
  File "/home/largitdata/miniconda3/envs/chatglm/lib/python3.8/site-packages/peft/peft_model.py", line 271, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/home/largitdata/miniconda3/envs/chatglm/lib/python3.8/site-packages/peft/peft_model.py", line 581, in load_adapter
    max_memory = get_balanced_memory(
  File "/home/largitdata/miniconda3/envs/chatglm/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 753, in get_balanced_memory
    per_gpu = module_sizes[""] // (num_devices - 1 if low_zero else num_devices)
ZeroDivisionError: integer division or modulo by zero
```
After some searching, I tried editing the `accelerate/utils` code in site-packages, replacing the original 0 with 1 around this line:

```python
max_memory = {i: torch.cuda.mem_get_info(i)[1] for i in range(torch.cuda.device_count())}
```
When I ran the merge again, it failed with an out-of-memory error. I'd like to know whether there is any way around this at the merge stage other than upgrading hardware, since unlike the SFT script, the merge script has no QLoRA option to save memory. I'm using a 3090 with 24 GB. Thanks!