hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

4-bit QLoRA + Qwen2 72B + 16k cutoff_len #5798


lmc8133 commented 3 weeks ago

How many GPUs are needed to fine-tune? I have tried 16 GPUs (96 GB each) but got CUDA out of memory.

hiyouga commented 3 weeks ago

try --enable_liger_kernel and --use_unsloth_gc
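
For reference, a minimal sketch of how those two flags slot into a 4-bit QLoRA run via llamafactory-cli; only --enable_liger_kernel and --use_unsloth_gc come from this thread, while the model path, dataset, template, and batch-size values are illustrative placeholders, and the multi-GPU launcher / ZeRO config is left out:

# illustrative sketch: only --enable_liger_kernel and --use_unsloth_gc are taken from this thread
llamafactory-cli train \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen2-72B-Instruct \
    --dataset alpaca_en_demo \
    --template qwen \
    --finetuning_type lora \
    --quantization_bit 4 \
    --cutoff_len 16384 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --bf16 \
    --flash_attn fa2 \
    --enable_liger_kernel \
    --use_unsloth_gc \
    --output_dir saves/qwen2-72b-qlora

Both options target activation memory, which is what a 16k cutoff_len mostly stresses: Liger Kernel swaps in fused Triton kernels, and use_unsloth_gc enables Unsloth-style gradient checkpointing.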

lmc8133 commented 3 weeks ago

try --enable_liger_kernel and --use_unsloth_gc

[screenshot attached]

--use_unsloth_gc or --use_unsloth?

hiyouga commented 3 weeks ago

use_unsloth_gc

lmc8133 commented 3 weeks ago

use_unsloth_gc

Thanks.

BTW, I have encountered an error: Triton Error [CUDA]: device kernel image is invalid, when using --enable_liger_kernel.

Here is some package info: triton==3.1.0, transformers==4.44.2, torch==2.3.0, CUDA SDK==12.3.2

Any suggestions?
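
For what it's worth, a Triton "device kernel image is invalid" launch failure typically indicates mismatched Triton / PyTorch / CUDA builds, and the versions above look inconsistent: the torch 2.3.0 wheels are paired with triton 2.3.0, while triton 3.1.0 is the release that ships with newer torch versions. A quick check and realignment, assuming a standard pip environment (illustrative commands, not from this thread):

# confirm the installed torch / CUDA / triton combination
python -c "import torch, triton; print(torch.__version__, torch.version.cuda, triton.__version__)"
# torch 2.3.0 pins triton==2.3.0 on Linux, so realigning the pair is the usual first step
pip install "triton==2.3.0"

If the error persists with matched versions, upgrading torch and triton together (i.e. to the pair shipped by a newer torch release) would be the next thing to try.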