HyeongminMoon opened this issue 1 year ago
I used 8× V100 32G, which worked for the 6.7b_lora config, but it still failed for the 6.7b config in step 3. It seems a node of 8× A100 40G is the minimum requirement; otherwise only the 1.3b config actually works.
@M1n9X Thank you for your answer.
But I still wonder why 6.7b_lora.sh was placed among the single_gpu training scripts. If its minimum requirement turns out to be a single node rather than a single GPU, it would be better to move it.
Please try adding --offload_reference_model to the command line.
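For example, a minimal sketch of where that flag might be appended in the step 3 launch command; the model paths and output directory below are illustrative placeholders, not values from this issue:

```bash
# Hypothetical step 3 launch command; only --offload_reference_model is the
# flag suggested above, the other arguments are illustrative placeholders.
ACTOR_MODEL_PATH=./step1_output      # placeholder path
CRITIC_MODEL_PATH=./step2_output     # placeholder path
OUTPUT=./step3_output                # placeholder path

deepspeed main.py \
   --actor_model_name_or_path $ACTOR_MODEL_PATH \
   --critic_model_name_or_path $CRITIC_MODEL_PATH \
   --offload_reference_model \
   --output_dir $OUTPUT
```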
@tjruwase I tried with --offload_reference_model but got the same error.
I am trying to run the DeepSpeed-Chat example on a single GPU, an NVIDIA A6000 48G. I could run all 3 steps well using the 1.3b example. But when I run single_gpu/run_6.7b_lora.sh, I get a CUDA Out Of Memory error at step 3. Steps 1 and 2 ran fine. Even after I minimized the configuration, I still get OOM.

Here is my run_6.7b_lora.sh config:

And I got OOM especially at gradient_accumulation_steps. Here is my error point:

Environments:

I also tried with --only_optimize_lora but got the same error. Is there any possible way to run the 6.7b_lora model on a single 48G GPU? Thank you for any help.
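In case it helps others with the same setup, here is a rough sketch of the memory-saving switches one might try together in the step 3 launch command. Only --offload_reference_model, --only_optimize_lora, and gradient_accumulation_steps come from this thread; the ZeRO-stage, offload, and gradient-checkpointing flags are assumptions about the step3_rlhf_finetuning arguments and should be checked against the main.py in your checkout:

```bash
# Sketch only, not a verified working config for a 48G A6000.
# Flags other than --offload_reference_model, --only_optimize_lora and
# --gradient_accumulation_steps are assumed from DeepSpeed-Chat step 3
# and may not match your version; verify with `python main.py --help`.
ACTOR_MODEL_PATH=./step1_output      # placeholder
CRITIC_MODEL_PATH=./step2_output     # placeholder
OUTPUT=./step3_output                # placeholder

deepspeed --num_gpus 1 main.py \
   --actor_model_name_or_path $ACTOR_MODEL_PATH \
   --critic_model_name_or_path $CRITIC_MODEL_PATH \
   --actor_zero_stage 3 \
   --critic_zero_stage 3 \
   --offload \
   --offload_reference_model \
   --actor_gradient_checkpointing \
   --critic_gradient_checkpointing \
   --only_optimize_lora \
   --gradient_accumulation_steps 1 \
   --output_dir $OUTPUT
```

The general idea is that ZeRO-3 plus CPU offload trades GPU memory for host memory and speed, gradient checkpointing trades memory for recomputation, and LoRA-only optimization keeps optimizer states small; whether this combination fits in 48G for a 6.7b actor is not confirmed in this thread.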