I'm getting a CUDA out-of-memory error when fine-tuning with AutoTrain:
Traceback (most recent call last):
File "/home/llamaFineTune/myenv/bin/autotrain", line 8, in
sys.exit(main())
File "/home/llamaFineTune/myenv/lib/python3.10/site-packages/autotrain/cli/autotrain.py", line 36, in main
command.run()
File "/home/llamaFineTune/myenv/lib/python3.10/site-packages/autotrain/cli/run_llm.py", line 489, in run
train_llm(params)
File "/home/llamaFineTune/myenv/lib/python3.10/site-packages/autotrain/trainers/clm.py", line 105, in train
model = AutoModelForCausalLM.from_pretrained(
File "/home/amg/llamaFineTune/myenv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 511, in from_pretrained
return model_class.from_pretrained(
File "/home/llamaFineTune/myenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2859, in from_pretrained
max_memory = get_balanced_memory(
File "/home/llamaFineTune/myenv/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 731, in get_balanced_memory
max_memory = get_max_memory(max_memory)
File "/home/llamaFineTune/myenv/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 624, in get_maxmemory
= torch.tensor([0], device=i)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
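Notably, the failure happens while accelerate is still probing the devices: modeling.py line 624 just allocates a one-element tensor on each GPU, before any model weights are loaded. To check whether something else is already occupying the card, a minimal sketch of that probe would be (torch.cuda.mem_get_info is the standard PyTorch call; the loop and prints are mine):

import torch

# Report free/total memory on each visible GPU, then repeat the
# one-element allocation that accelerate's get_max_memory() performs.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # returns bytes
    print(f"cuda:{i}: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")
    _ = torch.tensor([0], device=i)  # the allocation that raises OOM above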
My GPU is an NVIDIA GeForce RTX 3060 Laptop GPU with 6 GB of VRAM.
The error occurs when I launch:

autotrain llm --train --project_name my-llm --model meta-llama/Llama-2-7b-hf --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 1 --num_train_epochs 1 --trainer sft
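As I understand it, --use_int4 tells AutoTrain to load the model 4-bit quantized through bitsandbytes; a rough standalone equivalent (my assumption, not AutoTrain's exact code) would be:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sanity check: can the 7B checkpoint even load in 4-bit on this GPU?
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map={"": 0},  # pin everything to cuda:0 instead of auto-balancing
)

Even in 4-bit, the 7B weights alone are roughly 3.5 GB, so I'm wondering whether 6 GB is enough for this fine-tune at all.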
Any help would be appreciated. Thanks in advance.