I met the same error. Check your log: did fast_jl import successfully? If it prints "Use basic projector", then fast_jl did not actually import. When generating eval_grads, CUDA device 0 was used, and the model and dataloader had already claimed its memory, which caused the OOM during loss calculation. You may move the grads to another available device in collect_grad_reps.py.
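A minimal sketch of both checks, assuming the trak/fast_jl setup that LESS uses; the log strings, the `offload_grads` helper, and the `cuda:1` target are illustrative, not part of the repo:

```python
import torch

# 1) Verify that fast_jl (the compiled backend of trak's CudaProjector) imports.
#    If it does not, collect_grad_reps.py falls back to the slower BasicProjector.
try:
    import fast_jl
    print("fast_jl imported successfully")
except ImportError as err:
    print(f"fast_jl failed to import ({err}); BasicProjector fallback will be used")

# 2) If device 0 is already filled by the model and dataloader, move the
#    projected grads to another device before accumulating them.
#    (Hypothetical helper for collect_grad_reps.py; "cuda:1" is an example.)
def offload_grads(grads: torch.Tensor) -> torch.Tensor:
    spare = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")
    return grads.to(spare)
```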
Thanks. After switching to the CUDA projector, there are no more OOM issues.
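For anyone else hitting this: the projector choice can be probed up front. A sketch following the pattern in LESS's collect_grad_reps.py, assuming trak's projector classes and fast_jl's test kernel (treat the exact `project_rademacher_8` call and the printed strings as assumptions):

```python
import torch
from trak.projectors import BasicProjector, CudaProjector

def pick_projector(device: torch.device):
    """Return CudaProjector when fast_jl runs on this GPU, else BasicProjector."""
    try:
        num_sms = torch.cuda.get_device_properties(device.index).multi_processor_count
        import fast_jl
        # Tiny test projection; raises if the fast_jl CUDA kernels cannot run here.
        fast_jl.project_rademacher_8(torch.zeros(8, 1_000, device=device), 512, 0, num_sms)
        print("Using CudaProjector")
        return CudaProjector
    except Exception:
        print("Using BasicProjector (fast_jl unavailable)")
        return BasicProjector
```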
@Haruka1307 Have you fixed this OOM in step 3.1 before? https://github.com/princeton-nlp/LESS/issues/19#issue-2317088583 Your solution doesn't work in this situation. Any advice?
Hello, I encountered an Out of Memory error at step 3.1. My configuration is a single A100 GPU with 80 GB of memory. Reducing max_length to 128 lets me avoid the OOM error. Is this a reasonable approach, and are there other ways to resolve the issue?
Here is the command I am using:

```bash
CUDA_VISIBLE_DEVICES=0 python3 -m less.data_selection.get_info \
    --task $task \
    --info_type grads \
    --model_path $model \
    --output_path $output_path \
    --gradient_projection_dimension $dims \
    --gradient_type sgd \
    --data_dir $data_dir \
    --max_length 128
```
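Lowering max_length mainly shrinks activation memory, which grows roughly linearly in sequence length for the hidden/MLP tensors and quadratically for the attention score matrices. A back-of-envelope sketch, assuming a hypothetical 7B-scale shape (32 layers, hidden size 4096, 32 heads, fp16, batch size 1); all numbers are illustrative, not measured:

```python
# Coarse activation-memory estimate per forward pass (fp16, batch size 1).
# Hypothetical model shape: 32 layers, hidden size 4096, 32 attention heads.
LAYERS, HIDDEN, HEADS, BYTES = 32, 4096, 32, 2

def activation_gb(seq_len: int) -> float:
    per_layer = (
        8 * seq_len * HIDDEN          # hidden states, QKV, MLP intermediates (coarse)
        + HEADS * seq_len * seq_len   # attention scores, quadratic in length
    )
    return LAYERS * per_layer * BYTES / 1024**3

for length in (128, 512, 2048):
    print(f"max_length={length:4d} -> ~{activation_gb(length):.2f} GB of activations")
```

Under these assumptions, cutting max_length from 2048 to 128 removes an order of magnitude of activation memory, but it also truncates long examples, which may change which tokens the gradients actually see.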