I am getting an OOM error on a V100 (32 GB) when I set batch_size in gtoolformer_gptj_script.py to 2:
    self.train_model(train_dataloader=self.data['train'])
  File "/scratch/yerong/Graph_Toolformer/LLM_Tuning/code/Method_Graph_Toolformer_GPTJ.py", line 82, in train_model
    self.optimizer.step()
  File "/scratch/yerong/.conda/envs/gtool/lib/python3.9/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/scratch/yerong/.conda/envs/gtool/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/yerong/.conda/envs/gtool/lib/python3.9/site-packages/bitsandbytes/optim/optimizer.py", line 261, in step
    self.init_state(group, p, gindex, pindex)
  File "/scratch/yerong/.conda/envs/gtool/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/scratch/yerong/.conda/envs/gtool/lib/python3.9/site-packages/bitsandbytes/optim/optimizer.py", line 391, in init_state
    state["state1"] = torch.zeros_like(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 1; 31.75 GiB total capacity; 30.28 GiB already allocated;
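As a workaround I'm considering something like the sketch below: keep the effective batch size at 2 by accumulating gradients over two micro-batches of size 1, and enable gradient checkpointing so activations are recomputed in the backward pass instead of kept in memory. This is not the actual code from Method_Graph_Toolformer_GPTJ.py; the model loading, dataloader, and optimizer setup here are placeholders I wrote just to make the idea concrete.

```python
# Hypothetical sketch, NOT the actual Method_Graph_Toolformer_GPTJ.py code.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, GPTJForCausalLM
import bitsandbytes as bnb

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J's tokenizer has no pad token by default

model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
).cuda()
model.gradient_checkpointing_enable()  # trade compute for activation memory
model.train()

optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5)

# Dummy stand-in for the real training data, only so the sketch runs end to end.
enc = tokenizer(["placeholder text"] * 4, return_tensors="pt", padding=True)
train_dataloader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"]), batch_size=1
)

accum_steps = 2  # two micro-batches of size 1 = effective batch size 2
optimizer.zero_grad()
for step, (input_ids, attention_mask) in enumerate(train_dataloader):
    input_ids, attention_mask = input_ids.cuda(), attention_mask.cuda()
    loss = model(
        input_ids=input_ids, attention_mask=attention_mask, labels=input_ids
    ).loss / accum_steps
    loss.backward()  # real fp16 training would likely also need a GradScaler or bf16
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```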
I also get a warning about compute capability:
/scratch/yerong/.conda/envs/gtool/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
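For reference, the warning appears to come from the V100 itself reporting compute capability 7.0, while the warning text says the fast 8-bit matmul path needs 7.5 or higher. This is how I checked the capability, using standard torch.cuda calls (nothing specific to this repo):

```python
# Print the compute capability of each visible GPU; a V100 reports 7.0.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> compute capability {major}.{minor}")
```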