balcklive opened this issue 1 year ago
Hi @balcklive, I am not sure if it will solve your problem, but you can try to modify DeepSpeed's config file.
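For example, a stripped-down config along these lines (only a sketch built from standard DeepSpeed keys, not a file from the repo) turns mixed precision off, which avoids the Half/Float mismatch entirely:

    {
        "train_batch_size": 1,
        "gradient_accumulation_steps": 1,
        "fp16": {
            "enabled": false
        },
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {
                "device": "cpu"
            }
        }
    }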
Thanks for your issue, let me know if it is working or if you are stuck with the same error.
@PierpaoloSorbellini Hi, thank you for your reply, I tried what you said.
@balcklive thanks for the quick reply. We have tested similar settings (i.e. a 3090) without problems for models that small using PR #233; consider lowering your batch size if CUDA runs out of memory. If this happens at very low batch sizes, please send us more information so we can try to replicate the problem.
If you are only using one GPU, you can also disable DeepSpeed, or try Accelerate (which can use DeepSpeed under the hood without you having to configure it manually).
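If you want to try Accelerate, the basic flow is just the following (a sketch; the launch arguments are whatever you normally pass to the script):

    pip install accelerate
    # one-time interactive setup: describe your single-GPU machine when prompted
    accelerate config
    # then run the usual entry point through Accelerate
    accelerate launch artifacts/main.py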
I am sorry that you are having problems, let us know if you are still having issues.
My batch size is already 1, I can't lower it any further. Yes, I am using one GPU, but it has 24 GB of memory; I think that should be enough to run an opt-125m, and I can't figure out why this training procedure needs that much GPU memory. If you think Accelerate could help, I will try it.
Hi @balcklive, can you please send me the models you used for the actor, critic and reward (just the strings used in the config.yaml) and tell me which training procedure fails (actor, RL or reward)? I will try to reproduce the setup on the same HW to see if I can replicate the problem and provide a solution.
It's the REWARD model. As I mentioned in issue #281, my config file specifies the model type as opt-125m, but it is actually a gpt2 model. The DeepSpeed module can compress the model size efficiently; I hope you can fix it.
Hi @balcklive, yes, this problem with half precision should be fixed in #306. Keep me posted if you still have the same issue!
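For anyone hitting this before upgrading: the traceback below means the loss tensor reaching DeepSpeed's fp16 loss scaler is float32 while the engine expects float16. A workaround in the spirit of the fix (only a sketch, not the actual patch in #306; backward_fp16_safe is a hypothetical helper) is to cast the loss to half precision before the backward pass:

    import torch

    def backward_fp16_safe(model_engine, loss: torch.Tensor) -> None:
        # With "fp16": {"enabled": true}, DeepSpeed's loss scaler expects a
        # half-precision loss; passing a float32 loss raises
        # "RuntimeError: Found dtype Float but expected Half".
        if model_engine.fp16_enabled():  # DeepSpeedEngine reports its precision mode
            loss = loss.half()           # cast float32 -> float16 to match the engine
        model_engine.backward(loss)      # engine-managed backward pass
        model_engine.step()              # optimizer step handled by the engine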
{ "train_batch_size": 1, "gradient_accumulation_steps": 1, "optimizer": { "type": "Adam", "params": { "lr": 0.00015 } }, "fp16": { "enabled": true, "auto_cast": false, "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu" }, "contiguous_gradients": true, "overlap_comm": true }, "num_gpus": 1 } Using /home/ubuntu/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.00028967857360839844 seconds Start Training the Reward Model Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation. Traceback (most recent call last): File "artifacts/main.py", line 54, in
reward_trainer.train()
File "/home/ubuntu/.local/lib/python3.8/site-packages/chatllama/rlhf/reward.py", line 379, in train
self.model_engine.backward(loss)
File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1964, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2028, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 54, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Found dtype Float but expected Half
Is this a case of not having enough GPU memory? Any help would be appreciated.