nebuly-ai / optimate

A collection of libraries to optimise AI model performance
https://www.nebuly.com/
Apache License 2.0

[Chatllama] Training the chatllama REWARD model with DeepSpeed fails: RuntimeError: Found dtype Float but expected Half #275

Open balcklive opened 1 year ago

balcklive commented 1 year ago

{ "train_batch_size": 1, "gradient_accumulation_steps": 1, "optimizer": { "type": "Adam", "params": { "lr": 0.00015 } }, "fp16": { "enabled": true, "auto_cast": false, "loss_scale": 0, "initial_scale_power": 16, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu" }, "contiguous_gradients": true, "overlap_comm": true }, "num_gpus": 1 } Using /home/ubuntu/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.00028967857360839844 seconds Start Training the Reward Model Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation. Traceback (most recent call last): File "artifacts/main.py", line 54, in reward_trainer.train() File "/home/ubuntu/.local/lib/python3.8/site-packages/chatllama/rlhf/reward.py", line 379, in train self.model_engine.backward(loss) File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn ret_val = func(*args, **kwargs) File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1964, in backward self.optimizer.backward(loss, retain_graph=retain_graph) File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2028, in backward self.loss_scaler.backward(loss.float(), retain_graph=retain_graph) File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 54, in backward scaled_loss.backward(retain_graph=retain_graph) File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward torch.autograd.backward( File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/autograd/init.py", line 200, in backward Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: Found dtype Float but expected Half

Is this a case of insufficient GPU memory? Any help would be appreciated.
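For context, my understanding of where this error comes from: under DeepSpeed's fp16 engine the model output is half precision, and if an fp32 tensor (e.g. the reward label) enters the loss computation, autograd ends up with a Float gradient where it expects Half. A minimal repro in plain PyTorch (illustrative only, not the actual chatllama code):

```python
import torch
import torch.nn.functional as F

# fp16 model output, as produced under DeepSpeed's fp16 engine
est_reward = torch.randn(4, 1, device="cuda", dtype=torch.float16, requires_grad=True)

# fp32 labels straight from the dataset (PyTorch's default dtype)
labels = torch.randn(4, 1, device="cuda")

loss = F.mse_loss(est_reward, labels)
# loss.backward()  # RuntimeError: Found dtype Float but expected Half

# Keeping the whole loss computation in one dtype avoids the mismatch:
loss = F.mse_loss(est_reward, labels.to(est_reward.dtype))
loss.backward()  # ok
```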

PierpaoloSorbellini commented 1 year ago

Hi @balcklive, I am not sure if it will solve your problem, but you can try modifying the DeepSpeed config file in one of the two ways below.
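Either disable fp16 entirely:

```json
"fp16": {
    "enabled": false
}
```

or keep fp16 enabled but turn on automatic input casting:

```json
"fp16": {
    "enabled": true,
    "auto_cast": true
}
```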

Thanks for your issue, let me know if it is working or if you are stuck with the same error.

balcklive commented 1 year ago

@PierpaoloSorbellini Hi, thank you for your reply. I tried both of your suggestions:

  1. "enabled": false, (under fp16),: the error no longer appeared, but the GPU memory consumption is till high, I still can't train a opt-125m on a NVIDIA 4090
  2. "auto_cast": true: I got another error: Start Training the Reward Model Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation. [2023-03-20 09:28:07,680] [INFO] [scheduler.py:157:check_channel_pruning] Channel pruning is enabled at step 0 Traceback (most recent call last): File "artifacts/main.py", line 54, in reward_trainer.train() File "/home/ubuntu/.local/lib/python3.8/site-packages/chatllama/rlhf/reward.py", line 356, in train est_output = self.model_engine( File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn ret_val = func(*args, *kwargs) File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1832, in forward loss = self.module(inputs, kwargs) File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, kwargs) File "<@beartype(chatllama.rlhf.reward.RewardModel.forward) at 0x7fc893482b80>", line 51, in forward File "/home/ubuntu/.local/lib/python3.8/site-packages/chatllama/rlhf/reward.py", line 133, in forward output = self.model( File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, *kwargs) File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 848, in forward inputs_embeds = self.wte(input_ids) File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(args, kwargs) File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward return F.embedding( File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.HalfTensor instead (while checking arguments for embedding)
PierpaoloSorbellini commented 1 year ago

@balcklive thanks for the quick reply. We have tested similar settings (i.e. a 3090) without problems for models that small using PR #233; consider lowering your batch size if CUDA runs out of memory. If this happens even at very low batch sizes, please send us more information so we can try to replicate the problem.

If you are only using one GPU, you can also disable DeepSpeed, or try Accelerate (which uses DeepSpeed under the hood without you having to configure it manually).
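In case it helps, a rough sketch of what the reward training loop could look like under Accelerate; the toy model, data, and names below are placeholders rather than chatllama's actual code, and `mixed_precision="fp16"` assumes a CUDA GPU:

```python
import torch
import torch.nn as nn
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the reward model: any nn.Module is handled the same way.
model = nn.Sequential(nn.Embedding(100, 32), nn.Flatten(), nn.Linear(32 * 8, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1.5e-4)

# Synthetic (input_ids, reward) pairs in place of the real dataset.
dataset = TensorDataset(torch.randint(0, 100, (64, 8)), torch.randn(64, 1))
loader = DataLoader(dataset, batch_size=4)

# mixed_precision="fp16" (requires a CUDA GPU) makes Accelerate manage the
# casting and gradient scaling that the DeepSpeed fp16 section did before.
accelerator = Accelerator(mixed_precision="fp16")
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for input_ids, rewards in loader:
    est_reward = model(input_ids)
    loss = nn.functional.mse_loss(est_reward, rewards.to(est_reward.dtype))
    accelerator.backward(loss)  # replaces model_engine.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```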

Sorry that you are running into problems; let us know if you are still having issues.

balcklive commented 1 year ago

My batch size is already 1, I can't lower it any further. Yes, I am using one GPU, but it has 24 GB of memory, which should be enough to train an opt-125m. I can't figure out why this training procedure needs so much GPU memory. If you think Accelerate could help, I will try it.
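My rough math for why I expected opt-125m to fit on a 24 GB card (back-of-the-envelope estimates only; activations and long sequences are extra, and ZeRO-2 with CPU offload moves the optimizer states off the GPU anyway):

```python
# Back-of-the-envelope fixed memory for fp16 + Adam training of opt-125m
# (estimates, not measurements; activations excluded).
params = 125e6

fp16_weights = 2 * params   # model weights, half precision
fp16_grads   = 2 * params   # gradients, half precision
fp32_master  = 4 * params   # optimizer's fp32 master copy of the weights
adam_moment1 = 4 * params   # Adam first moment, fp32
adam_moment2 = 4 * params   # Adam second moment, fp32

total_bytes = fp16_weights + fp16_grads + fp32_master + adam_moment1 + adam_moment2
print(f"{total_bytes / 2**30:.1f} GiB of fixed state")  # ~1.9 GiB
```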

PierpaoloSorbellini commented 1 year ago

Hi @balcklive, can you please send me the models you used for the actor, critic, and reward (just the strings from config.yaml), and tell me which training procedure fails (actor, RL, or reward)? I will try to replicate the setup on the same HW to see if I can reproduce the problem and provide a solution.

balcklive commented 1 year ago

It's the REWARD model. As I mentioned in issue #281, my config file specifies the model type as opt-125m, but what actually gets loaded is a gpt2 model. DeepSpeed should be able to reduce the model's memory footprint efficiently; I hope you can fix it.
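For reference, this is one way to check what a model string actually resolves to; `facebook/opt-125m` is the Hugging Face hub id I would expect the config string to map to (an illustrative snippet, independent of chatllama):

```python
from transformers import AutoConfig, AutoModel

# If the reward model were really OPT, these would print "opt" / "OPTModel";
# the modeling_gpt2.py frames in the traceback above point to GPT-2 instead.
config = AutoConfig.from_pretrained("facebook/opt-125m")
print(config.model_type)        # "opt"

model = AutoModel.from_pretrained("facebook/opt-125m")
print(type(model).__name__)     # "OPTModel"
```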

PierpaoloSorbellini commented 1 year ago

Hi @balcklive, yes, this problem with half precision should be fixed in #306. Keep me posted if you still have the same issue!