voidful / TextRL

Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License

I get an error when I use the elon example #12

Closed wac81 closed 1 year ago

wac81 commented 1 year ago

Traceback (most recent call last):
  File "/data/TextRL/train2.py", line 46, in <module>
    pfrl.experiments.train_agent_with_evaluation(
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/experiments/train_agent.py", line 208, in train_agent_with_evaluation
    eval_stats_history = train_agent(
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/experiments/train_agent.py", line 57, in train_agent
    action = agent.act(obs)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agent.py", line 161, in act
    return self.batch_act([obs])[0]
  File "/data/TextRL/textrl/actor.py", line 216, in batch_act
    return self._batch_act_train(batch_obs)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agents/ppo.py", line 735, in _batch_act_train
    action_distrib, batch_value = self.model(b_state)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/nn/branched.py", line 30, in forward
    return tuple(mod(*args, **kwargs) for mod in self.child_modules)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/nn/branched.py", line 30, in <genexpr>
    return tuple(mod(*args, **kwargs) for mod in self.child_modules)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/TextRL/env/lib/python3.8/site-packages/accelerate/hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found BFloat16
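(Illustrative aside, not from the thread.) The error means a linear layer received float32 activations while its weights were loaded in bfloat16. One way to see where the bf16 weights come from is to list every parameter that is not float32 before training starts; `model` below stands for whatever Hugging Face checkpoint was handed to TextRL:

```python
import torch

def report_non_float32(model):
    # Print every parameter that is not stored as float32; F.linear raises
    # "expected scalar type Float but found BFloat16" when activations and
    # weights end up with mismatched dtypes.
    for name, param in model.named_parameters():
        if param.dtype != torch.float32:
            print(name, param.dtype)

# Hypothetical usage, assuming `model` is the model passed to TextRL:
# report_non_float32(model)
```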

voidful commented 1 year ago

I tested on Colab and everything worked fine. It looks like you're using bf16. May I know what model you're using?

wac81 commented 1 year ago

I use this model: checkpoint = "bigscience/bloom-560m"
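(Sketch only, an assumption rather than the thread's confirmed fix.) Loading this checkpoint explicitly in float32 avoids the dtype mismatch above; relying on `torch_dtype=torch.bfloat16` or an `"auto"` dtype with this checkpoint would reproduce it:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Load the weights explicitly in float32 so they match the float32
# activations used during the PPO updates, then move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float32)
model = model.cuda()

# Alternatively, an already-loaded bf16 model can be upcast in place:
# model = model.float()
```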

wac81 commented 1 year ago

And if I use gpt2, I get a new error like this:

actions = torch.tensor([b["action"] for b in dataset], device=device)
Traceback (most recent call last):
  File "/home/wac/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wac/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wac/.vscode-server/extensions/ms-python.python-2022.8.1/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home/wac/.vscode-server/extensions/ms-python.python-2022.8.1/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home/wac/.vscode-server/extensions/ms-python.python-2022.8.1/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home/wac/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/wac/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/wac/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/TextRL/train_bloom.py", line 47, in <module>
    agent.observe(obs, reward, done, reset)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agent.py", line 164, in observe
    self.batch_observe([obs], [reward], [done], [reset])
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agents/ppo.py", line 684, in batch_observe
    self._batch_observe_train(batch_obs, batch_reward, batch_done, batch_reset)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agents/ppo.py", line 810, in _batch_observe_train
    self._update_if_dataset_is_ready()
  File "/data/TextRL/textrl/actor.py", line 194, in _update_if_dataset_is_ready
    self._update(dataset)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agents/ppo.py", line 490, in _update
    distribs, vs_pred = self.model(states)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/nn/branched.py", line 30, in forward
    return tuple(mod(*args, **kwargs) for mod in self.child_modules)
  File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/nn/branched.py", line 30, in <genexpr>
    return tuple(mod(*args, **kwargs) for mod in self.child_modules)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/TextRL/textrl/actor.py", line 153, in forward
    return torch.distributions.Categorical(logits=logits)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/distributions/categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "/data/TextRL/env/lib/python3.8/site-packages/torch/distributions/distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (3, 2, 50257)) of distribution Categorical(logits: torch.Size([3, 2, 50257])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]],

        [[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]],

        [[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]]], device='cuda:0',
       grad_fn=<SubBackward0>)
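(Illustrative aside, not from the thread.) NaN logits at this point usually mean the policy diverged during a PPO update rather than a loading problem; a small guard like the one below can surface that as soon as the distribution is built, with `logits` standing for whatever tensor is passed to `Categorical`:

```python
import torch

def check_logits(logits):
    # Fail fast if the policy head produced NaN/Inf logits; this usually
    # points to a diverging update (e.g. the learning rate is too high).
    if not torch.isfinite(logits).all():
        raise RuntimeError(
            "Non-finite logits detected; try lowering the PPO learning rate."
        )
    return logits
```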
wac81 commented 1 year ago

I followed your elon musk example.

voidful commented 1 year ago

Be careful with the learning rate when fine-tuning via RL; setting a lower learning rate should be helpful. Here are the Colab examples, and both models are working:

Colab example: bigscience/bloom-560m

Colab example: huggingtweets/elonmusk
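(Sketch only, under assumptions.) If the TextRL helper you are using does not expose a learning-rate argument, the pfrl PPO agent can be built directly with a smaller Adam learning rate; `model` below stands for the policy/value network TextRL constructs, and 3e-6 is just an illustrative value, not a recommendation from the thread:

```python
import torch
import pfrl

# Lower the Adam learning rate that drives the PPO updates.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-6)

agent = pfrl.agents.PPO(
    model,
    optimizer,
    gpu=0,                # run updates on the first GPU
    update_interval=10,   # how often PPO performs an update
    minibatch_size=2000,
    epochs=20,
)
```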

wac81 commented 1 year ago

Thank you, I found out my error came from loading the model with the CausalLM loader.
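(Hedged closing note, assuming the root cause was a loader/architecture mismatch rather than something else.) One way to avoid picking the wrong loader is to check the checkpoint's config before constructing the model; `is_encoder_decoder` distinguishes seq2seq checkpoints (T5/BART) from decoder-only ones (bloom/gpt2):

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Illustrative check, not from the thread: pick the loader that matches
# the checkpoint's architecture before handing the model to TextRL.
checkpoint = "bigscience/bloom-560m"
config = AutoConfig.from_pretrained(checkpoint)

if config.is_encoder_decoder:   # e.g. T5/BART checkpoints
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
else:                           # decoder-only checkpoints such as bloom/gpt2
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
```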