Closed: wac81 closed this issue 1 year ago
I tested on Colab and everything worked fine. It looks like you're using bf16. May I know what model you're using?
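A quick way to confirm the dtype is to inspect a parameter of the loaded model; if it prints torch.bfloat16, that matches an "expected scalar type Float but found BFloat16" error. A minimal sketch, assuming the model is loaded with transformers' AutoModelForCausalLM (the checkpoint name is only illustrative):

import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint; add the same torch_dtype/device_map arguments your script uses.
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# pfrl feeds the model float32 tensors by default; torch.bfloat16 here points to the dtype mismatch.
print(next(model.parameters()).dtype)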
I use this model: checkpoint = "bigscience/bloom-560m"
And if I use gpt2, I get a new error like this:
actions = torch.tensor([b["action"] for b in dataset], device=device)
Traceback (most recent call last):
File "/home/wac/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/wac/.pyenv/versions/3.8.12/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/wac/.vscode-server/extensions/ms-python.python-2022.8.1/pythonFiles/lib/python/debugpy/main.py", line 45, in
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], device='cuda:0',
grad_fn=<SubBackward0>)
I followed your Elon Musk example.
Be careful with the learning rate when fine-tuning via RL; setting a lower learning rate should be helpful (a sketch of this follows the examples below). Here is the Colab example, both models are working:
Colab example: bigscience/bloom-560m
Colab example: huggingtweets/elonmusk
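As an illustration of the lower learning rate advice, here is a minimal sketch. It assumes actor is an already-built textrl.TextRLActor (as in the Elon Musk example) and that this version of agent_ppo accepts an lr keyword; if yours does not, the learning rate can be lowered on the underlying pfrl/torch optimizer instead.

# Hedged sketch: a smaller PPO learning rate helps avoid the NaN logits shown above.
# `actor` is assumed to be a textrl.TextRLActor; the `lr` keyword is an assumption
# about this TextRL version's agent_ppo signature.
agent = actor.agent_ppo(
    update_interval=10,
    minibatch_size=2000,
    epochs=20,
    lr=1e-6,  # try lowering further if losses or logits still become NaN
)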
Thank you, I found out that my error came from loading the model with the CausalLM loader (a possible fix is sketched after the traceback below).
Traceback (most recent call last):
File "/data/TextRL/train2.py", line 46, in <module>
pfrl.experiments.train_agent_with_evaluation(
File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/experiments/train_agent.py", line 208, in train_agent_with_evaluation
eval_stats_history = train_agent(
File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/experiments/train_agent.py", line 57, in train_agent
action = agent.act(obs)
File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agent.py", line 161, in act
return self.batch_act([obs])[0]
File "/data/TextRL/textrl/actor.py", line 216, in batch_act
return self._batch_act_train(batch_obs)
File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/agents/ppo.py", line 735, in _batch_act_train
action_distrib, batch_value = self.model(b_state)
File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/nn/branched.py", line 30, in forward
return tuple(mod(*args, **kwargs) for mod in self.child_modules)
File "/data/TextRL/env/lib/python3.8/site-packages/pfrl/nn/branched.py", line 30, in <genexpr>
return tuple(mod(*args, **kwargs) for mod in self.child_modules)
File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data/TextRL/env/lib/python3.8/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/data/TextRL/env/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found BFloat16
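For reference, this RuntimeError usually means the language model's weights were loaded in bfloat16 (for example via torch_dtype=torch.bfloat16, or torch_dtype="auto" on a checkpoint stored in bf16) while pfrl pushes float32 tensors through the model. A minimal sketch of loading the checkpoint explicitly in float32, assuming the bigscience/bloom-560m model from earlier in the thread:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-560m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Force float32 weights so F.linear sees a single dtype end to end; loading the
# checkpoint in bfloat16 reproduces "expected scalar type Float but found BFloat16".
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float32).to("cuda")
model.eval()

Casting an already-loaded model with model.float() has the same effect.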