voidful / TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License
539 stars 60 forks

AssertionError #9

Closed: Ulov888 closed this 1 year ago

Ulov888 commented 1 year ago

Detailed error message:

Traceback (most recent call last):
  File "/home/ll_coder/workspace/Aigc/RLHF.py", line 36, in <module>
    pfrl.experiments.train_agent_with_evaluation(
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/experiments/train_agent.py", line 208, in train_agent_with_evaluation
    eval_stats_history = train_agent(
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/experiments/train_agent.py", line 57, in train_agent
    action = agent.act(obs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/agent.py", line 161, in act
    return self.batch_act([obs])[0]
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/textrl/actor.py", line 163, in batch_act
    return self._batch_act_train(batch_obs)
  File "/home/ll_coder/anaconda3/envs/py39/lib/python3.9/site-packages/pfrl/agents/ppo.py", line 721, in _batch_act_train
    assert len(self.batch_last_action) == num_envs
AssertionError

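I am sure the environment was installed according to the README, but I always get this error when running example 1. Has anyone run into a similar problem?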

Ulov888 commented 1 year ago

@voidful

xzdong-2019 commented 1 year ago

I have the same problem.

voidful commented 1 year ago

This issue is caused by a distribution mismatch; I changed it back to a categorical distribution. Also, we should return a reward for every sample in the ranking stage.

All of these issues should be fixed now. I will try to add tests to the project.

(It was probably caused by the distribution having the wrong shape. I reworked that part of the code, and it should work correctly now.)
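
For anyone who hits the same assertion, here is a minimal sketch of how a wrongly shaped action distribution could trip the check `len(self.batch_last_action) == num_envs` shown in the traceback. It is plain Python and not the TextRL or pfrl code: the helpers `sample_categorical`, `batch_act`, and `check_last_actions`, the `vocab_size` value, and the idea that the stored actions hold one categorical sample per row of logits are all assumptions made for illustration.

```python
import math
import random


def sample_categorical(logits, rng):
    """Draw one index from a categorical distribution given unnormalized logits."""
    peak = max(logits)
    # Softmax weights, shifted by the max logit for numerical stability.
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def batch_act(batch_logits, rng):
    """Sample one action per row of logits (hypothetical stand-in for the agent)."""
    return [sample_categorical(row, rng) for row in batch_logits]


def check_last_actions(batch_last_action, num_envs):
    """Same condition as the failing assertion: one stored action per environment."""
    return len(batch_last_action) == num_envs


rng = random.Random(0)
num_envs = 1      # a single environment, as in agent.act(obs)
vocab_size = 5    # hypothetical number of possible actions

# Expected shape: one row of logits per environment.
good_actions = batch_act([[0.0] * vocab_size for _ in range(num_envs)], rng)

# Mismatched shape: three rows of logits for a single environment,
# e.g. an extra dimension being treated as the batch dimension.
bad_actions = batch_act([[0.0] * vocab_size for _ in range(3)], rng)

good_ok = check_last_actions(good_actions, num_envs)
bad_ok = check_last_actions(bad_actions, num_envs)
print(good_ok, bad_ok)
```

In this sketch the first case keeps one action per environment, while the second stores three actions for one environment, so a length check like the one in the traceback would fail on the next step.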