voidful / TextRL

Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
MIT License

AttributeError: 'MyRLEnv' object has no attribute 'num_envs' #6

Closed lucascassiano closed 1 year ago

lucascassiano commented 1 year ago

Issue

I got the error AttributeError: 'MyRLEnv' object has no attribute 'num_envs'. What should num_envs be in this case? A function that returns 1?

Environment

python: Python 3.10.6
textRL: textrl==0.1.9
OS: Ubuntu 22.04.1 LTS

Executed code

import pfrl
from textrl import TextRLEnv, TextRLActor
from transformers import AutoModelForCausalLM, AutoTokenizer

# checkpoint = "bigscience/bloomz-7b1-mt"
checkpoint = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype="auto", device_map="auto")

model = model.cuda()

class MyRLEnv(TextRLEnv):
    # predicted_list will be the list of predicted tokens
    def get_reward(self, input_item, predicted_list, finish):
        reward = 0
        if finish:
            reward = len(predicted_list)
        return reward

observation_list = [["explain how attention work in seq2seq model"]]
env = MyRLEnv(model, tokenizer, observation_input=observation_list)
actor = TextRLActor(env, model, tokenizer, act_deterministically=True)
agent = actor.agent_ppo(update_interval=2, minibatch_size=2, epochs=10)

print(actor.predict(observation_list[0]))

pfrl.experiments.train_agent_batch_with_evaluation(
    agent,
    env,
    steps=100,
    eval_n_steps=None,
    eval_n_episodes=1,
    eval_interval=2,
    outdir='bloom-test',
)

print(actor.predict(observation_list[0]))

Traceback

Traceback (most recent call last):
  File "{...}/main.py", line 35, in <module>
    pfrl.experiments.train_agent_batch_with_evaluation(
  File "{...}/lib/python3.10/site-packages/pfrl/experiments/train_agent_batch.py", line 247, in train_agent_batch_with_evaluation
    eval_stats_history = train_agent_batch(
  File "{...}lib/python3.10/site-packages/pfrl/experiments/train_agent_batch.py", line 51, in train_agent_batch
    num_envs = env.num_envs
AttributeError: 'MyRLEnv' object has no attribute 'num_envs'
lucascassiano commented 1 year ago

Adding:

class MyRLEnv(TextRLEnv):
    num_envs = 1

This solved the previous issue; however, I got an even worse error:

Traceback (most recent call last):
  File "{...}/main.py", line 41, in <module>
    pfrl.experiments.train_agent_batch_with_evaluation(
  File "{...}lib/python3.10/site-packages/pfrl/experiments/train_agent_batch.py", line 247, in train_agent_batch_with_evaluation
    eval_stats_history = train_agent_batch(
  File "{...}lib/python3.10/site-packages/pfrl/experiments/train_agent_batch.py", line 67, in train_agent_batch
    actions = agent.batch_act(obss)
  File "{...}lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "{...}lib/python3.10/site-packages/textrl/actor.py", line 115, in batch_act
    return self._batch_act_train(batch_obs)
  File "{...}lib/python3.10/site-packages/pfrl/agents/ppo.py", line 735, in _batch_act_train
    action_distrib, batch_value = self.model(b_state)
  File "{...}lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "{...}lib/python3.10/site-packages/pfrl/nn/branched.py", line 30, in forward
    return tuple(mod(*args, **kwargs) for mod in self.child_modules)
  File "{...}lib/python3.10/site-packages/pfrl/nn/branched.py", line 30, in <genexpr>
    return tuple(mod(*args, **kwargs) for mod in self.child_modules)
  File "{...}lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "{...}lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "{...}lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "{...}lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "{...}lib/python3.10/site-packages/textrl/actor.py", line 163, in forward
    return torch.distributions.Categorical(probs=softmax(logits / temperature))
  File "{...}lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "{...}lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1390, in forward
    return F.softmax(input, self.dim, _stacklevel=5)
  File "{...}lib/python3.10/site-packages/torch/nn/functional.py", line 1841, in softmax
    ret = input.softmax(dim)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Any ideas on how to solve this?

voidful commented 1 year ago

I mistakenly used pfrl.experiments.train_agent_batch_with_evaluation instead of pfrl.experiments.train_agent_with_evaluation.

That one is for batch training, which I am still testing.

It should be corrected.
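
A minimal sketch of the corrected call, assuming the agent, env, and settings from the snippet above (the output directory name is just the one used there); pfrl.experiments.train_agent_with_evaluation is the single-environment trainer and does not read env.num_envs:

# Sketch: swap the batch trainer for the single-environment trainer,
# reusing the agent/env and hyperparameters from the reproduction code above.
pfrl.experiments.train_agent_with_evaluation(
    agent,
    env,
    steps=100,
    eval_n_steps=None,
    eval_n_episodes=1,
    eval_interval=2,
    outdir='bloom-test',
)

With the single-environment trainer, the num_envs = 1 workaround on MyRLEnv should no longer be needed.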