ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Windows: rllib RNN assert seq_lens is not None #10448

Closed pbosch closed 1 year ago

pbosch commented 4 years ago

What is the problem?

Using the option "use_lstm" = True ends in an assertion error. It appears that the model is always called with sequence length None. I'm not sure if this is a bug, but according to the documentation adding this option should just wrap the model with an LSTM cell. Weirdly enough it also happens with the cartpole environment. Is this intended behaviour?

```
Traceback (most recent call last):
  File "D:/Seafile/Programming projects/rl_trading/test.py", line 44, in <module>
    trainer = sac.SACTrainer(config=config, env="CartPole-v0")
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer_template.py", line 88, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 479, in __init__
    super().__init__(config, logger_creator)
  File "C:\Python38\lib\site-packages\ray\tune\trainable.py", line 245, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 643, in setup
    self._init(self.config, self.env_creator)
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer_template.py", line 101, in _init
    self.workers = self._make_workers(
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 708, in _make_workers
    return WorkerSet(
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 66, in __init__
    self._local_worker = self._make_worker(
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 259, in _make_worker
    worker = cls(
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 403, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 986, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "C:\Python38\lib\site-packages\ray\rllib\policy\tf_policy_template.py", line 132, in __init__
    DynamicTFPolicy.__init__(
  File "C:\Python38\lib\site-packages\ray\rllib\policy\dynamic_tf_policy.py", line 236, in __init__
    action_distribution_fn(
  File "C:\Python38\lib\site-packages\ray\rllib\agents\sac\sac_tf_policy.py", line 108, in get_distribution_inputs_and_class
    model_out, state_out = model({
  File "C:\Python38\lib\site-packages\ray\rllib\models\modelv2.py", line 202, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "C:\Python38\lib\site-packages\ray\rllib\models\tf\recurrent_net.py", line 157, in forward
    assert seq_lens is not None
AssertionError

Process finished with exit code 1
```

Ray version and other system information (Python version, TensorFlow version, OS): Python 3.8.5 TensorFlow 2.3 Windows 10

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

```python
import ray
from ray.rllib.agents import sac

config = sac.DEFAULT_CONFIG.copy()
config["num_gpus"] = 1
config["num_workers"] = 1
config["framework"] = "tf"
config["model"]["use_lstm"] = True

ray.init(include_dashboard=False)

trainer = sac.SACTrainer(config=config, env="CartPole-v0")
for i in range(10):
    result = trainer.train()
```

sven1977 commented 4 years ago

Hmm, sorry, SAC does not currently support auto LSTM wrapping via use_lstm=True, but we should definitely add an error for this in SAC's validate_config method.
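
For reference, such a guard could look roughly like the following. This is only a hypothetical sketch (not the actual patch); it just follows RLlib's usual per-algorithm validate_config pattern:

```python
# Hypothetical sketch of a config guard for SAC (not the actual fix).
def validate_config(config):
    if config.get("model", {}).get("use_lstm"):
        raise ValueError(
            "SAC does not support automatic LSTM wrapping "
            "(model.use_lstm=True); use a custom recurrent model instead.")
```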

pbosch commented 4 years ago

Ah, I checked the documentation again and indeed not all algorithms are listed as supporting it. But I think the option is present in the default config if you print it. A warning would definitely be useful. I'm not sure how it is handled for the other incompatible algorithms.

evo11x commented 2 years ago

I get the same error on Windows 11 and Debian 11, Python 3.9, ray 1.9.2 and 1.13.0, with APPOTrainer:

ray\rllib\models\torch\recurrent_net.py", line 185, in forward
    assert seq_lens is not None
AssertionError

The training is working; I get the error on compute_single_action.

I get the error when I specify this training configuration: cfg["model"] = {"use_lstm": True, "fcnet_hiddens": [512, 512, 512]}

jobeid1 commented 2 years ago

@evo11x I'm having the same issue with the attention net using a PPOTrainer. Did you find a solution?

evo11x commented 2 years ago

@jobeid1 Sorry for my previous answer, I thought the question was about a different problem. No, I did not find a solution. Did you find one?

jobeid1 commented 2 years ago

@evo11x My colleague discovered that RLlib doesn't automatically initialize or update the state when using a trainer with attention or LSTM, as it does when using tune. It really ought to, and I think they have plans to make this easier in version 2.0. You will need to initialize the state yourself and then update it at each step.

evo11x commented 2 years ago

@jobeid1 the training restore and re-train is working; the error appears when I restore a trained agent and then call compute_single_action. How can I initialize and update the state?

jobeid1 commented 2 years ago

@evo11x when using attention, something along these lines should work:

import numpy as np

# Build the initial (all-zero) attention state from the policy's model config.
transformer_attention_size = policy_config[3]["model"]["attention_dim"]
transformer_memory_size = policy_config[3]["model"]["attention_memory_inference"]
transformer_layer_size = np.zeros([transformer_memory_size, transformer_attention_size])
transformer_length = policy_config[3]["model"]["attention_num_transformer_units"]
state_list = transformer_length * [transformer_layer_size]
initial_state_list = state_list

for agent in env.agent_iter():
    policy_key = agent_id_to_policy_key(agent)
    observation, reward, done, info = env.last()

    if done:
        action = None
        state_list = initial_state_list
        break

    policy_config = config["multiagent"]["policies"][policy_key]
    policy = PPOagent.get_policy(policy_key)
    # Pass the running state in, then roll the returned state into the memory window.
    action, next_state, _ = policy.compute_single_action(obs=observation, state=state_list)
    state_list = [np.concatenate((state_list[i], [next_state[i]]))[1:] for i in range(transformer_length)]

    env.step(action)

The code may require minor adjustments to work and obviously will if you want to use LSTM. Hope this helps!
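
For the LSTM case the pattern is roughly the following. This is only a minimal sketch against the 1.x Trainer API; PPO and CartPole are placeholders here, not the configuration from this issue:

```python
import gym
import ray
from ray.rllib.agents import ppo

# Minimal sketch (assumes an LSTM-wrapped PPO policy on CartPole;
# adapt the algorithm, env, and framework to your own setup).
config = ppo.DEFAULT_CONFIG.copy()
config["framework"] = "torch"
config["num_workers"] = 0
config["model"]["use_lstm"] = True

ray.init(include_dashboard=False)
trainer = ppo.PPOTrainer(config=config, env="CartPole-v0")
trainer.train()

env = gym.make("CartPole-v0")
obs = env.reset()
# The RNN state has to be created and carried explicitly outside of tune.
state = trainer.get_policy().get_initial_state()
done = False
while not done:
    # Passing `state` makes compute_single_action return the next state as well.
    action, state, _ = trainer.compute_single_action(obs, state=state)
    obs, reward, done, _ = env.step(action)
```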

davidlarcher commented 2 years ago

Hello, I have exactly the same thing during inference time with a PPO trainer: it works fine during training but fails on inference... Thank you @jobeid1 for your code snippet, but I can't seem to make it work on my side. Is it mandatory to access the policy of the agent before executing the step method? I usually call agent.compute_single_action directly. I have tried the rc from ray (ray==2.0.0rc0) but the problem still remains :( Does the ray team have this on the radar? All the best and thank you for your contributions!

evo11x commented 2 years ago

@jobeid1 thanks! I don't know if I can figure out how to make it work, but I will try.

tokarev-i-v commented 2 years ago

Hello! I have the same problem with PPOTrainer with use_lstm=True.

jiahao-shen commented 1 year ago

Same problem with PPOTrainer with use_lstm=True

hokhay commented 1 year ago

I have the exact same problem and it seems it has not been solved for 2 years. Can anyone explain what the cause of this issue is and how to solve it? Thanks.

eidelen commented 1 year ago

I installed the latest nightly build (3.0.0.dev0) and I observe the same error: training is fine, but there is an error when calling compute_single_action. @avnishn From the message above I am under the impression that you fixed this issue. Is there a branch or commit with the fix? Thank you.

FunkyungJz commented 1 year ago

@eidelen https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training_with_lstm.py

This example code may be helpful.

Besides, I met this error because the parameter in algo.compute_single_action(prev_action=prev_a_ndarray) had the wrong shape. My action space is Box(7), but I passed tensor([0]). Correcting it to tensor([0, 0, 0, 0, 0, 0, 0]) makes everything work.
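
In other words, when the policy feeds previous actions back in and the action space is a Box of shape (7,), the prev_action argument has to match that shape. A small illustrative sketch (the space bounds and names here are made up):

```python
import numpy as np
from gym.spaces import Box

# Illustrative only: prev_action passed to compute_single_action must match
# the action space's shape, not a length-1 array.
action_space = Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32)
prev_a = np.zeros(action_space.shape, dtype=np.float32)  # shape (7,), not [0]
assert action_space.contains(prev_a)
# e.g.: algo.compute_single_action(obs, state=state, prev_action=prev_a, prev_reward=0.0)
```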

lyzyn commented 8 months ago

I get the same error on Windows 11 and Debian 11, Python 3.9, ray 1.9.2 and 1.13.0, with APPOTrainer

ray\rllib\models\torch\recurrent_net.py", line 185, in forward assert seq_lens is not None AssertionError

The training is working, I get the error on compute_single_action

I get the error when I specify this training configuration cfg["model"] = {"use_lstm": True, "fcnet_hiddens": [512, 512, 51

Have you solved this problem? I have run into the same problem as you. Thank you!

lyzyn commented 8 months ago

I installed the latest nightly build (3.0.0.dev0) and observe the same error: training is fine, but an error occurs when calling compute_single_action. From the message above I am under the impression that you fixed this issue. Is there a branch or commit with the fix? Thanks.

Have you solved this problem? Thank you!

lyzyn commented 8 months ago

I have the exact same problem and it seems it has not been solved for 2 years. Can anyone explain what the cause of this issue is and how to solve it? Thanks.

Have you solved this problem? Thank you!

lyzyn commented 8 months ago

Hmm, sorry, SAC does not currently support auto LSTM wrapping via use_lstm=True, but we should definitely add an error for this in SAC's validate_config method.

But this problem now also occurs in PPO; how should we solve it? Thank you!