Closed: a-python-script closed this issue 3 years ago
I found the problem; it was a small mistake on my part. The documentation says that value_function should return a tensor of shape [BATCH], but my code shown above returns a tensor of shape [BATCH, 1]. That is where the NumPy error came from.
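For anyone hitting the same shape mismatch, here is a minimal NumPy sketch of the fix (the array names and batch size are illustrative, not from my actual model): squeezing or reshaping away the trailing axis turns the [BATCH, 1] value output into the [BATCH] tensor that RLlib expects.

```python
import numpy as np

BATCH = 10  # hypothetical batch size, for illustration only

# What my value branch was returning: shape [BATCH, 1]
values_wrong = np.zeros((BATCH, 1), dtype=np.float32)

# What value_function() must return: shape [BATCH]
values_fixed = values_wrong.reshape(-1)  # equivalently: values_wrong.squeeze(-1)

print(values_wrong.shape)  # (10, 1)
print(values_fixed.shape)  # (10,)
```

Inside the model itself the same one-liner applies to the framework tensor, e.g. tf.reshape(out, [-1]) in TensorFlow or out.squeeze(-1) in PyTorch.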
I am trying to get my own environment running with my own network/model under ray.rllib. The setup is minimal; the code is shown below. The environment is a small 10x10 grid world and the network has two hidden layers with a policy output and a value output. I am using the PPO algorithm with its default configuration. I followed the tutorials for a custom environment and a custom network/model exactly, yet I always get the error shown below; it seems to occur while the rollout worker is being initialized. Since my setup is relatively simple and I followed the tutorials closely, I suspect there is a bug here.
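My actual environment is in grid.py (attached below), but as a rough illustration of the kind of setup I mean, here is a self-contained sketch of a 10x10 grid world with the Gym-style reset/step interface that RLlib wraps. All names, rewards, and the observation encoding here are illustrative assumptions, not my real code:

```python
# Illustrative sketch only: a minimal 10x10 grid world with a
# Gym-style interface (reset/step). My real environment is in grid.py.

class GridWorld:
    SIZE = 10                                      # 10x10 grid
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

    def __init__(self):
        self.pos = (0, 0)
        self.goal = (self.SIZE - 1, self.SIZE - 1)

    def reset(self):
        """Return the initial observation (the agent's flattened cell index)."""
        self.pos = (0, 0)
        return self._obs()

    def step(self, action):
        """Apply one of 4 moves, clipped to the grid; reward 1.0 at the goal."""
        dr, dc = self.ACTIONS[action]
        r = min(max(self.pos[0] + dr, 0), self.SIZE - 1)
        c = min(max(self.pos[1] + dc, 0), self.SIZE - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0
        return self._obs(), reward, done, {}

    def _obs(self):
        # Flatten (row, col) into a single integer in [0, 99]
        return self.pos[0] * self.SIZE + self.pos[1]


env = GridWorld()
obs = env.reset()
obs, reward, done, info = env.step(0)  # move right from (0, 0)
print(obs, reward, done)  # 1 0.0 False
```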
Stacktrace:
main.py
grid.py