KeyError: 'weights' in RLlib using async_replay_optimizer

ray-project / ray

KeyError: 'weights' in RLlib using async_replay_optimizer #4730

Closed zhan0903 closed 4 years ago

zhan0903 commented 5 years ago

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Ray installed from (source or binary):
Ray version:
Python version:
Exact command to reproduce:

Describe the problem

I hit the KeyError shown below. Why are the sample batches of experiences expected to have a "weights" column? Thanks.

Source code / logs

In async_replay_optimizer.py", in add_batch
    row["new_obs"], row["dones"], row["weights"])
KeyError: 'weights'

ericl commented 5 years ago

Those are the importance weights for prioritized replay -- you can set those to all 1s if not sure.
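To illustrate why all-ones weights are a safe default, here is a minimal sketch in plain NumPy, not RLlib's actual loss code. It assumes a squared-error TD loss for simplicity, and the function name `weighted_td_loss` is hypothetical. With every weight equal to 1, the weighted loss reduces to the ordinary unweighted mean.

```python
import numpy as np


def weighted_td_loss(td_errors, weights):
    """Mean of per-sample squared TD errors, each scaled by its importance weight.

    Hypothetical helper for illustration only; squared error is an assumption.
    """
    td_errors = np.asarray(td_errors, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return float(np.mean(weights * td_errors ** 2))


td = np.array([1.0, -2.0, 0.5])

# All-ones weights: identical to the plain mean squared TD error.
uniform = weighted_td_loss(td, np.ones_like(td))
plain_mse = float(np.mean(td ** 2))
```

Non-uniform weights would only matter once the replay buffer samples by priority and needs to correct for the resulting bias.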

Btw, how did you come upon this error?

zhan0903 commented 5 years ago

I think it is because something is wrong with my DDPG PyTorch policy graph implementation. How do I implement the weights for prioritized replay in PyTorch? Thanks.

ericl commented 5 years ago

The prio weights are added by DDPGPostprocessing, which calls _postprocess_dqn(), in this function: https://github.com/ray-project/ray/blob/master/python/ray/rllib/agents/dqn/dqn_policy_graph.py#L636

You should make sure your custom policy graph also adds the "weights" column to the batch in your graph postprocess function. It could just be batch[PRIO_WEIGHTS] = np.ones_like(batch[SampleBatch.REWARDS]).
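A minimal sketch of that suggestion, using a plain dict of NumPy arrays in place of a real SampleBatch so it runs without Ray installed. The helper name `add_unit_prio_weights` is hypothetical, and the two constants are local stand-ins: "weights" matches the key in the traceback, while "rewards" is assumed to be the value of SampleBatch.REWARDS.

```python
import numpy as np

# Local stand-ins for the RLlib constants (assumptions, defined here so the
# sketch is self-contained).
PRIO_WEIGHTS = "weights"  # key that add_batch reads, per the traceback
REWARDS = "rewards"       # assumed value of SampleBatch.REWARDS


def add_unit_prio_weights(batch):
    """Add an all-ones importance-weight column if the batch lacks one."""
    if PRIO_WEIGHTS not in batch:
        # One weight per transition, same shape and dtype as the rewards.
        batch[PRIO_WEIGHTS] = np.ones_like(batch[REWARDS])
    return batch


# Toy batch with three transitions.
batch = {REWARDS: np.array([0.5, -1.0, 2.0], dtype=np.float32)}
batch = add_unit_prio_weights(batch)
```

In a custom policy graph, the same assignment would go in the postprocess function, just before the batch is returned.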

zhan0903 commented 5 years ago

Thanks. It works.
