Open simonsays1980 opened 1 year ago
I can reproduce and am attempting a fix :) Thanks for reporting this!
The issue is that the AgentCollector is used in two settings:
The unroll ID is stored as a class attribute that is shared between these settings. But only the episodes sampled in the EnvRunnerV2 are sampled from the RW.
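To make the mechanism above concrete, here is a minimal toy sketch of a class-level counter shared by two users, where only one user's IDs are ever returned. `ToyCollector`, `simulate`, and the "internal"/"sampler" roles are hypothetical names for illustration and are not RLlib's actual classes or call sites.

```python
import itertools


class ToyCollector:
    """Hypothetical stand-in for a collector; NOT RLlib's AgentCollector."""

    # Class attribute: a single counter shared by every instance.
    _next_unroll_id = itertools.count()

    def __init__(self, name):
        self.name = name
        self.unroll_id = None

    def build(self):
        # Every build draws the next value from the shared counter.
        self.unroll_id = next(ToyCollector._next_unroll_id)
        return self.unroll_id


def simulate(num_episodes):
    """Alternate between an internal user and the user whose IDs are returned."""
    internal = ToyCollector("internal")  # assumed second setting, never returned
    sampler = ToyCollector("sampler")    # assumed setting that yields batches
    returned_ids = []
    for _ in range(num_episodes):
        internal.build()  # consumes an ID that the caller never sees
        returned_ids.append(sampler.build())
    return returned_ids


sampled_ids = simulate(4)
print(sampled_ids)  # [1, 3, 5, 7]
```

In this toy setup the returned IDs are still unique and ordered; they just skip every value consumed by the other user.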
@simonsays1980 Your expected behaviour is obviously that sampling from the RW should yield continuous unroll IDs, right? But RLlib does not give guarantees around this. And they are still suitable to disambiguate between unrolls.
@sven1977 Can you advise? Do we want to treat this as a non-issue?
@ArturNiederfahrenhorst Thanks for the explanation. Good to hear that this does not point to something being off on the user side. I was just curious and dug in a bit.
I wonder whether the unroll_ids even need to be consecutive, or whether they could simply be made unique by a simple code, like the episode_ids. If, however, they need to be consecutive, missing numbers could make a difference.
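As a rough illustration of the question above, the sketch below splits a flat list of rows into unrolls purely by checking where the ID changes. It behaves the same for IDs with gaps and for random unique tokens. The row layout, `split_by_unroll`, and the use of `uuid4` are assumptions for illustration; this is not how RLlib stores batches or generates episode_id.

```python
import itertools
import uuid


def split_by_unroll(rows):
    """Split (unroll_id, timestep) rows into chunks wherever the ID changes."""
    return [
        list(group)
        for _, group in itertools.groupby(rows, key=lambda row: row[0])
    ]


# Integer IDs with gaps, as observed in the report (1, 3, 5, ...).
rows_gappy = [(1, 0), (1, 1), (3, 0), (5, 0), (5, 1)]

# Random unique tokens instead of a counter (hypothetical alternative).
id_a, id_b, id_c = (uuid.uuid4().hex for _ in range(3))
rows_unique = [(id_a, 0), (id_a, 1), (id_b, 0), (id_c, 0), (id_c, 1)]

chunks_gappy = split_by_unroll(rows_gappy)
chunks_unique = split_by_unroll(rows_unique)

print([len(chunk) for chunk in chunks_gappy])   # [2, 1, 2]
print([len(chunk) for chunk in chunks_unique])  # [2, 1, 2]
```

Code that only needs to tell unrolls apart works with either scheme; only code that relies on `id + 1` being the next unroll would be affected by gaps.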
Good catch @simonsays1980, and thanks for the analysis @ArturNiederfahrenhorst. It's probably not a P1, but we should fix this. Maybe we should move this out of AgentCollector altogether and only use this in rollout worker directly?
@simonsays1980 We are aiming to rewrite the sampling backend of RLlib to provide more flexibility to write your own sampling logic. We will keep this in the back of our heads!
I've created a new label "rllib-samplingbackend" to track these. Over time, I'll collect such issues that we should consider on a rewrite.
What happened + What you expected to happen
What happened
I sampled from an env using the RolloutWorker and batch_mode="complete_episodes". The batches returned had only odd numbers in their unroll_id. I don't think that a batch was missing, but I have the feeling that the unroll_id gets incremented twice in the AgentCollector when adding init_obs. I do not know whether this is intentional, but it makes the user think that some data went missing during sampling.
Debugging pointed me to the two lines:
policy.agent_connectors(acd_list)
episode.add_init_obs()
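A quick way to check for the gaps described above is to list which integers are absent from the observed IDs. This is a plain-Python sketch: `find_missing_ids` is a hypothetical helper, and it assumes the unroll IDs have already been collected from the returned batches into a list (how to extract them is not shown here).

```python
def find_missing_ids(unroll_ids):
    """Return the integers between min and max that never appear."""
    seen = set(unroll_ids)
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


# IDs as described in the report: only odd numbers show up.
observed = [1, 3, 5, 7]
missing = find_missing_ids(observed)
print(missing)  # [2, 4, 6]
```

A non-empty result only shows that the counter advanced without a matching returned batch; on its own it does not show that data was lost.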
What you expected to happen
That the unroll_id values in the batches are incremented by a single step, like 1,2,3,4,...
Versions / Dependencies
Ray 2.6.0 Python 3.9.12 Fedora 37
Reproduction script
Issue Severity
Low: It annoys or frustrates me.