Open caorantj opened 10 months ago
From what I understand (I've also been trying to get an RLModule to work with PPO), RLlib flattens observations like the ones you describe into a single Box, and I believe that before training it passes in dummy batches of data (all 0s, in the shape of your observation space). I don't know if this is the best way to solve the problem, but in my code I restore the original structure like this:
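# Import paths assumed for ray 2.7.x, where the PPO torch module is PPOTorchRLModule
from typing import Any, Mapping

import torch
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
from ray.rllib.core.models.base import STATE_IN
from ray.rllib.models.modelv2 import restore_original_dimensions
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.nested_dict import NestedDict


class MyRLModule(PPOTorchRLModule):
    def _forward_train(self, batch: NestedDict, compute_vf=True) -> Mapping[str, Any]:
        output = {}
        # Unpack our observation and restore it to the proper shape
        obs = batch[SampleBatch.OBS]
        obs = restore_original_dimensions(
            obs,
            self.config.observation_space.original_space,
            tensorlib=torch,
        )
        input_dict = {
            SampleBatch.OBS: obs,
            STATE_IN: batch[STATE_IN],
        }
        # process input_dict as you'd expect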
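To make the flatten/restore idea concrete, here is a minimal plain-Python sketch of the round trip. It is not RLlib's implementation: `flatten_obs`, `restore_obs`, the `shapes` dictionary, and the sorted-key ordering are all assumptions made for illustration, and real code would go through `restore_original_dimensions` as above.

```python
from math import prod
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a Dict observation space: component name -> shape.
Shapes = Dict[str, Tuple[int, ...]]


def _ravel(value: Any) -> List[float]:
    """Flatten an arbitrarily nested list into one flat list."""
    if isinstance(value, (list, tuple)):
        return [leaf for item in value for leaf in _ravel(item)]
    return [value]


def _reshape(flat: List[float], shape: Tuple[int, ...]) -> Any:
    """Rebuild a nested list of the given shape from a flat list."""
    if not shape:
        return flat[0]
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [
        _reshape(flat[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


def flatten_obs(obs: Dict[str, Any], shapes: Shapes) -> List[float]:
    """Concatenate all components into one vector (sorted key order is assumed)."""
    flat: List[float] = []
    for key in sorted(shapes):
        flat.extend(_ravel(obs[key]))
    return flat


def restore_obs(flat: List[float], shapes: Shapes) -> Dict[str, Any]:
    """Slice the flat vector back into named components of the original shapes."""
    obs: Dict[str, Any] = {}
    offset = 0
    for key in sorted(shapes):
        size = prod(shapes[key])
        obs[key] = _reshape(flat[offset:offset + size], shapes[key])
        offset += size
    return obs


shapes = {"position": (2,), "image": (2, 2)}
obs = {"position": [1.0, 2.0], "image": [[3.0, 4.0], [5.0, 6.0]]}

flat = flatten_obs(obs, shapes)
restored = restore_obs(flat, shapes)
print(flat)             # [3.0, 4.0, 5.0, 6.0, 1.0, 2.0]
print(restored == obs)  # True
```

The point is that a module receiving the flat vector needs the original space's shapes to slice it back apart, which is why the snippet above passes `original_space` to the restore call.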
Hi, I recently started migrating my custom ModelV2 model to RLModule, as it seems to provide better stability. I set up SingleAgentRLModuleSpec() in the config, enabled the flags mentioned in the docs, and wrote a custom RLModule subclass (APPO). I think these modules are hooked up correctly, but my forward pass keeps receiving a placeholder batch['obs'] instead of the real observation. The placeholder doesn't look like anything from my environment: its shape doesn't match my layers, so the forward pass errors out, and its values are all 0s.
I have logged the relevant places, such as the obs returned by env.step() and env.reset(), and they look right (things were also working with the ModelV2 version). I have also logged the observation space being passed into my custom RLModule, and it looks fine too.
Since the interaction between the env and the RLModule happens behind the scenes, is there anywhere you'd point me to look to troubleshoot this disconnect between obs and batch['obs']? I think this happens during the warmup period, before the experiment actually runs, but since I haven't been able to get past that point, I can't confirm whether it works at all.
Versions / Dependencies
ray 2.7.1 python 3.8.18
Reproduction script
I don't have a reproduction script: if I put everything in one script, I'd be feeding the obs from the env into batch['obs'] myself, as all the sample code in the RLModule docs does.
Issue Severity
High: It blocks me from completing my task.