ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Migrating from ModelV2 to RLModule, but batch['obs'] in forward_train() isn't getting value from env.step()/env.reset() #40919

Open caorantj opened 10 months ago

caorantj commented 10 months ago

Hi, I recently started migrating my custom ModelV2 model to RLModule, since it seems to offer better stability. I set up `SingleAgentRLModuleSpec()` in the config, enabled the flags mentioned in the docs, and made my custom class subclass `RLModule` (APPO).
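For reference, the wiring looks roughly like this (a sketch, not my exact code; `MyRLModule` is a placeholder, and the exact flag names may differ between Ray versions — these are the ones I believe the 2.7 docs mention):

```python
from ray.rllib.algorithms.appo import APPOConfig
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

config = (
    APPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .rl_module(
        rl_module_spec=SingleAgentRLModuleSpec(module_class=MyRLModule),
        _enable_rl_module_api=True,
    )
    .training(_enable_learner_api=True)
)
```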

I think these modules are hooked up correctly, but I keep getting a placeholder batch['obs'] in my forward pass instead of the real observation. The placeholder doesn't look like anything from my environment: the values are all 0s, and the shapes don't match my layers, so of course the forward pass errors out.

I have logged the relevant places, like the obs returned by env.step() and env.reset(), and they look right (things were also working with the ModelV2 version). I have also logged the observation space being passed into my custom RLModule, and it looks fine.

Since the interaction between the env and the RLModule happens behind the scenes, is there anywhere you'd point me to look to troubleshoot this disconnect between the env's obs and batch['obs']? I think this happens during a warmup period before the experiment actually runs, but since I haven't gotten past it, I can't confirm anything worked at all.

Versions / Dependencies

ray 2.7.1
python 3.8.18

Reproduction script

I don't have a reproduction script, since if I put everything into one script I'd be hooking the obs from the env into batch['obs'] myself, like all the sample code in the RLModule docs does.

Issue Severity

High: It blocks me from completing my task.

jfurches commented 10 months ago

From what I understand (I've also been trying to get an RLModule to work with PPO), RLlib flattens observations into a single Box, like you describe, and before training it passes in dummy batches of data (all 0s, in the shape of your observation space). I don't know if this is the best way of solving the problem, but in my code I do:

# Import paths below are for Ray ~2.7 and may vary across versions.
from typing import Any, Mapping

import torch

from ray.rllib.core.models.base import STATE_IN
from ray.rllib.models.modelv2 import restore_original_dimensions
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.nested_dict import NestedDict

from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule

class MyRLModule(PPOTorchRLModule):
    def _forward_train(self, batch: NestedDict, compute_vf=True) -> Mapping[str, Any]:
        output = {}

        # Unpack our observation and restore it to the proper shape
        obs = batch[SampleBatch.OBS]
        obs = restore_original_dimensions(
            obs,
            self.config.observation_space.original_space,
            tensorlib=torch
        )

        input_dict = {
            SampleBatch.OBS: obs,
            STATE_IN: batch[STATE_IN]
        }

        # process input_dict as you'd expect
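To make the flatten/restore behavior concrete, here is a toy, framework-free sketch (plain Python, no RLlib; `obs_space`, `flatten_obs`, and `restore_obs` are made-up names, not RLlib APIs) of how a dict observation can be flattened into one vector and restored by slicing, and why a dummy warm-up batch shows up as all zeros in the flattened shape:

```python
# Toy model of RLlib's flattening: component name -> flat size.
obs_space = {"position": 2, "sensors": 3}

def flatten_obs(obs):
    # Concatenate components in a fixed (sorted) order into one flat vector.
    flat = []
    for name in sorted(obs_space):
        flat.extend(obs[name])
    return flat

def restore_obs(flat):
    # Slice the flat vector back into its original named components.
    out, i = {}, 0
    for name in sorted(obs_space):
        n = obs_space[name]
        out[name] = flat[i:i + n]
        i += n
    return out

real = {"position": [0.5, -0.5], "sensors": [1.0, 2.0, 3.0]}
assert restore_obs(flatten_obs(real)) == real

# A warm-up batch is all zeros in the flattened shape: restoring it
# yields every component, but with zeroed values.
dummy = [0.0] * sum(obs_space.values())
print(restore_obs(dummy))
```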