takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

[QUESTION] Observation space support #259

Open N0Dr4m4L4m4 opened 1 year ago

N0Dr4m4L4m4 commented 1 year ago

Hey there,

I am trying to develop my own gym env, and I want to use d3rlpy algorithms. I am using a Dict() observation space, but as far as I can tell it is not supported. Does d3rlpy support Dict observation spaces?

base.py of d3rlpy

def build_with_env(self, env: gym.Env) -> None:
    """Instantiate implementation object with OpenAI Gym object.

    Args:
        env: gym-like environment.

    """
    observation_shape = env.observation_space.shape
    self.create_impl(
        self._process_observation_shape(observation_shape),
        get_action_size_from_env(env),
    )

It tries to read env.observation_space.shape, but a Dict() space obviously has no meaningful shape.
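A quick way to see this (a minimal check, assuming the standard gym spaces API):

import numpy as np
from gym import spaces

space = spaces.Dict(
    {
        "agent": spaces.Box(low=np.array([0, 0]), high=np.array([10, 10])),
    }
)
print(space.shape)  # None, so create_impl() receives no usable shape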

env.py

My observation space looks like this:

self.observation_space = spaces.Dict(
    {
        "agent": spaces.Box(
            low=np.array([0, 0]),
            high=np.array([self.MAX_X, self.MAX_Y]),
        ),
        "target": spaces.Box(
            low=np.array([0, 0]),
            high=np.array([self.MAX_X, self.MAX_Y]),
        ),
    }
)
....
takuseno commented 1 year ago

@N0Dr4m4L4m4 Thanks for the issue. Currently, dictionary observations are not supported. One thing you can do is to concatenate all observations into a single vector.

Alternatively, I'm working on the next major update that supports tuple observations. It'll take some time until the release, but it's going to be available.
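For reference, a minimal sketch of the concatenation workaround using Gym's built-in FlattenObservation wrapper; MyEnv is a placeholder name for the custom environment above:

from gym.wrappers import FlattenObservation

# FlattenObservation turns a Dict (or Tuple) observation space into one
# flat Box by concatenating the sub-spaces, so the wrapped env exposes a
# well-defined observation_space.shape.
env = FlattenObservation(MyEnv())  # MyEnv: hypothetical custom env class
print(env.observation_space.shape)  # e.g. (4,) for the two 2-D Boxes above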

N0Dr4m4L4m4 commented 1 year ago

Hey there,

thanks for the hint. I am now using a single vector for my observation. One other thing is not working, though: I am using SAC for continuous control, and my action space is defined as self.action_space = spaces.Box(low=0, high=360, dtype=np.int32). But when the action is generated in iterator.py line 216, action = algo.sample_action([fed_observation])[0], I get floating-point outputs between roughly -1.0 and 1.0. Any advice for that? The action should be between 0 and 360. Thanks :)
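For context: SAC samples actions from a tanh-squashed Gaussian policy, so its outputs land in [-1, 1] regardless of the Box bounds, and they need to be rescaled before being passed to the environment. A minimal sketch, assuming a continuous action range of [0, 360] (SAC produces floats, so dtype=np.int32 in the action space is also likely to cause trouble):

import numpy as np

def rescale_action(a: np.ndarray, low: float = 0.0, high: float = 360.0) -> np.ndarray:
    # Map a tanh-squashed action from [-1, 1] into [low, high].
    return low + (a + 1.0) * 0.5 * (high - low)

# e.g. env.step(rescale_action(algo.sample_action([fed_observation])[0]))

Gym's RescaleAction wrapper achieves the same thing at the environment boundary.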