pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License

[Feature Request] Tutorial for custom env with complex shapes #1896

Open svnv-svsv-jm opened 7 months ago

svnv-svsv-jm commented 7 months ago

🚀 The feature, motivation and pitch

I am unable to create an EnvBase subclass whose *_spec attributes have complex shapes.

For example, I have a state with shape (8,8,13): what shape and batch size should I give to the observation_spec? If I have an action of shape (8,18), what batch size should I give to the action_spec?

        self.action_spec = BoundedTensorSpec(
            low=0,
            high=1,
            shape=action_space.size(),  # `action_space` is a (N,) one-hot tensor
            dtype=self.dtype,
        )

        observation_spec = BoundedTensorSpec(
            low=0,
            high=1,
            shape=state.size(),  # `state` is a 8x8x13 tensor
            dtype=self.dtype,
        )
        self.observation_spec = CompositeSpec(observation=observation_spec)

Would this work?

Solution

A tutorial that explores the shape attribute/argument in more depth would suffice, with examples of how to use shape in each edge case.

vmoens commented 7 months ago

We can document this better. In general, the idea is that your composite spec has the shape of the batch size (it can be empty), and each leaf has that shape followed by its own feature shape.

Examples: your env has a single agent and no batch size. full_composite_spec has shape []. Its leaf has the shape of the feature (e.g. [3, 64, 64] if you have an image 64 pixels wide and 64 pixels high).

If you have a batched env with 2 agents, it could have a batch size of [2]. This is what happens with a ParallelEnv, for instance. All its specs will have a leading shape of [2, *], meaning that your full_composite_spec will have a shape of [2] and the leaf will have shape [2, 3, 64, 64].
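The rule in the two cases above can be sketched in plain Python, without TorchRL. This is only an illustration of the shape arithmetic: `leaf_shape` is a hypothetical helper, not part of the TorchRL API, and the last call assumes the 8x8x13 state from the question lives in an env with no batch size.

```python
from typing import Sequence, Tuple


def leaf_shape(batch_size: Sequence[int], feature_shape: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: a leaf's shape is the batch size followed by the feature shape."""
    return tuple(batch_size) + tuple(feature_shape)


# Single env, no batch size: the composite spec has shape [] and
# the leaf keeps its plain feature shape.
single = leaf_shape([], [3, 64, 64])

# Batched env with batch size [2]: every leaf gains a leading 2.
batched = leaf_shape([2], [3, 64, 64])

# Assumption: the question's 8x8x13 state in an unbatched env.
state = leaf_shape([], [8, 8, 13])

print(single, batched, state)
```

Under these assumptions, the shape passed to a leaf spec is simply the result of this concatenation, and the composite spec itself takes only the batch-size part.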

Final case: your env has no batch size but it simulates several groups of agents (MARL setting). A first group named agents1 has 3 identical members and a second, agents2 has 4. The first outputs images from its steps and the second outputs a state vector of shape 5.

Here's how to build it:

full_observation_spec = CompositeSpec(
    agents1=CompositeSpec(pixels=SomeSpec(3, 3, 64, 64), shape=[3]),
    agents2=CompositeSpec(state=SomeOtherSpec(4, 5), shape=[4]),
    shape=[],
)
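As a rough way to sanity-check a nested layout like this one before writing the real specs, the shapes can be mirrored in plain dictionaries and verified recursively. This is a sketch only: the dict layout and `check_shapes` are hypothetical stand-ins, not TorchRL objects, and the check assumes that every nested shape must start with the shape of its parent.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def check_shapes(spec, parent_shape: Shape = ()) -> bool:
    """Return True if every nested shape starts with its parent's shape.

    A leaf is a plain shape tuple; a composite is a dict with a
    "shape" entry and a "children" dict (hypothetical layout).
    """
    if isinstance(spec, tuple):  # leaf
        return spec[: len(parent_shape)] == parent_shape
    shape = spec["shape"]
    if shape[: len(parent_shape)] != parent_shape:
        return False
    return all(check_shapes(child, shape) for child in spec["children"].values())


# Mirror of the MARL example: 3 agents with images, 4 agents with a state vector.
full_observation_spec = {
    "shape": (),
    "children": {
        "agents1": {"shape": (3,), "children": {"pixels": (3, 3, 64, 64)}},
        "agents2": {"shape": (4,), "children": {"state": (4, 5)}},
    },
}
ok = check_shapes(full_observation_spec)

# A leaf that forgets the leading group dimension fails the check.
bad_spec = {
    "shape": (),
    "children": {"agents2": {"shape": (4,), "children": {"state": (5,)}}},
}
bad_ok = check_shapes(bad_spec)

print(ok, bad_ok)
```

The second layout fails because the `state` leaf has shape (5,) while its group has shape (4,): the leaf is missing the leading dimension for the 4 agents.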

I hope that clarifies things a bit.