upb-lea / gym-electric-motor

Gym Electric Motor (GEM): An OpenAI Gym Environment for Electric Motors
https://upb-lea.github.io/gym-electric-motor/
MIT License
291 stars 65 forks

Merge reference with state #230

Open XyDrKRulof opened 6 months ago

XyDrKRulof commented 6 months ago

Currently, GEM returns a tuple of (state, reference) on each step and reset. This is incompatible with the standard RL library stable-baselines3 and might be incompatible with other libraries as well. Since, at least for reinforcement learning applications, state and reference always have to be concatenated for the agent anyway, both could be merged into one state. Additionally, for those interested only in the core simulation, an initialization option could be offered to disable reference generation (and thus reward calculation) to speed up the code.
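To illustrate the proposal, merging the two parts of the observation is just a concatenation of the state array and the reference array. A minimal sketch with made-up dimensions (the 4-element state and 1-element reference are hypothetical, not GEM's actual shapes):

```python
import numpy as np

# Hypothetical example values: a motor state with 4 entries and a
# single-entry reference (dimensions are made up for illustration).
state = np.array([0.1, -0.3, 0.5, 0.0])
reference = np.array([0.25])

# The merged observation is one flat array instead of a tuple.
merged = np.concatenate((state, reference))
print(merged.shape)  # (5,)
```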

XyDrKRulof commented 6 months ago

Additionally, the stable-baselines3 example code should be updated as it assumes a very outdated version of GEM.

RapheaSid commented 6 months ago

I am trying to run the stable-baselines3 example code and am facing the same issue of the observation space being a tuple. Could you kindly provide the edited code snippet? I have been trying for many days but could not resolve it.

XyDrKRulof commented 6 months ago

@RapheaSid in order to make the current GEM environment compatible with stable-baselines3, you can apply a wrapper that changes the tuple into an array:

import numpy as np
from gym import ObservationWrapper
from gym.spaces import Box


class ObservationFlatter(ObservationWrapper):

    def __init__(self, env):
        super(ObservationFlatter, self).__init__(env)
        # Split the (state, reference) tuple space into its two parts.
        state_space = self.env.observation_space[0]
        ref_space = self.env.observation_space[1]

        # Concatenate the bounds so the wrapped space is a single Box.
        new_low = np.concatenate((state_space.low,
                                  ref_space.low))
        new_high = np.concatenate((state_space.high,
                                   ref_space.high))

        self.observation_space = Box(new_low, new_high)

    def observation(self, observation):
        # Flatten the (state, reference) tuple into one array.
        observation = np.concatenate((observation[0],
                                      observation[1]))
        return observation

After you have defined your environment, you can simply apply the wrapper:

env = ObservationFlatter(env)

I hope that helps. If you have any further questions, feel free to ask. The next GEM version will probably change the state/reference observation tuple to a flat array.