I set up a custom environment with a discrete observation space, e.g.,
self.observation_space = Dict({'A': Discrete(2), 'B': Discrete(3)})
In my action space, each value can be +1, 0, or -1. If the next state falls outside the designated observation space, I apply a penalty of -10 as the reward.
But when I use the check_env function to check my environment, I get an error: "AssertionError: Error while checking key=A: The observation returned by the step() method does not match the given observation space Discrete(2)".
I know the problem occurs when the next state exceeds the designated observation space, but I still want to use the reward function to penalize the agent in such cases. How do I solve this? In other words, how do I keep the next state within the space? Thank you very much.
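One common way to satisfy check_env while keeping the penalty is to clip the candidate next state back into the declared space inside step(), and assign the -10 reward whenever clipping was needed. Below is a minimal sketch of that idea; the bound sizes, the apply_action helper, and the penalty constant are all hypothetical names chosen for illustration, not part of any library API:

```python
import numpy as np

# Hypothetical bounds mirroring Dict({'A': Discrete(2), 'B': Discrete(3)}):
# valid values are A in {0, 1} and B in {0, 1, 2}.
SPACE_SIZES = {"A": 2, "B": 3}
PENALTY = -10.0

def apply_action(state, action):
    """Apply a +1/0/-1 action per key, then clip the result back into
    the observation space; penalize if the raw result left the space."""
    next_state = {k: state[k] + action[k] for k in state}
    out_of_bounds = any(
        not (0 <= v < SPACE_SIZES[k]) for k, v in next_state.items()
    )
    # Clip so the returned observation always lies in the declared space,
    # which is what check_env verifies against step()'s return value.
    clipped = {k: int(np.clip(v, 0, SPACE_SIZES[k] - 1))
               for k, v in next_state.items()}
    reward = PENALTY if out_of_bounds else 0.0
    return clipped, reward

# Example: A would become 2 and B would become -1, both out of bounds,
# so the state is clipped and the penalty applied.
obs, reward = apply_action({"A": 1, "B": 0}, {"A": 1, "B": -1})
```

With this pattern the agent still experiences the -10 penalty for attempting to leave the space, but the observation returned by step() always matches the declared observation_space, so check_env passes.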