thomashirtz / gym-battleship

Battleship environment for reinforcement learning tasks
MIT License
10 stars 2 forks source link

np.int32 vs np.float32 observations #6

Open ronTohar1 opened 1 year ago

ronTohar1 commented 1 year ago

hey, I noticed there is a problem in battleships.py where the observation space is of type int and the returned observation in reset or step are of type np.float32. This causes a problem with env_checker of gym of course and therefore also with the use of RL libraries such as SB3 as they use this information and it creates conflicts.

thomashirtz commented 1 year ago

I just committed some changes, please let me know if it has resolved the issue, if not, please give me a minimal minimal reproducible example of the issue :)

ronTohar1 commented 1 year ago

It did fix my problem, but I did have to do some changes to the step function (regardless of the current new commit) to make the code work.

I changed the step() function from

def step(self, raw_action: Union[int, tuple]) -> Tuple[np.ndarray, int, bool, dict]:
        if isinstance(raw_action, int):

To

def step(self, raw_action: Union[int, tuple]) -> Tuple[np.ndarray, int, bool, dict]:
        if isinstance(raw_action, int) or isinstance(raw_action, np.int32) or isinstance(raw_action, np.int64):

This change had been made way before the aforementioned problem arose.

For example, the gym environment checker didn't pass before the change to the step() function, as well as the following piece of code:

import gymnasium
import gym_battleship

env = gymnasium.make("GymV21Environment-v0", env_id="Battleship-v0")
obs,_ = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    env.render() 

This happened because for some reason, sample() of the action apce gave an numpy.int64 value. (This error is also the reason the env_checker didn't 'approve' the environment)

P.S I just can't understand why you'd put an np.float32 in the dtype as it is only integers, but I guess it doesn't make any difference.

thomashirtz commented 1 year ago

Ok, I'll take a closer look for fixing the code when I have time

Please let me know if there are any other thing not so right that you notice in the code :)

For the observation space, I remember that there were a clear reason why I did that and I was quite upset to have to switch to float instead of int. However I can't remember now. I may switch back to int and see if someone experience an issue.

Thank you for your message :)