toshikwa / sac-discrete.pytorch

PyTorch implementation of SAC-Discrete.
MIT License

How can I change the environment #8

Closed yhisme closed 4 years ago

yhisme commented 4 years ago

If I change "parser.add_argument('--env_id', type=str, default='MsPacmanNoFrameskip-v4')" to another env, e.g. "parser.add_argument('--env_id', type=str, default='CartPole-v0')", it throws the exception "AssertionError: assert 'NoFrameskip' in env.spec.id".

And how can I add my own env to the code? Thank you a lot if you can give me some help.

toshikwa commented 4 years ago

Hi, @yhisme

In env.py, I use the wrappers make_atari and wrap_deepmind_pytorch to follow the same evaluation protocol as the DQN paper.

https://github.com/ku2482/sac-discrete.pytorch/blob/master/sacd/env.py#L268

If you want to use other envs (e.g. ones without NoFrameskip), please define your own wrapper function like them. Note that you always need WarpFramePyTorch to resize images to (84, 84), and you should not use ScaledFloatFrame because my code assumes states are in [0, 255].
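For reference, a minimal sketch of such a custom wrapper function, modelled on make_atari / wrap_deepmind_pytorch in env.py (the import path and the exact constructor signature of WarpFramePyTorch are assumptions here, so check env.py for the real ones):

```python
import gym
from sacd.env import WarpFramePyTorch  # assumed import path

def make_custom_env(env_id):
    # Build the raw env; unlike make_atari, there is no 'NoFrameskip' assertion here.
    env = gym.make(env_id)
    # Resize observations to (84, 84), as the network expects that input shape.
    env = WarpFramePyTorch(env)
    # Do NOT wrap with ScaledFloatFrame: the agent assumes uint8 states in [0, 255].
    return env
```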

If you have questions about env wrappers in env.py, please ask me :)

Anyway, thank you for asking!!

yhisme commented 4 years ago

Thanks for your response, that's very kind of you :) Can I ask you a question? In my env (see the image), if I use DQN directly it will only take route 1. Now I want the agent to learn multiple strategies to reach the goal (such as routes 2 and 3, which give the same reward). Do you think that if I use soft Q-learning or SAC, the agent will find routes 2 and 3? Thank you again :)

toshikwa commented 4 years ago

Hi. I think stochastic policies like SAC-Discrete can learn multiple strategies if they find them during exploration. So you may need to set start_steps large enough for the agent to discover multiple paths during exploration.
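For context, start_steps typically gates an initial phase of purely random actions. A hedged sketch of the usual pattern (variable names are illustrative, not necessarily this repo's exact code):

```python
# Illustrative training-loop snippet showing how start_steps usually works.
if total_steps < start_steps:
    # Act uniformly at random at the beginning so distinct routes can be found.
    action = env.action_space.sample()
else:
    # Afterwards, sample from the learned stochastic policy.
    action = agent.explore(state)
```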

Does that answer your question?

yhisme commented 4 years ago

OK, thank you, I get it! I will try to port my grid env to your code.

toshikwa commented 4 years ago

@yhisme

Hi, I forgot to mention some important notes.

First, you should probably also pay attention to target_entropy, because it controls exploration as well as start_steps does.
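As a rough illustration, target_entropy in SAC-Discrete is commonly set as a fraction of the maximum policy entropy log(|A|); the 0.98 ratio below is only an example and may differ from this repo's default:

```python
import numpy as np

action_dim = env.action_space.n
# Maximum entropy of a uniform policy over |A| actions is log(|A|);
# the target entropy is usually set to a fraction of that maximum.
target_entropy = 0.98 * (-np.log(1.0 / action_dim))
```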

Second, exploit() doesn't introduce stochasticity, which results in deterministic behaviour. So if you want the agent to act stochastically in order to show multiple strategies, you need to use explore() or another stochastic policy.
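To make the distinction concrete, here is a hedged sketch of how explore() and exploit() typically differ in a discrete-action SAC agent (the exact signatures in this repo may differ):

```python
import torch

def explore(policy, state):
    # Sample from the categorical policy: stochastic, so it can reveal multiple routes.
    with torch.no_grad():
        probs = policy(state)
        action = torch.distributions.Categorical(probs=probs).sample()
    return action.item()

def exploit(policy, state):
    # Greedy action: always picks the most probable action, hence deterministic.
    with torch.no_grad():
        probs = policy(state)
        action = probs.argmax(dim=-1)
    return action.item()
```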

Good luck with your work!!

yhisme commented 4 years ago

Hi, thanks for the notes. I will try your suggestions. Hope everything goes well for you :)