yhisme closed this issue 4 years ago
Hi, @yhisme
In env.py, I use the wrappers make_atari and wrap_deepmind_pytorch to follow the same evaluation protocol as the DQN paper.
https://github.com/ku2482/sac-discrete.pytorch/blob/master/sacd/env.py#L268
If you want to use other envs (e.g. without NoFrameskip), please define your own wrapper function along the same lines.
Note that you always need WarpFramePyTorch to resize images to (84, 84), and you should not use ScaledFloatFrame, because my code assumes states are in [0, 255].
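To make the shape/dtype contract concrete: WarpFramePyTorch in sacd/env.py resizes frames while keeping them as uint8 in [0, 255]. The sketch below only illustrates that contract with a toy nearest-neighbor resize (the real wrapper uses OpenCV; `warp_frame` is a hypothetical name, not the repo's API):

```python
import numpy as np

def warp_frame(frame, size=84):
    """Toy nearest-neighbor resize to (size, size).

    Keeps the frame as uint8 in [0, 255], i.e. no ScaledFloatFrame-style
    division by 255 -- the training code expects raw byte values.
    """
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size  # source row index for each output row
    cols = np.arange(size) * w // size  # source col index for each output col
    return frame[rows][:, cols].astype(np.uint8)
```

Whatever resizing you use, the key point is that the wrapped observation should come out as an (84, 84) uint8 array.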
If you have questions about the env wrappers in env.py, please ask me :)
Anyway, thank you for asking!!
Thanks for your response, that's very kind of you :) Can I ask you a question? In my env, if I use DQN directly, it will only take route 1. Now I want to generate multiple strategies to reach the goal (such as routes 2 and 3, where the reward is the same). Do you think that if I use Soft Q-learning or SAC, the agent will find routes 2 and 3? Thank you again :)
Hi.
I think stochastic policies like SAC-Discrete can learn multiple strategies if they find them during exploration, so you may need to set start_steps large enough for the agent to find multiple paths while exploring.
Does that answer your question?
OK, thank you, I get it! I will try to port my grid env to your code.
@yhisme
Hi, I forgot to mention some important notes.
First, you should probably also pay attention to target_entropy, because it controls exploration as well as start_steps does.
Second, exploit() doesn't introduce stochasticity, which results in deterministic behaviour.
So if you want the agent to act stochastically in order to show multiple strategies, you need to use explore() or another stochastic policy.
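The behavioural difference can be sketched in plain Python (the repo's explore()/exploit() operate on torch tensors; these hypothetical helpers only illustrate sampling versus argmax):

```python
import random

def explore(action_probs):
    # Sample from the policy distribution: different routes can be chosen
    # across episodes, so multiple strategies become visible.
    return random.choices(range(len(action_probs)), weights=action_probs)[0]

def exploit(action_probs):
    # Always pick the most probable action: behaviour is deterministic,
    # so only one route will ever be shown.
    return max(range(len(action_probs)), key=lambda a: action_probs[a])
```

With a policy like [0.1, 0.7, 0.2], exploit() always returns action 1, while explore() returns 0 or 2 some of the time.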
Good luck with your work!!
Hi, thanks for the notes. I will try your suggestions. Hope everything goes well for you :)
If I change `parser.add_argument('--env_id', type=str, default='MsPacmanNoFrameskip-v4')` to another env, e.g. `parser.add_argument('--env_id', type=str, default='CartPole-v0')`, it throws the exception `AssertionError: assert 'NoFrameskip' in env.spec.id`.
How can I add my own env to the code? Thank you a lot if you can give me some help.
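For reference, that assertion comes from make_atari, which only accepts Atari `NoFrameskip` ids, so a non-Atari env has to bypass it. One possible factory, purely as a sketch (`make_env` and `is_atari_id` are hypothetical names, not the repo's API, and wrap_deepmind_pytorch may take extra arguments):

```python
def is_atari_id(env_id):
    # Mirrors the failing assertion: make_atari only accepts NoFrameskip ids.
    return 'NoFrameskip' in env_id

def make_env(env_id):
    # Hypothetical factory: route Atari ids through the DeepMind-style
    # wrappers; create everything else (e.g. CartPole-v0) directly.
    import gym
    if is_atari_id(env_id):
        from sacd.env import make_atari, wrap_deepmind_pytorch
        return wrap_deepmind_pytorch(make_atari(env_id))
    return gym.make(env_id)
```

Note that a non-image env like CartPole-v0 would also need its own state handling, since the training code assumes (84, 84) uint8 observations.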