Closed: pshvechikov closed this issue 7 years ago.
Sorry for such a delay. First, thanks for reporting the difficulties with code reading. I've added a small tip that should help with reading the code, as well as several bigger clarifications.

The short version of what policy estimators are: "if you need something when you write the loss function, you throw it into policy estimators".
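For example, here is roughly how I would wire up a Q-learning agent under that rule (a sketch of my own, not code from the repo; keyword names other than `policy_estimators` and `action_layers` are from memory, so treat them as assumptions):

```python
import lasagne
from lasagne.layers import InputLayer, DenseLayer
from agentnet.agent import Agent
from agentnet.resolver import EpsilonGreedyResolver

obs = InputLayer((None, 4))                  # current observation
hid = DenseLayer(obs, 64)                    # memory-less feedforward body
qvalues = DenseLayer(hid, num_units=3,
                     nonlinearity=None)      # one Q-value per action

# picking an action only needs the resolver on top of the q-values...
action = EpsilonGreedyResolver(qvalues)

# ...but the Q-learning loss needs the q-values themselves,
# so the q-value layer also goes into policy_estimators
agent = Agent(observation_layers=obs,
              agent_states={},               # no recurrent memory in this toy
              policy_estimators=qvalues,
              action_layers=action)
```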
It is indeed thrown away in `get_react_function`, since you don't need to return q-values when you merely want to take an action. The `get_agent_reaction` method is indeed used only once every time you actually want to interact with any env. The catch is, it is required every time you want to do something outside gym.
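For reference, that path looks roughly like this (method names are from this thread, but the call signature is my guess):

```python
# compile the reaction function once, then act step by step;
# q-values are computed internally but not returned
react = agent.get_react_function()
action = react(observation)
```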
The real usage for

By the way, what do you think would be a more intuitive name for the parameter?
If this parameter is used for getting the output of some (not necessarily policy) layers, maybe it should be called `layers_to_track` or something similar.
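Something like this, purely hypothetical, just to illustrate the name:

```python
# hypothetical signature with the suggested rename (not actual AgentNet API)
agent = Agent(observation_layers=obs,
              agent_states={},
              layers_to_track=(qvalues,),    # any layers whose outputs the loss needs
              action_layers=action)
```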
Also, I don't really think the tip to study recurrence.py first is all that helpful.

Sorry for any inconvenience.
I added some clarifications in both `Agent` and `Recurrence`. In `Recurrence` it's called `tracked_outputs`.

Will migrate `Agent` there through the regular deprecation procedure.
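For anyone reading along, a rough sketch of the `Recurrence` counterpart (`tracked_outputs` is the real name per the comment above; the import path and the other argument names are my assumptions):

```python
from agentnet.agent.recurrence import Recurrence

# unroll the network over a session of observations;
# tracked_outputs records the listed layers' outputs at every tick,
# which is what the loss is later computed from
rec = Recurrence(input_sequences={obs: observation_sequence},  # observation_sequence: hypothetical sequence input layer
                 state_variables={},                           # {new_state: prev_state} pairs would go here for RNN memory
                 tracked_outputs=[qvalues],
                 n_steps=10)
```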
`policy_estimators` is really weird in both documentation and usage.

The documentation is very unclear about it (given that `action_layers` is a separate parameter):

> policy_estimators (lasagne.Layer child instance (e.g. Q-values) or a tuple of such instances (e.g. state value + action probabilities for a2c)) – whatever determines agent policy

This explains almost nothing about how to use it or what should be passed to it.

The code is even stranger.
The most important part of `Agent` is `get_react_function`. It uses `self.policy` (the saved `policy_estimators` instance), returning the symbolic computation of the `self.policy` layers as a third return value (along with the duplicated `self.action_layers` – I am not sure I understand why `action_layers` needs to be duplicated). However, `get_react_function` is used only once across AgentNet's codebase, and this exclusive usage of `self.get_agent_reaction` completely throws away the computed `self.policy` (named `applier_policy` in the respective scope).
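Here is the same flow in paraphrased pseudo-code (my reading of the source, not verbatim):

```python
# Agent.__init__ (paraphrased):
self.policy = policy_estimators                      # the parameter is saved

# Agent.get_agent_reaction (paraphrased):
new_outputs = get_output(self.policy, ...)           # symbolically computed
return actions, new_states, new_outputs              # returned as the third value

# Agent.get_react_function (paraphrased):
_, _, applier_policy = self.get_agent_reaction(...)  # renamed to applier_policy...
# ...which is then never used again
```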
So the evolution of `policy_estimators` is as follows:

1. `policy_estimators` becomes `self.policy`
2. `self.policy` is symbolically computed and assigned to `new_outputs`
3. `new_outputs`, returned from `get_agent_reaction`, becomes `applier_policy`
4. `applier_policy` is completely ignored

Thus the only place which truly uses `self.policy` is the parameter getters:
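The getter I have in mind is essentially this (my paraphrase, not the verbatim source):

```python
import lasagne

def get_all_params(self, **tags):
    # the one place where self.policy actually matters: it contributes
    # trainable parameters alongside the action layers
    layers = list(self.action_layers) + list(self.policy)
    return lasagne.layers.get_all_params(layers, **tags)
```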
But the rationale for keeping an additional, unclear parameter to `Agent`, which arguably increases clutter, is somewhat questionable, given that it is used mainly for returning its parameters and for tracking purposes in `self.as_recurrence`. Do we really need it?