Closed by ericl 5 years ago
cc @hartikainen @bbalaji-ucsd
Just listing requirements here to ensure I understand the proposal:
Questions:
@bbalaji-ucsd thanks for raising these, here's an attempt to clarify:
listing requirements
I think these are exactly right.
Why is the Torch API so different from the TF API?
The main difference is the addition of the input_dict, hidden_state, and feature_layer arguments. I was originally thinking that since Torch is eager, it would be more natural to have these values directly available as method arguments, rather than saving them internally in the class. For example, RLlib's internal usage of the model would look something like:
action_out, feature_out, state = model(input_dict, state)
value_out = model.branch_forward(input_dict, state, feature_out, "value", 1)
curiosity_out = model.branch_forward(input_dict, state, feature_out, "curiosity", 10)
Though it may be better to simplify if the model retains tensors internally:
action_out, feature_out, state = model(input_dict, state)
value_out = model.branch_output("value", 1)
curiosity_out = model.branch_output("curiosity", 10)
One drawback of this approach is that if the model is used multiple times with different inputs (e.g., DQN), it could get confusing if state is stored in the model and could lead to bugs.
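To make the trade-off concrete, here is a minimal, purely illustrative sketch of the explicit-argument style from the first option. The names (branch_forward, input_dict, state) follow the thread; the bodies are toy stand-ins, not the real ModelV2 implementation.

```python
# Hypothetical sketch of the explicit-argument interface discussed above.
# Nothing is cached on the model, so repeated calls with different
# inputs (as in DQN) cannot silently reuse stale tensors.

class SketchModel:
    """Toy model: forward returns all outputs explicitly."""

    def __call__(self, input_dict, state):
        feature_out = [x * 2 for x in input_dict["obs"]]   # stand-in "features"
        action_out = [x + 1 for x in feature_out]          # stand-in "logits"
        return action_out, feature_out, state

    def branch_forward(self, input_dict, state, feature_out, name, num_outputs):
        # Each branch computes from the explicitly passed features.
        return [sum(feature_out)] * num_outputs

model = SketchModel()
action_out, feature_out, state = model({"obs": [1, 2]}, state=[])
value_out = model.branch_forward({"obs": [1, 2]}, state, feature_out, "value", 1)
```

Because every tensor a branch needs is an argument, the data flow stays visible at the call site, at the cost of a longer signature.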
Not sure how to accomplish (3) from above. Do I just create separate architectures and ignore the 'default_share_layers'?
That's right, it is supposed to be just a hint. The custom model is free to ignore it and create entirely separate architectures internally.
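As a sketch of what "ignoring the hint" could look like, here is a hypothetical custom model that reads default_share_layers but builds fully separate networks anyway (all names and the config key placement are illustrative, not the actual RLlib API):

```python
# Hypothetical custom model that ignores the default_share_layers hint
# and builds independent networks for each head.

class SeparateNetsModel:
    def __init__(self, model_config):
        # The hint is available, but deliberately unused below:
        self.share_layers = model_config.get("default_share_layers", True)
        self.policy_net = lambda obs: [o * 2 for o in obs]   # independent net 1
        self.value_net = lambda obs: sum(obs)                # independent net 2

    def forward(self, obs):
        return self.policy_net(obs)

    def value(self, obs):
        # Recomputes from obs with its own network; nothing is shared.
        return self.value_net(obs)

m = SeparateNetsModel({"default_share_layers": False})
```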
cc @saurabh3949
@ericl Agree with the overall design. Like the first option where the tensors are explicitly stated and not saved internally. I'm going over the implementation you have in https://github.com/ray-project/ray/pull/4926
A few comments and questions:
@bbalaji-ucsd agree on all points. I'll work these into the PR.
Will the new Model be one of the modules of the policy builder function?
Yep, added a make_model function (defaults to getting a model from the catalog).
One feedback was that branch_output could be split up into methods, and that subclasses could be used to organize these. For example for DQN:
TFModel()
    def forward()
    def value()

QModel(TFModel)
    def q_value()
    def action_value()

DuelingQModel(QModel)
    def state_value()
The main advantage is readability compared to having a single get_branch_output() method.
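A runnable toy version of that hierarchy might look like the following. The class and method names mirror the sketch above; the bodies are stubs invented for illustration and do not reflect the real DQN implementation.

```python
# Illustrative-only sketch of the proposed subclass hierarchy.

class TFModel:
    def forward(self, obs):
        return [o * 2 for o in obs]          # shared feature extractor (stub)

    def value(self, features):
        return sum(features)                 # value head (stub)

class QModel(TFModel):
    def q_value(self, features, action):
        return features[action]              # Q(s, a) stub

    def action_value(self, features):
        return max(features)                 # max over actions (stub)

class DuelingQModel(QModel):
    def state_value(self, features):
        # Dueling architectures add a separate state-value stream.
        return sum(features) / len(features)

model = DuelingQModel()
feats = model.forward([1, 2, 3])
```

Each head is a named method rather than a string key, so a typo becomes an AttributeError instead of a silently wrong branch lookup.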
Yes, I agree a branch output function with string arguments is not the best way. I like these separate class methods much better, but I thought you wanted to have just one model class instance per policy, and this design necessitates one model class per neural network. I think the separate-model-classes approach is easier to understand and develop on.
Actually there is still a single model per policy -- it's just that the model class might be specialized to have extra methods. For example, QModel has to implement q_value and action_value, and algorithms can require a certain type of custom model to be passed in by the user.
Is there a way to support reuse of Model classes across agents using this? For example, DQN and PPO can re-use curiosity model, or, DQN and SAC can reuse Q model. I was thinking of multiple model classes to encourage this reuse.
Perhaps multiple inheritance is sufficient to handle the above with a single model instance per policy. What do you think?
Something like multiple inheritance makes sense to me (I think of it as implementing multiple interfaces). So if you want to use curiosity, your model should implement some extra methods (or we could provide a default fallback).
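The "multiple interfaces" idea might be sketched as follows: the policy still holds one model instance, but its class mixes in whichever optional interfaces the algorithm needs. All class and method names here are hypothetical, chosen only to match the discussion.

```python
# Hedged sketch: one model instance per policy, multiple mixed-in
# interfaces. Names are illustrative, not the real RLlib API.

class QValueInterface:
    def q_value(self, features, action):
        raise NotImplementedError

class CuriosityInterface:
    def curiosity_bonus(self, features):
        # A default fallback, as suggested above: no bonus.
        return 0.0

class MyModel(QValueInterface, CuriosityInterface):
    def forward(self, obs):
        return [o + 1 for o in obs]

    def q_value(self, features, action):
        return features[action]

model = MyModel()          # still a single instance per policy
feats = model.forward([0, 5])
```

An algorithm that needs curiosity can then check isinstance(model, CuriosityInterface) (or simply call the method and rely on the default) without requiring a second model object.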
The ModelV2 PR is now ready for review.
@ericl, can you please clarify if variable reuse is currently possible?
According to this, variable reuse is based on TensorFlow's reuse API via variable_scope functionality. However, to my knowledge, Keras-style reuse is based on the "functional API" idea, which is different (and as far as I can tell, all model implementations in RLlib are written using Keras).
So if I want to reuse the base_model between policy_0 and policy_1, I need to physically pass the Keras model that was created for policy_0 and call it again inside policy_1?
p.s. Running the multiagent_cartpole script (framework=tf) I don't see that shared variables are indeed shared.
Yes, you need to pass that exact model. This is a little tricky but can be done with global variables in Python (put the base model in a global and access it from multiple models).
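The global-variable trick described here can be sketched as a module-level cache that builds the base model once and hands the same object to every policy's model. To keep the example self-contained, a plain placeholder class stands in for the shared tf.keras model; the pattern is the point, not the class.

```python
# Sketch of sharing one base model across policies via a Python global.
# BaseNet is a stand-in for a shared tf.keras model.

_BASE_MODEL = None  # module-level "global"

class BaseNet:
    """Placeholder for the shared Keras base model."""
    def __init__(self):
        self.weights = [0.5]

def get_base_model():
    global _BASE_MODEL
    if _BASE_MODEL is None:           # built exactly once, on first use
        _BASE_MODEL = BaseNet()
    return _BASE_MODEL                # later callers get the same object

class PolicyModel:
    def __init__(self):
        self.base = get_base_model()  # policy_0 and policy_1 both land here

p0, p1 = PolicyModel(), PolicyModel()
```

Because both policies hold a reference to the identical object, calling it from either policy reuses the same weights, which is the Keras functional-API notion of sharing.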
Not directly related to the topic (I can open a new thread if you prefer).
Returning to the multiagent cartpole example. The number of policies is set according to the num_policies argument:
policies = {"policy_{}".format(i): gen_policy(i) for i in range(args.num_policies)}
while the number of agents is set via the num_agents argument:
"env_config": {"num_agents": args.num_agents},
Assume a setup of 1:1 correspondence between the number of policies and the number of agents.
How would you grid search the number of policies? Put differently, I want to set the number of agents via tune.grid_search. How can I make num_agents and num_policies look at the same value when using tune.grid_search?
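One possible approach, assuming Ray Tune's grid_search and sample_from APIs, is to grid-search a single key and derive the other from the resolved trial spec. This is an untested sketch: the exact spec access path may differ by Ray version, and the policies dict would still need to be built from the resolved value rather than from the argparse argument.

```python
# Untested sketch (assumes ray.tune grid_search / sample_from):
# grid-search num_agents and have num_policies track the same value.
from ray import tune

config = {
    "env_config": {
        "num_agents": tune.grid_search([2, 4, 8]),
    },
    # Resolved per trial, after num_agents has been chosen:
    "num_policies": tune.sample_from(
        lambda spec: spec.config.env_config.num_agents),
}
```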
Describe the problem
The goals of ModelV2 are to:
support TF2 / keras-style reuse of variables (https://github.com/ray-project/ray/issues/4134)
provide a common structure between TF and Torch style models
support customizing multiple output heads or independent networks within a single model class. This will be done by having subclasses of ModelV2 such as QValueModel, CuriosityModel, etc. A single user custom model can mix in / implement as many of these optional subclasses as it needs. For example, a custom DQN model should implement QValueModel.
be backwards compatible with ModelV1 through a wrapper interface.
TODO: