ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] ModelV2: support TF2 keras models and multiple output heads #4905

Closed: ericl closed this issue 5 years ago

ericl commented 5 years ago

Describe the problem

The goals of ModelV2 are to:

TODO:

ericl commented 5 years ago

cc @hartikainen @bbalaji-ucsd

bbalaji-ucsd commented 5 years ago

Just listing requirements here to ensure I understand the proposal:

  1. Able to create separate heads and share common layers.
  2. Able to create separate heads and not share common layers.
  3. Able to create separate neural architectures for different functions: value, policy, Q, curiosity, etc.

Questions:

ericl commented 5 years ago

@bbalaji-ucsd thanks for raising these, here's an attempt to clarify:

listing requirements

I think these are exactly right.

Why is the Torch API so different from the TF API?

The main difference is the addition of the input_dict, hidden_state, and feature_layer arguments. I was originally thinking that since Torch is eager, it would be more natural to have these values directly available as method arguments rather than saved internally in the class. For example, RLlib's internal usage of the model would be something like:

action_out, feature_out, state = model(input_dict, state)
value_out = model.branch_forward(input_dict, state, feature_out, "value", 1)
curiosity_out = model.branch_forward(input_dict, state, feature_out, "curiosity", 10)

Though it may be better to simplify if the model retains tensors internally:

action_out, feature_out, state = model(input_dict, state)
value_out = model.branch_output("value", 1)
curiosity_out = model.branch_output("curiosity", 10)

One drawback of this approach is that if the model is called multiple times with different inputs (e.g., in DQN), storing state inside the model could get confusing and lead to bugs.
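For concreteness, a minimal PyTorch sketch of the first, explicit-tensor variant might look like the following; the class name, layer sizes, and lazy head creation are illustrative assumptions, not the actual RLlib interface:

import torch.nn as nn

class BranchedTorchModel(nn.Module):
    # Hypothetical model whose branch heads take the shared features as input.

    def __init__(self, obs_size, num_actions, hidden_size=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_size, hidden_size), nn.ReLU())
        self.action_head = nn.Linear(hidden_size, num_actions)
        self.branches = nn.ModuleDict()  # named heads, created on first use
        self.hidden_size = hidden_size

    def forward(self, input_dict, state):
        features = self.trunk(input_dict["obs"])
        return self.action_head(features), features, state

    def branch_forward(self, input_dict, state, features, name, num_outputs):
        # Create the named head lazily; heads created after the optimizer is
        # built would need to be registered with it separately.
        if name not in self.branches:
            self.branches[name] = nn.Linear(self.hidden_size, num_outputs)
        return self.branches[name](features)

With this shape, model(input_dict, state) and model.branch_forward(...) line up with the usage above, and nothing about the current batch is cached on the module.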

Not sure how to accomplish (3) from above. Do I just create separate architectures and ignore the 'default_share_layers'?

That's right, it is supposed to be just a hint. The custom model is free to ignore it and create entirely separate architectures internally.
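As a tiny illustration of ignoring the hint, a custom model could simply build two unrelated networks; the helper name and layer sizes below are made up:

import tensorflow as tf

def build_separate_networks(obs_size, num_actions):
    # Requirement (3): entirely separate architectures for the policy and
    # value functions; the default_share_layers hint is simply not consulted.
    policy_net = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(obs_size,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_actions),
    ])
    value_net = tf.keras.Sequential([  # shares no layers with policy_net
        tf.keras.layers.InputLayer(input_shape=(obs_size,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    return policy_net, value_net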

bbalaji-ucsd commented 5 years ago

cc @saurabh3949

@ericl Agree with the overall design. Like the first option where the tensors are explicitly stated and not saved internally. I'm going over the implementation you have in https://github.com/ray-project/ray/pull/4926

A few comments and questions:

ericl commented 5 years ago

@bbalaji-ucsd agree on all points. I'll work these into the PR.

Will the new Model be one of the modules of the policy builder function?

Yep, added a make_model function (defaults to getting a model from the catalog).
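A rough sketch of how that hook could be wired into the policy builder (the exact signatures and the stub loss function here are assumptions, not the final API):

from ray.rllib.models import ModelCatalog
from ray.rllib.policy.tf_policy_template import build_tf_policy

def my_loss_fn(policy, model, dist_class, train_batch):
    ...  # algorithm-specific loss, omitted in this sketch

def make_model(policy, obs_space, action_space, config):
    # Default behavior: fall back to the catalog.
    return ModelCatalog.get_model_v2(
        obs_space,
        action_space,
        num_outputs=action_space.n,  # assumes a Discrete action space
        model_config=config["model"],
        framework="tf",
    )

MyTFPolicy = build_tf_policy(
    name="MyTFPolicy",
    loss_fn=my_loss_fn,
    make_model=make_model,
)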

ericl commented 5 years ago

One piece of feedback was that branch_output could be split up into separate methods, with subclasses used to organize them. For example, for DQN:

class TFModel:
    def forward(self): ...
    def value(self): ...

class QModel(TFModel):
    def q_value(self): ...
    def action_value(self): ...

class DuelingQModel(QModel):
    def state_value(self): ...

The main advantage is readability compared to having a single get_branch_output() method.

bbalaji-ucsd commented 5 years ago

Yes, agreed that a branch-output function with string arguments is not the best way. I like these separate class methods much better, but I thought you wanted just one model class instance per policy, whereas this design necessitates one model class per neural network. I think the separate-model-classes approach is easier to understand and develop on.

ericl commented 5 years ago

Actually there is still a single model per policy -- it's just that the model class might be specialized to have extra methods. For example QModel has to implement q_value and action_value, and algorithms can require a certain type of custom model to be passed in by the user.
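A small sketch of that requirement check; the class and function names here are hypothetical:

class QModel:  # stand-in for the interface sketched above
    def q_value(self): ...
    def action_value(self): ...

def check_model(model):
    # A DQN-style policy still keeps a single model instance, but it can
    # insist that the instance implements the Q-model interface.
    if not isinstance(model, QModel):
        raise ValueError("DQN-style policies require a custom model that subclasses QModel")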

bbalaji-ucsd commented 5 years ago

Is there a way to support reuse of Model classes across agents using this? For example, DQN and PPO could reuse a curiosity model, or DQN and SAC could reuse a Q model. I was thinking of multiple model classes to encourage this reuse.

Perhaps multiple inheritance is sufficient to handle the above with a single model instance per policy. What do you think?

ericl commented 5 years ago

Something like multiple inheritance makes sense to me (I think of it as implementing multiple interfaces). So if you want to use curiosity, your model should implement some extra methods (or we could provide a default fallback).
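A minimal sketch of that idea, with made-up names; the base class stands in for whatever framework model class the policy already uses:

class BaseModel:  # stand-in for the framework's model class
    def forward(self, input_dict, state):
        ...

class CuriosityInterface:
    def curiosity_loss(self, obs, next_obs):
        return 0.0  # default fallback: no intrinsic signal

class MyModel(BaseModel, CuriosityInterface):
    def curiosity_loss(self, obs, next_obs):
        ...  # e.g. the prediction error of a learned forward model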

ericl commented 5 years ago

The ModelV2 PR is now ready for review.

bentzinir commented 4 years ago

@ericl, can you please clarify whether variable reuse is currently possible? According to this, variable reuse is based on TensorFlow's reuse API via the variable_scope functionality. However, to my knowledge, Keras' reuse style is based on the "functional API" idea, which is different (and as far as I can tell, all model implementations in rllib are written using Keras). So if I want to reuse the base_model between policy_0 and policy_1, do I need to physically pass the Keras model that was created for policy_0 and call it again inside policy_1?

P.S. Running the multiagent_cartpole script (framework=tf), I don't see that the shared variables are actually shared.

ericl commented 4 years ago

Yes, you need to pass that exact model. This is a little tricky but can be done with global variables in Python (put the base model in a global and access it from multiple models).
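A rough sketch of that workaround, assuming a TFModelV2-style custom model; the names SHARED_BASE and SharedTrunkModel are placeholders, and older RLlib versions may additionally need register_variables:

import tensorflow as tf
from ray.rllib.models.tf.tf_modelv2 import TFModelV2

# Build the shared trunk once at module level; every policy's custom model
# calls this same object, so the underlying Keras variables are shared.
SHARED_BASE = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
])

class SharedTrunkModel(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        self.head = tf.keras.layers.Dense(num_outputs)

    def forward(self, input_dict, state, seq_lens):
        features = SHARED_BASE(input_dict["obs"])  # reuses the global trunk
        return self.head(features), state

Each policy would then point its custom_model config at this class (e.g. via ModelCatalog.register_custom_model), and all of them end up calling the same SHARED_BASE instance.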

bentzinir commented 3 years ago

Not directly related to the topic (I can open a new thread if you prefer).

Returning to the multiagent cartpole example: the number of policies is set according to the num_policies argument:

policies = { "policy_{}".format(i): gen_policy(i) for i in range(args.num_policies) }

while the number of agents is set via the num_agents argument:

"env_config": { "num_agents": args.num_agents, },

Assume a setup of 1:1 correspondence between the number of policies and the number of agents.

How would you grid search the number of policies?

Put differently, I want to set the number of agents via tune.grid_search. How can I make num_agents and num_policies refer to the same value when using tune.grid_search?