ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

About model configuration. #7644

Open dxu23nc opened 4 years ago

dxu23nc commented 4 years ago

I noticed that there are two model configuration parameters: fcnet_hiddens in model_config and hiddens in the configuration of dqn.py. When I print the whole model (model.base_model.summary()), the "hiddens" layers do not show up. Why does that happen? I know "hiddens" is used in the build_q_model function of dqn.py. The code is as follows:

import ray
from ray import tune
import functools
import pickle
import threading

from ray.rllib.agents import dqn

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["hiddens"] = [512, 128]
trainer = dqn.DQNTrainer(config=config, env="AirRaid-ram-v0")
model = trainer.get_policy().model
model.base_model.summary()
model.q_value_head.summary()

And the log is:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observations (InputLayer)       [(None, 128)]        0                                            
__________________________________________________________________________________________________
fc_1 (Dense)                    (None, 256)          33024       observations[0][0]               
__________________________________________________________________________________________________
fc_out (Dense)                  (None, 256)          65792       fc_1[0][0]                       
__________________________________________________________________________________________________
value_out (Dense)               (None, 1)            257         fc_1[0][0]                       
==================================================================================================
Total params: 99,073
Trainable params: 99,073
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
model_out (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
lambda (Lambda)                 [(None, 6), (None, 6, 1),  198022                                 
==================================================================================================
Total params: 198,022
Trainable params: 198,022
Non-trainable params: 0


rfali commented 2 years ago

@ericl @richardliaw It seems that the above issue has gone away with recent Ray versions; I'm not sure what resolved it.

ray version 1.3.0, tensorflow 2.34.1

If I run the above code


import ray
from ray.rllib.agents import dqn

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["hiddens"] = [512, 128]
trainer = dqn.DQNTrainer(config=config, env="AirRaid-ram-v0")
model = trainer.get_policy().model
model.base_model.summary()
model.q_value_head.summary()

I get the following output

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observations (InputLayer)       [(None, 128)]        0                                            
__________________________________________________________________________________________________
fc_1 (Dense)                    (None, 256)          33024       observations[0][0]               
__________________________________________________________________________________________________
fc_out (Dense)                  (None, 256)          65792       fc_1[0][0]                       
__________________________________________________________________________________________________
value_out (Dense)               (None, 1)            257         fc_1[0][0]                       
==================================================================================================
Total params: 99,073
Trainable params: 99,073
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
model_out (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
hidden_0 (Dense)                (None, 512)          131584      model_out[0][0]                  
__________________________________________________________________________________________________
hidden_1 (Dense)                (None, 128)          65664       hidden_0[0][0]                   
__________________________________________________________________________________________________
dense (Dense)                   (None, 6)            774         hidden_1[0][0]                   
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 6)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 6)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 6, 1)]       0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 6, 1)]       0           tf_op_layer_default_policy/ones_l
==================================================================================================
Total params: 198,022
Trainable params: 198,022
Non-trainable params: 0

which seems accurate to me. The above output corresponds to

[base_model] obs_space_size ->  2 dense layers of 256 each (last being called fc_out)  
[q_head] fc_out -> 2 q_hidden layers [512, 128] -> Q_values of action_space_size  

i.e.
[128 observations] -> [256] -> [256] -> [512] -> [128] -> [6 actions]

This is because the default value of fcnet_hiddens is [256, 256]. The last few layers in the q_head correspond to the layers that would be used if distributional Q-learning (C51) were enabled.

q_hiddens, i.e. config["hiddens"] (this naming inconsistency should be addressed), specifies the sizes of the dense layers added after the Advantage/Value split, as documented in DistributionalQTFModel.
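
To make the role of q_hiddens concrete, here is a rough sketch (my own paraphrase, not the actual DistributionalQTFModel / build_q_model code, which also handles dueling and C51) of how a Q-head can be assembled from such a list:

import tensorflow as tf

def build_q_head_sketch(num_feature_outs, q_hiddens, num_actions):
    # Rough paraphrase only: stack one Dense layer per entry of q_hiddens on
    # top of the base model's feature output, then project to the action count.
    model_out = tf.keras.Input(shape=(num_feature_outs,), name="model_out")
    x = model_out
    for i, size in enumerate(q_hiddens):           # e.g. config["hiddens"] = [512, 128]
        x = tf.keras.layers.Dense(size, activation="relu", name=f"hidden_{i}")(x)
    q_out = tf.keras.layers.Dense(num_actions)(x)  # one Q-value (or advantage) per action
    return tf.keras.Model(model_out, q_out)

build_q_head_sketch(256, [512, 128], num_actions=6).summary()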

If you set dueling to True


import ray
from ray.rllib.agents import dqn

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["hiddens"] = [512, 128]
config["dueling"] = True

trainer = dqn.DQNTrainer(config=config, env="AirRaid-ram-v0")
model = trainer.get_policy().model
model.base_model.summary()
model.q_value_head.summary()
model.state_value_head.summary()

we get

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observations (InputLayer)       [(None, 128)]        0                                            
__________________________________________________________________________________________________
fc_1 (Dense)                    (None, 256)          33024       observations[0][0]               
__________________________________________________________________________________________________
fc_out (Dense)                  (None, 256)          65792       fc_1[0][0]                       
__________________________________________________________________________________________________
value_out (Dense)               (None, 1)            257         fc_1[0][0]                       
==================================================================================================
Total params: 99,073
Trainable params: 99,073
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
model_out (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
hidden_0 (Dense)                (None, 512)          131584      model_out[0][0]                  
__________________________________________________________________________________________________
hidden_1 (Dense)                (None, 128)          65664       hidden_0[0][0]                   
__________________________________________________________________________________________________
dense (Dense)                   (None, 6)            774         hidden_1[0][0]                   
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 6)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 6)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 6, 1)]       0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 6, 1)]       0           tf_op_layer_default_policy/ones_l
==================================================================================================
Total params: 198,022
Trainable params: 198,022
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
model_out (InputLayer)       [(None, 256)]             0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               131584    
_________________________________________________________________
dense_2 (Dense)              (None, 128)               65664     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 129       
=================================================================
Total params: 197,377
Trainable params: 197,377
Non-trainable params: 0

which corresponds to

shared layer
[128 observations] -> [256] -> [256] (this splits into the following two branches)
-> [512] -> [128] -> [6 Q values for the 6 possible actions]
-> [512] -> [128] -> [1 value]            

The last [256] is the common layer, and splits into two branches, one each for Advantage and Value streams, as described in the Dueling DQN paper.
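
For reference, the way the two streams are recombined per the Dueling DQN paper is Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); a generic sketch of that formula (not RLlib's exact dqn_tf_policy code) looks like:

import tensorflow as tf

def dueling_q_values(state_value, advantages):
    # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), as in the Dueling DQN paper.
    # state_value: [batch, 1], advantages: [batch, num_actions]
    return state_value + advantages - tf.reduce_mean(advantages, axis=1, keepdims=True)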

rfali commented 2 years ago

However, I have stumbled onto a corner case, which could be a lack of understanding on my part. If you set hiddens to an empty list and set dueling to True, then V(s) is taken from the state_value_head here. This makes for an interesting case, since that value head is not connected to the last shared dense layer but effectively to the Q-head output, which would be wrong. Here is reproduction code.


# CartPole with q-hiddens ON and OFF

import ray
from ray.rllib.agents import dqn

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["model"]["fcnet_hiddens"] = [64, 64]
config["hiddens"] = []  # [128, 128]
config["dueling"] = True

trainer = dqn.DQNTrainer(config=config, env="CartPole-v0")
model = trainer.get_policy().model
model.base_model.summary()
model.q_value_head.summary()
model.state_value_head.summary()

With config["hiddens"] = [], we get the following output:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observations (InputLayer)       [(None, 4)]          0                                            
__________________________________________________________________________________________________
fc_1 (Dense)                    (None, 64)           320         observations[0][0]               
__________________________________________________________________________________________________
fc_2 (Dense)                    (None, 64)           4160        fc_1[0][0]                       
__________________________________________________________________________________________________
fc_out (Dense)                  (None, 2)            130         fc_2[0][0]                       
__________________________________________________________________________________________________
value_out (Dense)               (None, 1)            65          fc_2[0][0]                       
==================================================================================================
Total params: 4,675
Trainable params: 4,675
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
model_out (InputLayer)          [(None, 2)]          0                                            
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           model_out[0][0]                  
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           model_out[0][0]                  
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 2)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 2)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 2, 1)]       0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 2, 1)]       0           tf_op_layer_default_policy/ones_l
==================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
model_out (InputLayer)       [(None, 2)]               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 3         
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________

What is bothering me is the part in model_2 where the state_value_head (the last layer, named dense) is connected to the fc_out layer of shape (None, 2) (i.e. the model_out of the base model) instead of being connected to fc_2 (Dense) of shape (None, 64).

Note that the value_out layer in the base model (the first model in the above output), i.e. value_out (Dense) of shape (None, 1) with 65 parameters, would not be used here since dueling is set to True; instead, the output of the state_value_head model would be used (as can be seen in the dqn_tf_policy code here).

Just to present the alternative as well, here is the output with hiddens set to [128, 128]:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observations (InputLayer)       [(None, 4)]          0                                            
__________________________________________________________________________________________________
fc_1 (Dense)                    (None, 64)           320         observations[0][0]               
__________________________________________________________________________________________________
fc_out (Dense)                  (None, 64)           4160        fc_1[0][0]                       
__________________________________________________________________________________________________
value_out (Dense)               (None, 1)            65          fc_1[0][0]                       
==================================================================================================
Total params: 4,545
Trainable params: 4,545
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
model_out (InputLayer)          [(None, 64)]         0                                            
__________________________________________________________________________________________________
hidden_0 (Dense)                (None, 128)          8320        model_out[0][0]                  
__________________________________________________________________________________________________
hidden_1 (Dense)                (None, 128)          16512       hidden_0[0][0]                   
__________________________________________________________________________________________________
dense (Dense)                   (None, 2)            258         hidden_1[0][0]                   
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 2)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 2)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 2, 1)]       0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 2, 1)]       0           tf_op_layer_default_policy/ones_l
==================================================================================================
Total params: 25,090
Trainable params: 25,090
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
model_out (InputLayer)       [(None, 64)]              0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               8320      
_________________________________________________________________
dense_2 (Dense)              (None, 128)               16512     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 129       
=================================================================
Total params: 24,961
Trainable params: 24,961
Non-trainable params: 0

Of course this won't happen with the default value of hiddens, which is already [256] and therefore adds a dense layer after the last fcnet_hiddens layer, a detail that is easy for a beginner to miss.

The problem is that with the default DQN settings (dueling set to True, hiddens set to [256], while fcnet_hiddens is [256, 256] in the model catalog), the model becomes [obs_space_size] -> [256] -> [256] -> [256] -> Q-values.

Or, for that matter, when a user specifies fcnet_hiddens = [64] as in the CartPole DQN tuned example, what they expect is [obs_space_size] -> [64] -> Q-values, but what they get is [obs_space_size] -> [64] -> [256] -> Q-values. If my understanding of the RLlib architecture is correct, this is not what the user actually asked for (or at least had in mind). Even if you set dueling to False, you will get the same model architecture as above. The following is reproduction code:


import ray
from ray.rllib.agents import dqn

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["model"]["fcnet_hiddens"] = [64]
config["dueling"] = False

trainer = dqn.DQNTrainer(config=config, env="CartPole-v0")
model = trainer.get_policy().model
model.base_model.summary()
model.q_value_head.summary()

Output

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observations (InputLayer)       [(None, 4)]          0                                            
__________________________________________________________________________________________________
fc_out (Dense)                  (None, 64)           320         observations[0][0]               
__________________________________________________________________________________________________
value_out (Dense)               (None, 1)            5           observations[0][0]               
==================================================================================================
Total params: 325
Trainable params: 325
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
model_out (InputLayer)          [(None, 64)]         0                                            
__________________________________________________________________________________________________
hidden_0 (Dense)                (None, 256)          16640       model_out[0][0]                  
__________________________________________________________________________________________________
dense (Dense)                   (None, 2)            514         hidden_0[0][0]                   
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(2,)]               0           dense[0][0]                      
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 2)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/ones [(None, 2)]          0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 2, 1)]       0           tf_op_layer_default_policy/ones_l
__________________________________________________________________________________________________
tf_op_layer_default_policy/Expa [(None, 2, 1)]       0           tf_op_layer_default_policy/ones_l
==================================================================================================
Total params: 17,154
Trainable params: 17,154
Non-trainable params: 0

One solution could be that, if hiddens is only meant for the dueling architecture, it is applied only when dueling is also set to True. AFAIK this is not the case currently, as can be seen from the above output.
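
As a rough sketch of that idea (hypothetical, not current RLlib behavior), together with a possible workaround suggested by the hiddens=[] output shown earlier in this thread:

# Hypothetical guard sketched from the suggestion above (not current RLlib behavior):
# only honor config["hiddens"] when the dueling architecture is actually enabled.
q_hiddens = config["hiddens"] if config["dueling"] else []

# Possible workaround in the meantime (an assumption based on the hiddens=[] CartPole
# output shown earlier, not verified across RLlib versions): an empty hiddens list
# appears to give the expected [obs_space_size] -> [64] -> Q-values shape.
config["model"]["fcnet_hiddens"] = [64]
config["hiddens"] = []
config["dueling"] = False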

Since this is a design-level question, I am going to tag @sven1977 and @ericl to see if they can clarify. Thank you very much for reading through this. I appreciate your help in furthering my understanding.

rfali commented 2 years ago

Can this be related (although the above question is not about VisionNets)? PR#18306

built-in VisionNets: The value branch - in case we have a shared one - is attached to the action-logits output instead of the feature output (one layer before the action logits output).

vedhasam commented 1 year ago

There definitely appears to be some design inconsistency in how the models are ultimately created when config["hiddens"] = [] versus when config["hiddens"] is non-empty.

If:

observations (InputLayer) =[(None, 16)]
config["model"]["fcnet_hiddens"] = [20, 17, 10]
num_actions=3

config["hiddens"]=[] results in

base_model: [Nx16]->[Nx20]->[Nx17]->[Nx10]-> (which splits into [Nx1] and [Nx3])
q_value_head: [Nx3] -> (splits into two of [Nx3x1] )
state_value_head: [Nx3] -> [Nx1]

whereas config["hiddens"]=[256] results in

base_model: [Nx16]->[Nx20]->[Nx17] -> (splits into [Nx1] and into [Nx10]) 
q_value_head: [Nx10]->[Nx256]->(splits into two of [Nx3x1] )
state_value_head: [Nx10]->[Nx256]->[Nx1]

With everything else, especially config["model"]["fcnet_hiddens"], kept identical, why should the state_value_head input size be [Nx3] in one case but [Nx10] in the other? From a consistency point of view, for state_value_head EITHER

A. [Nx3] -> [Nx1] (with config["hiddens"]=[]) should imply [Nx3] -> [Nx256] -> [Nx1] (with config["hiddens"]=[256]), OR
B. [Nx10] -> [Nx256] -> [Nx1] (with config["hiddens"]=[256]) should imply [Nx10] -> [Nx1] (with config["hiddens"]=[]).

Similar arguments could be made for q_value_head and base_model in terms of the input sizes they consume and produce, respectively.

Most critically, the value head (with config["hiddens"]=[256]) takes the 10-sized fcnet_hiddens[-1] output as its input, while in the other case the model bypasses that layer completely when computing the value head.

vedhasam commented 1 year ago

Here is how the two models (hiddens=[] vs. hiddens=[256]) look pictorially, with all other hyperparameters identical. [image attached]

Please correct me if I am wrong, but if one were to assume that the 3 constituent models obtained with hiddens=[256] are the right implementation, then with hiddens=[] one would expect only the 256-node dense layer to be removed, with no change in the interface between the 3 constituent models. Looking at the interfacing, though (i.e. output of base_model = input of state_value_head = input of q_value_head), the interface size changes from 10 to 3 between hiddens=[256] and hiddens=[]. Wouldn't it be easier if the fcnet_hiddens layers were dedicated to computing value_out and advantages/Q consistently? On the other hand, if we say that fc_out computes features rather than advantages or Q-values, wouldn't it be easier, for consistency's sake, if the last layer of fcnet_hiddens (of size 10) were dedicated to computing the features in both cases (hiddens=[256] vs. [])?
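
To illustrate the second alternative, here is a minimal Keras sketch (my own interpretation of the layout argued for above, not RLlib code) in which the last fcnet_hiddens layer always provides the shared features, regardless of whether config["hiddens"] is empty:

import tensorflow as tf

def build_consistent_heads_sketch(obs_size=16, fcnet_hiddens=(20, 17, 10),
                                  q_hiddens=(), num_actions=3):
    # Hypothetical "consistent" layout: fcnet_hiddens[-1] always yields the
    # shared features; q_hiddens only deepens the heads, never changes the interface.
    obs = tf.keras.Input(shape=(obs_size,), name="observations")
    x = obs
    for size in fcnet_hiddens:
        x = tf.keras.layers.Dense(size, activation="relu")(x)
    features = x                            # always [N, 10] with the sizes above
    h = features
    for size in q_hiddens:                  # [] or e.g. [256]
        h = tf.keras.layers.Dense(size, activation="relu")(h)
    advantages = tf.keras.layers.Dense(num_actions)(h)
    state_value = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model(obs, [advantages, state_value])

build_consistent_heads_sketch(q_hiddens=()).summary()      # interface stays [N, 10]
build_consistent_heads_sketch(q_hiddens=(256,)).summary()  # same interface size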

karstenddwx commented 7 months ago

Hi, any comments on what vedhasam described?