Closed wildsky95 closed 1 year ago
Hi, is there any development on this issue ? @avnishn @sven1977
Hey sorry about the late responses -- I haven't gotten to this in time -- I'm going to assign @sven1977 as well, so that we can get your issue resolved ASAP.
Hi, is there any update on this? @avnishn @sven1977
I can't quite understand why `build_q_model` is concatenating the action and obs.
In the RL literature, the Q function takes a state and an action as its inputs. When we represent it as a neural network in code, we concatenate the observation and action vectors to express that the Q function depends on both.
If your environment has a discrete action space, the Q network doesn't take the action as an input at all: it takes only the observation and outputs one Q-value per action, so no concatenation is needed.
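To make the two cases concrete, here is a minimal shape-level sketch (not RLlib's actual code) of why obs and actions are concatenated for continuous actions but not for discrete ones; the dimensions are arbitrary illustrative values:

```python
import numpy as np

obs_dim, act_dim, n_discrete = 8, 2, 4
obs = np.zeros((32, obs_dim))
act = np.zeros((32, act_dim))

# Continuous actions: Q(s, a) consumes [obs, action], so the first
# layer's weight matrix has obs_dim + act_dim input rows.
w_continuous = np.zeros((obs_dim + act_dim, 1))
q_sa = np.concatenate([obs, act], axis=-1) @ w_continuous  # shape (32, 1)

# Discrete actions: the network takes only the observation and emits
# one Q-value per action, so nothing is concatenated.
w_discrete = np.zeros((obs_dim, n_discrete))
q_s = obs @ w_discrete  # shape (32, n_discrete)
```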
So my best guess is that your issue has to do with the observations and actions not being concatenated when they are passed to the `_get_q_value` function of the RNNSAC torch model. This is probably because at some point `self.concat_obs_and_actions` is being set to False, meaning that only the observation, instead of the observation concatenated with the action, is being passed to your Q function.
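That guess can be sketched as a branch. The helper below is illustrative only, a stand-in for the input-building step of `_get_q_value` rather than RLlib's code, using the observation/action widths from this issue:

```python
import numpy as np

def q_net_input(obs, actions, concat_obs_and_actions):
    """Illustrative stand-in for the input-building step of _get_q_value."""
    if concat_obs_and_actions:
        return np.concatenate([obs, actions], axis=-1)
    # When the flag is False, the action is silently dropped, so the
    # Q net receives fewer columns than it was built for.
    return obs

obs = np.zeros((1, 9640))  # observation shape from this issue
act = np.zeros((1, 4031))  # action shape from this issue
wide = q_net_input(obs, act, True)     # (1, 13671): what the Q net expects
narrow = q_net_input(obs, act, False)  # (1, 9640): what triggers the error
```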
Hey,
I've run into the same issue.
From the debugging I've done, it seems the calls to `get_q_values` in `action_distribution_fn` don't pass in actions, so `_get_q_value` doesn't concatenate anything with the observation and the dimension mismatch occurs:
```python
_, q_state_out = model.get_q_values(model_out, states_in["q"], seq_lens)
if model.twin_q_net:
    _, twin_q_state_out = model.get_twin_q_values(
        model_out, states_in["twin_q"], seq_lens
    )
```
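The mismatch can be reproduced in isolation. Assuming a Q net whose first layer was sized for the issue's concatenated width (9640 obs + 4031 action inputs), feeding the observation alone fails:

```python
import numpy as np

obs_dim, act_dim = 9640, 4031
w_q = np.zeros((obs_dim + act_dim, 1))  # first layer sized for obs + action
obs_only = np.zeros((1, obs_dim))

try:
    obs_only @ w_q  # the actions were never concatenated in
    mismatch = False
except ValueError:
    mismatch = True  # 9640 input columns vs. the 13671 the layer expects
```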
With #23814 and passing in `input_dict['actions']`, it progresses further, but I'm seeing other, seemingly unrelated issues. I also have no idea whether those are the actions expected at this point in the algorithm.
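A hypothetical sketch of what that change does is below. The function is a stand-in, not RLlib's actual `get_q_values`, and the optional `actions` argument is an assumption based on the RNNSAC torch model's interface:

```python
import numpy as np

def get_q_values(model_out, state_in, seq_lens, actions=None):
    # Stand-in: concatenate the model features with the actions when
    # they are given; without them, only the observation features reach
    # the Q net and its first layer sees too few columns.
    if actions is not None:
        net_in = np.concatenate([model_out, actions], axis=-1)
    else:
        net_in = model_out
    return net_in, state_in

model_out = np.zeros((1, 9640))
actions = np.zeros((1, 4031))  # e.g. forwarded from input_dict["actions"]
q_in, q_state_out = get_q_values(model_out, ["q"], None, actions=actions)
# q_in now has the concatenated width the Q net was built for
```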
It appears the RNNSAC implementation hasn't been tested with continuous actions. It would be good if someone knowledgeable about how it's supposed to work could take a look; I've seen great performance with the torch implementation of the regular SAC so far.
Ray Component
RLlib
What happened + What you expected to happen
Hi, I'm trying to train a multi-agent RNNSAC with my custom environment, but I get a shape mismatch error. I tried to resolve this on my own and found that when the Q model is built, the observation shape and action shape get concatenated, so the model's input takes a shape of action shape + obs shape, and during training the shape mismatch occurs. I can't quite understand why `build_q_model` is concatenating the action and obs.
My custom env's observation space is (9640,) and its action space is (4031,), both with continuous values, so with the concatenation in the Q model building I get a shape error. I'm literally running the RNNSAC test algorithm on the model. It's also worth mentioning that it works perfectly well with multi-agent CartPole but doesn't work with the custom env. Of course, I tested my custom multi-agent env with PPO and PG and it works fine! The error I get is:
I use this code to train:
I don't quite understand this part of the `build_q_model` method:
Thanks in advance for your guidance.
Versions / Dependencies
v2.0, v1.9