Open rgraziosi-fbk opened 11 months ago
I also want to ask this question!
@rgraziosi-fbk Thanks for the issue. I assume that you want to mask actions when the bootstrap target is calculated. In that case, you need to modify the action selection here:
https://github.com/takuseno/d3rlpy/blob/b4290f832cc7dccd4f95bbfe26e861d22c1b500f/d3rlpy/algos/qlearning/torch/dqn_impl.py#L125
This method is inherited by DiscreteCQL. If you change this method, DiscreteCQL will also be modified.
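To illustrate what masking at bootstrap target calculation could mean, here is a minimal NumPy sketch of a DQN-style target in which the next action is chosen only among valid actions. This is not d3rlpy code: the function name masked_bootstrap_target, the next_valid_mask argument, and the array layout are assumptions made for illustration, and the real method works on torch tensors inside the implementation class linked above.

```python
import numpy as np


def masked_bootstrap_target(rewards, next_q_values, next_valid_mask,
                            terminals, gamma=0.99):
    """Hypothetical helper: r + gamma * (1 - done) * Q(s', a*), where a* is
    the best action among those allowed in the next state.

    rewards:         shape (batch,)
    next_q_values:   shape (batch, n_actions), target-network Q(s', .)
    next_valid_mask: shape (batch, n_actions), True where the action is allowed
    terminals:       shape (batch,), 1.0 if s' is terminal, else 0.0
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    terminals = np.asarray(terminals, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    next_valid_mask = np.asarray(next_valid_mask, dtype=bool)

    # Invalid actions get -inf so argmax can never select them
    # (assuming each non-terminal row has at least one valid action).
    masked_q = np.where(next_valid_mask, next_q_values, -np.inf)
    next_actions = masked_q.argmax(axis=1)

    # Read the value of the selected action from the unmasked Q-values.
    batch_index = np.arange(next_actions.shape[0])
    next_values = next_q_values[batch_index, next_actions]

    return rewards + gamma * (1.0 - terminals) * next_values
```

Note that this requires a validity mask for the next observation of every transition, so the mask either has to be stored with the dataset or be computable from the observation itself.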
Hi everyone!
I'm trying to implement action masking for the discrete CQL algorithm, i.e. I'd like to make some actions impossible to choose given some conditions on the current observation.
At inference time it should be easy: predict_value can be used to get the action values for every possible action, the mask can then be used to filter out the impossible actions, and finally argmax can be used to pick the action to execute.
However, I'm unsure how to implement action masking during training. Is there any way to do that without changing the d3rlpy source code? If not, could you please give me some hints about which parts of the codebase should be changed to achieve this?
Thank you a lot in advance!
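For reference, the inference-time procedure described above (get the Q-values, mask out the impossible actions, take the argmax) could be sketched as follows. The helper name masked_greedy_action and the (batch, n_actions) layout are assumptions for illustration. How the Q-value array is assembled from predict_value is deliberately left out, since that depends on the d3rlpy version in use.

```python
import numpy as np


def masked_greedy_action(q_values, valid_mask):
    """Hypothetical helper: pick the highest-valued action among the valid ones.

    q_values:   shape (batch, n_actions), one Q-value per action
    valid_mask: shape (batch, n_actions), True where the action is allowed
    """
    q_values = np.asarray(q_values, dtype=np.float64)
    valid_mask = np.asarray(valid_mask, dtype=bool)

    if q_values.shape != valid_mask.shape:
        raise ValueError("q_values and valid_mask must have the same shape")
    if not valid_mask.any(axis=-1).all():
        raise ValueError("every observation needs at least one valid action")

    # Replace the values of invalid actions with -inf so they never win argmax.
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return masked_q.argmax(axis=-1)


# Example: action 1 has the highest value in the first row but is masked out.
q = [[1.0, 5.0, 3.0], [2.0, 0.5, -1.0]]
mask = [[True, False, True], [False, True, True]]
actions = masked_greedy_action(q, mask)
```

Using -inf rather than a large negative constant avoids having to guess a bound on the Q-values, at the cost of needing the explicit check that each row has at least one valid action.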