takuseno / d3rlpy

An offline deep reinforcement learning library
https://takuseno.github.io/d3rlpy
MIT License

[QUESTION] How to add an action mask to DiscreteCQL algorithm? #356

Open rgraziosi-fbk opened 11 months ago

rgraziosi-fbk commented 11 months ago

Hi everyone!

I'm trying to implement action masking for the discrete CQL algorithm, i.e. I'd like to make some actions impossible to choose given some conditions on the current observation.

At inference time this seems easy: predict_value can be used to get the value of every possible action, the mask can then filter out the impossible actions, and finally argmax picks the action to execute.
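
For reference, a minimal sketch of that inference-time idea. It assumes a trained DiscreteCQL instance `cql`, a known number of discrete actions `n_actions`, and a user-supplied boolean `action_mask` of shape `(N, n_actions)`; the masking logic itself is application-specific and not part of d3rlpy:

```python
import numpy as np

def masked_predict(cql, observations, action_mask, n_actions):
    """Greedy action selection restricted to allowed actions (illustrative sketch)."""
    n = observations.shape[0]
    # Query Q-values for each discrete action by repeating it across the batch.
    q_values = np.stack(
        [
            cql.predict_value(observations, np.full(n, a, dtype=np.int64))
            for a in range(n_actions)
        ],
        axis=1,
    )  # shape: (N, n_actions)
    # Set Q-values of disallowed actions to -inf so argmax never selects them.
    q_values = np.where(action_mask, q_values, -np.inf)
    return np.argmax(q_values, axis=1)
```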

However, I'm unsure how to implement action masking during training. Is there any way to do that without changing the d3rlpy source code? If not, could you please give me some hints about which parts of the codebase would need to change to achieve this?

Thank you a lot in advance!

Lucien-Evans-123 commented 11 months ago

I also want to ask this question!

takuseno commented 11 months ago

@rgraziosi-fbk Thanks for the issue. I assume that you want to mask actions during the bootstrap target calculation. In that case, you need to modify the action selection here: https://github.com/takuseno/d3rlpy/blob/b4290f832cc7dccd4f95bbfe26e861d22c1b500f/d3rlpy/algos/qlearning/torch/dqn_impl.py#L125

This method is inherited all the way up to DiscreteCQL, so changing it will also affect DiscreteCQL.
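
For anyone looking for the general shape of that change, here is a rough, hypothetical sketch of masked action selection inside a bootstrap target computation. The names `q_values_for`, `targ_q_values_for`, and `mask_fn` are placeholders rather than actual d3rlpy internals, so map them onto whatever the linked line in `dqn_impl.py` uses in your version:

```python
import torch

def masked_compute_target(next_observations, rewards, terminals, gamma,
                          q_values_for, targ_q_values_for, mask_fn):
    """Illustrative TD-target computation with masked greedy action selection.

    Shapes assumed: rewards and terminals are (N,), Q-value outputs are
    (N, n_actions) and (N,) respectively, and mask_fn returns a bool tensor
    of shape (N, n_actions) with True for allowed actions.
    """
    with torch.no_grad():
        q = q_values_for(next_observations)          # (N, n_actions)
        mask = mask_fn(next_observations)            # (N, n_actions), bool
        q = q.masked_fill(~mask, float("-inf"))      # forbid masked actions
        next_actions = q.argmax(dim=1)               # masked greedy selection
        target_q = targ_q_values_for(next_observations, next_actions)  # (N,)
        return rewards + gamma * target_q * (1.0 - terminals)
```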