ml-jku / rudder

RUDDER: Return Decomposition for Delayed Rewards

How to deal with MDPs with continuous action spaces? #3

Open Summer142857 opened 3 years ago

Summer142857 commented 3 years ago

It seems that the case of continuous action spaces is not covered in the code.

widmi commented 3 years ago

Hi! No, the implementation doesn't explicitly cover continuous action spaces. However, you can apply RUDDER to continuous action spaces by modifying the input to the reward redistribution LSTM: instead of feeding discrete actions (e.g. one-hot encoded) as input features, feed the continuous action vectors directly as input features to the LSTM.
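
The suggestion above can be sketched in PyTorch as follows. This is a minimal illustrative example, not the repo's actual API: the class name, dimensions, and the simple linear prediction head are all assumptions. The key point is only that continuous action vectors are concatenated with the observations at each timestep and used as LSTM input features.

```python
import torch
import torch.nn as nn

class RewardRedistributionLSTM(nn.Module):
    """Hypothetical sketch of a reward redistribution LSTM that accepts
    continuous actions as input features (names/dims are illustrative)."""

    def __init__(self, obs_dim, action_dim, hidden_dim=64):
        super().__init__()
        # Observation and continuous action are concatenated per timestep,
        # so the LSTM input size is obs_dim + action_dim.
        self.lstm = nn.LSTM(obs_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # per-timestep return prediction

    def forward(self, observations, actions):
        # observations: (batch, T, obs_dim); actions: (batch, T, action_dim)
        # With discrete actions one would one-hot encode them first; here the
        # raw continuous action vector is used directly as input features.
        x = torch.cat([observations, actions], dim=-1)
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)  # shape: (batch, T)

# Toy usage: 4 episodes, 10 timesteps, 8-dim observations, 2-dim actions
obs = torch.randn(4, 10, 8)
act = torch.randn(4, 10, 2)
predictions = RewardRedistributionLSTM(obs_dim=8, action_dim=2)(obs, act)
```

The per-timestep predictions can then be differenced along the time axis to obtain the redistributed rewards, as in the discrete-action case.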