zaiyan-x / RFQI

Implementation of Robust Reinforcement Learning using Offline Data [NeurIPS'22]
MIT License

Question about computing the target in the robust Bellman operator #1

Closed linhlpv closed 6 months ago

linhlpv commented 6 months ago

Hi @zaiyan-x,

Thank you for your amazing work. I have been using your codebase for my project. I noticed that when you compute the robust Bellman target for the Q-function update, you don't multiply target_Q by the not_done variable. I'm curious about this choice: was it an implementation bug or deliberate? Here is the line I mean: https://github.com/zaiyan-x/RFQI/blob/0e583723e00d2f7a19a0035d028d7566a54f1d04/rfqi.py#L305

Thank you again and have a nice day. Best, Linh

zaiyan-x commented 6 months ago

Hi Linh,

Thank you for your question, an interesting one :)

In spirit, yes, it would be better to include the not_done variable. In practice, the probability mass on terminal states is negligible, because MuJoCo control tasks have very long horizons. If the underlying task were something like FrozenLake, then yes, this could create an issue. It was certainly not a design choice.
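For reference, a minimal sketch of the masked target, assuming the usual TD3/BCQ-style names (reward, not_done, discount, target_Q); these names are illustrative, not the exact variables in rfqi.py:

```python
import torch

def masked_bellman_target(reward, not_done, discount, target_Q):
    # On terminal transitions (not_done == 0) the bootstrapped value is
    # zeroed out, so the target collapses to the immediate reward.
    return reward + not_done * discount * target_Q

reward = torch.tensor([[1.0], [0.5]])
not_done = torch.tensor([[1.0], [0.0]])   # second transition is terminal
target_Q = torch.tensor([[10.0], [10.0]])
print(masked_bellman_target(reward, not_done, 0.99, target_Q))
# tensor([[10.9000], [0.5000]])
```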

Best regards,

Zaiyan

linhlpv commented 6 months ago

Hi Zaiyan,

Thank you for your answer :D.