robfiras / ls-iq

Code of the paper "LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning" & LocoMuJoCo Baselines
MIT License

Absorbing states handling when updating policy #2

Closed mw9385 closed 1 year ago

mw9385 commented 1 year ago

Hi, thanks for your wonderful paper. I have also been experimenting with absorbing states in IQ-Learn, but I don't understand why it works well. My question is: is it fine to update the policy using absorbing-state samples? In the paper "Discriminator Actor-Critic (DAC)", the authors did not update on samples with absorbing states.

In my experience with the original IQ-Learn approach using absorbing states, I always run into Q-function divergence. To work around this, I used a very small learning rate for both the actor and the critic.

Also, if I want to try LSIQ from your paper, is iq_like_loss the right one to use?

Many thanks.

robfiras commented 1 year ago

Hi, thanks for your interest in LSIQ! When you talk about samples and absorbing states, I guess you mean transitions that lead to absorbing states. These samples are generally used, but in IQ-Learn the target value for the next state is then set to 0, which is the core problem with its absorbing-state handling. If I remember correctly, DAC adds an indicator flag to the state that marks whether the state is absorbing, and then learns specific values for the absorbing states. In LSIQ, we do a similar thing: instead of setting the value of an absorbing state to 0 (as in IQ-Learn), we explicitly learn it. The value of an absorbing state depends on whether the absorbing state is reached under the policy or under the expert (more on that in the paper).
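To make the difference concrete, here is a minimal sketch of the two ways of building the bootstrap target for a transition that ends in an absorbing state. It is not taken from the LS-IQ code base: the function name `td_target` and the `absorbing_value` parameter are made up for illustration, and the numbers are arbitrary. How the absorbing-state value is actually obtained, and how it differs between policy and expert transitions, is described in the paper.

```python
def td_target(reward, next_q, absorbing, gamma=0.99, absorbing_value=0.0):
    """Return the one-step bootstrap target for a single transition.

    Hypothetical helper for illustration only.
    - absorbing_value=0.0 mimics zeroing the next-state value on
      transitions into an absorbing state.
    - A non-zero absorbing_value stands in for an explicitly learned
      (or otherwise specified) absorbing-state value.
    """
    next_value = absorbing_value if absorbing else next_q
    return reward + gamma * next_value


# Arbitrary example numbers (assumptions, not results from the paper).
reward, next_q, gamma = 1.0, 10.0, 0.5

# Ordinary transition: bootstrap from the next state's Q-value.
regular = td_target(reward, next_q, absorbing=False, gamma=gamma)

# Transition into an absorbing state, next-state value forced to 0.
zeroed = td_target(reward, next_q, absorbing=True, gamma=gamma)

# Same transition, but with an explicit absorbing-state value.
explicit = td_target(reward, next_q, absorbing=True, gamma=gamma,
                     absorbing_value=4.0)

print(regular, zeroed, explicit)
```

With the zeroed variant, the target for the terminal transition collapses to the immediate reward, regardless of whether the episode ended under the expert or under the policy. The explicit variant lets those two cases receive different values.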

For LSIQ, you can use either the iq_like_loss or the sqil_like_loss, though we found that the sqil_like_loss works better with absorbing states. I will close this issue, as it is not really about the code but rather about the theory in the paper. If you have more questions, feel free to send me an email (fi.alhafez@gmail.com). I would be happy to help!