kkhetarpal / ioc

Options of Interest: Temporal Abstraction with Interest Functions AAAI 2020
https://sites.google.com/view/optionsofinterest

Confusion about comparative trials #3

Open IDayday opened 2 years ago

IDayday commented 2 years ago

Thanks for your good work! I really appreciate the well-written code. But I am a little confused about the details of the comparison between IOC and OC. In the tabular test, I found two different ways of calculating the advantage function. The IOC version is at https://github.com/kkhetarpal/ioc/blob/fb88d5b881e0b5c317020b23874495e614b3ddc7/tabular/interestoptioncritic_tabular_fr.py#L261 and it is not the same as the OC version at https://github.com/kkhetarpal/ioc/blob/fb88d5b881e0b5c317020b23874495e614b3ddc7/tabular/optioncritic_tabular_fr.py#L141 I tried calculating the OC advantage the same way IOC does, and found that OC can then do as well as IOC in the tabular test, without your interest function. Can you explain why this happens?

kkhetarpal commented 2 years ago

Hi, thanks for reporting this. We kept the baseline as it was used in OC (which uses the max). But we believe the more correct version is the one that appears in our IOC code, where the advantage is really Q - V.
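To make the difference concrete, here is a minimal sketch of the two advantage computations for a single state. It is not the code from either linked file: the function names are hypothetical, and it assumes V is the expectation of Q under the policy over options, which is one common reading of "Q - V".

```python
def advantage_max_baseline(q_values, option):
    """Max baseline (the OC-style variant described above): Q(s, o) - max_o' Q(s, o')."""
    return q_values[option] - max(q_values)


def advantage_value_baseline(q_values, option_probs, option):
    """Value baseline (the Q - V variant): Q(s, o) - V(s).

    Assumption: V(s) = sum_o' pi(o' | s) * Q(s, o'), i.e. the expected
    option value under the policy over options.
    """
    v = sum(p * q for p, q in zip(option_probs, q_values))
    return q_values[option] - v


# Illustrative numbers only: option values and option probabilities in one state.
q_values = [1.0, 2.0, 4.0]
option_probs = [0.25, 0.25, 0.5]

adv_max = [advantage_max_baseline(q_values, o) for o in range(len(q_values))]
adv_v = [advantage_value_baseline(q_values, option_probs, o) for o in range(len(q_values))]

print(adv_max)  # [-3.0, -2.0, 0.0]
print(adv_v)    # [-1.75, -0.75, 1.25]
```

With the max baseline the advantage can never be positive, whereas the Q - V form can take either sign, so the two variants can drive the termination updates differently.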

It is indeed interesting that if you use the IOC way of doing things, i.e. the advantage being Q - V, you get the same results for both OC and IOC. By "as well as", did you mean that the results are identical? Also, did you do a sweep over the learning rate to make sure this conclusion is robust?

Adding my collaborator @mklissa for visibility.