Question about D4RL MuJoCo benchmark

microsoft / ATAC

Code accompanying the paper Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan Jiang, and Alekh Agarwal.

MIT License


Question about D4RL MuJoCo benchmark #2

Closed by HYDesmondLiu 2 years ago

HYDesmondLiu commented 2 years ago

Thanks for sharing the code. I have one question. It seems that you are using D4RL v2 (C.2), and in Table 1 you mention that "the baseline results are from the respective papers". However, some of those earlier papers used D4RL v0. I believe the buffer quality differs between v0 and v2 (see the TD3+BC paper), so the comparison might be biased.

chinganc commented 2 years ago

Hi @HYDesmondLiu, thanks for reaching out. This is indeed an important point, and it could bias the comparison. (For the Adroit datasets we used v0, so I think the comparison is fine there.) We used the v2 MuJoCo datasets in the experiments because the D4RL authors told us that there were bugs in the v0 datasets and suggested that we start with v2. That said, we will add experimental results on v0 in the revision. Please give us some time to produce the results. Thanks.
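For anyone who wants to rerun the comparison on both dataset versions, here is a minimal sketch of how the version enters the dataset name. It assumes D4RL's usual `<task>-<quality>-<version>` naming for the Gym-MuJoCo datasets (for example `hopper-medium-v2`). The helper names `d4rl_env_id`, `env_ids`, and `load_dataset` are hypothetical and are not part of ATAC or D4RL.

```python
from itertools import product
from typing import List

# Assumption: D4RL Gym-MuJoCo datasets are named "<task>-<quality>-<version>".
TASKS = ("halfcheetah", "hopper", "walker2d")
QUALITIES = ("random", "medium", "medium-replay", "medium-expert")


def d4rl_env_id(task: str, quality: str, version: str) -> str:
    """Hypothetical helper: build a dataset id such as 'hopper-medium-v2'."""
    return f"{task}-{quality}-{version}"


def env_ids(version: str) -> List[str]:
    """All task/quality combinations for one dataset version."""
    return [d4rl_env_id(t, q, version) for t, q in product(TASKS, QUALITIES)]


def load_dataset(env_id: str):
    """Load one dataset as a dict of arrays; needs gym and d4rl installed."""
    import gym
    import d4rl  # noqa: F401  (importing registers the D4RL environments)

    return gym.make(env_id).get_dataset()


if __name__ == "__main__":
    # Print the v0 and v2 ids side by side so the two runs use matching tasks.
    for v0_id, v2_id in zip(env_ids("v0"), env_ids("v2")):
        print(v0_id, v2_id)
```

Baseline numbers reported on `-v0` datasets are only directly comparable to results obtained by training on the same `-v0` ids, which is why keeping the version as an explicit argument is convenient.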

HYDesmondLiu commented 2 years ago

Hi @chinganc, thanks for noticing this. Could you share what the bugs in D4RL v0 are? It would be quite useful.

chinganc commented 2 years ago

I don't know the specifics, only that there were (minor) bugs in v0 and v1. The D4RL GitHub page mentions bugs in the hopper datasets. I would suggest reaching out to the D4RL authors to learn more.

chinganc commented 2 years ago

Hi @HYDesmondLiu, we have updated the paper. You can find the v0 results in Table 3 in the Appendix.