Open FineArtz opened 2 years ago
Nice catch!
BC on v2 performs 4.1 percentage points higher than on v0, with an average score of 51.8 versus 47.7 [1]. I'll update this in the next arXiv version.
I have reached out to the authors of MBOP to see if they can share code for reevaluation on the v2 datasets.
Hi, I recently read your paper and it inspired me a lot; I think it is without doubt a good paper. However, I am confused about which version of the D4RL datasets was used for the baselines you compare against. In "Appendix C Baseline performance sources", the results of BC, MOPO (by the way, I could not find MOPO in your experiments section) and MBOP are taken from their original papers, all of which use the D4RL-gym-v0 datasets. Since the performance of CQL on D4RL-gym-v0 [1] differs greatly from that on D4RL-gym-v2 [2] on several datasets, I wonder whether the scores of these baselines would also change greatly on D4RL-gym-v2. Or do you have evidence that this will not happen, given that you compare these scores directly?