jannerm / trajectory-transformer

Code for the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem"
https://trajectory-transformer.github.io
MIT License

Question about D4RL-gym dataset version #4

Open FineArtz opened 2 years ago

FineArtz commented 2 years ago

Hi, I recently read your paper and it inspired me a lot; it is no doubt a good paper. However, I am confused about which version of the D4RL dataset was used for the compared baselines. I notice that in "Appendix C Baseline performance sources", the results for BC, MOPO (which, by the way, I did not find in the experiments section), and MBOP are taken from their original papers, all of which use the D4RL-gym-v0 datasets. Since CQL's performance on D4RL-gym-v0 [1] differs greatly from its performance on D4RL-gym-v2 [2] on several datasets, I wonder whether the scores of the above baselines would also change greatly on D4RL-gym-v2, or whether you have evidence that this will not happen, given that you compare these scores directly.

jannerm commented 2 years ago

Nice catch!

BC on v2 performs 4.1 percentage points higher than on v0, with an average score of 51.8 versus 47.7 [1]. I'll update this in the next arXiv version.
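(For anyone comparing numbers across dataset versions: scores like 51.8 and 47.7 are D4RL normalized scores, which rescale a raw episode return against per-environment random and expert reference returns. A minimal sketch of that normalization; the reference returns below are illustrative placeholders, not the official D4RL constants:)

```python
def d4rl_normalized_score(raw_return: float,
                          random_return: float,
                          expert_return: float) -> float:
    """Map a raw episode return onto D4RL's 0-100 normalized scale,
    where the random policy scores 0 and the expert policy scores 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Placeholder reference returns for illustration only (not official values).
RANDOM_RETURN = -280.0
EXPERT_RETURN = 12135.0

# The expert return maps to 100, the random return to 0.
print(d4rl_normalized_score(EXPERT_RETURN, RANDOM_RETURN, EXPERT_RETURN))  # 100.0
print(d4rl_normalized_score(RANDOM_RETURN, RANDOM_RETURN, EXPERT_RETURN))  # 0.0
```

Because the reference returns are fixed per environment (not per dataset version), a change in normalized score between v0 and v2 reflects a genuine change in raw performance on the regenerated data, which is why the baseline numbers cannot be compared across versions directly.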

I have reached out to the authors of MBOP to see if they can share code for reevaluation on the v2 datasets.