vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

Point offline RL users to tinkoff-ai/CORL #288

Closed vwxyzjn closed 1 year ago

vwxyzjn commented 1 year ago

Description

https://github.com/tinkoff-ai/CORL seems to be a great offline RL library whose design philosophy is similar to ours. Since we are not really developing offline RL algorithms, we should give a friendly pointer to them, especially given that they have done the same for us:

image

kinalmehta commented 1 year ago

Just a note: I went through their reports and code, and there is a significant performance gap between their scores and the scores reported in the papers (CQL and DT are the ones I specifically checked). That said, they are very transparent about this and include warnings in the README and in the code files for the algorithms whose performance does not match.

On a positive note, it is a good reference for anyone trying to understand the papers through code.

vwxyzjn commented 1 year ago

Merging now. Cc @Howuhh @Scitator @vkurenkov

Howuhh commented 1 year ago

@vwxyzjn, @kinalmehta Hi! Thank you for your time! Yeah, we know about the underperformance of DT and CQL.

I'm currently working on fixing DT. It turned out to be a very small bug: I concatenated the sequences incorrectly (https://github.com/tinkoff-ai/CORL/blob/9069495537c9e22fb6a0c8abe7925f2bfa9b8a81/algorithms/dt.py#L312). After the fix, our scores are pretty close to the paper's, so we just need to rerun all datasets, which will take a bit of time.
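For readers unfamiliar with the sequence layout: Decision Transformer feeds the model per-timestep triples of return-to-go, state, and action, interleaved in time order. The sketch below only illustrates that ordering in plain Python. It is not the CORL code and not the actual fix, and this thread does not say which exact mistake was made. The function names `interleave_tokens` and `concat_tokens` are hypothetical.

```python
def interleave_tokens(returns, states, actions):
    """Build the per-timestep order (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    if not (len(returns) == len(states) == len(actions)):
        raise ValueError("all three streams must have the same length")
    sequence = []
    for r, s, a in zip(returns, states, actions):
        # Each timestep contributes three adjacent tokens.
        sequence.extend([r, s, a])
    return sequence


def concat_tokens(returns, states, actions):
    """Naive block order (R_1..R_T, s_1..s_T, a_1..a_T), shown for contrast.

    This keeps every token but loses the per-timestep grouping that a
    causal model relies on.
    """
    return list(returns) + list(states) + list(actions)


if __name__ == "__main__":
    R, S, A = ["R1", "R2"], ["s1", "s2"], ["a1", "a2"]
    print(interleave_tokens(R, S, A))
    print(concat_tokens(R, S, A))
```

Both functions return the same tokens, so a mix-up of this kind can go unnoticed in shape checks and only show up as lower scores.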

CQL is a separate story: a magical algorithm that works differently for everyone (judging by the results in various publications). Even the original implementation does not give the same results as in the paper. We are also working on rewriting it from scratch again. However, we would be happy to see contributors who know CQL better than we do.

All other algorithms are on par with the results in the original papers. Our hope is that this library will also serve as a set of simple baselines, not just as educational code.

Howuhh commented 1 year ago

We are also already seeing interesting results. For example, we see that EDAC is much better than all previous algorithms (and simpler). Nevertheless, almost nobody uses it as a baseline in new papers.

kinalmehta commented 1 year ago

Thank you for the update @Howuhh. It is indeed awesome work and the rigour with which you are evaluating the implementations is commendable.

It is good to know about EDAC's performance.