google / dopamine

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
https://github.com/google/dopamine
Apache License 2.0

v4 vs v0 #187

Open talium0713 opened 2 years ago

talium0713 commented 2 years ago

Hi. Due to limited computational resources, I am trying to compare my proposed algorithms using only 50M frames. I found that the benchmark is set up with 'v0', which uses sticky actions (25% random repeat). However, most papers report their scores on 'v4', so I wonder if there is a big difference in performance between v0 and v4. If a benchmark on v4 exists, I would really appreciate it if you could share it.

By the way, I'm a huge fan of your work and always follow your DRL research. Thanks.

psc-g commented 2 years ago

hi, thanks for your note! indeed, we use v0 by default as suggested by machado et al. https://arxiv.org/abs/1709.06009, as it is a more robust evaluation protocol. in our white paper https://arxiv.org/abs/1812.06110 we do compare v0 with v4 (see figure 6), and you can see there are fairly significant differences. unfortunately we don't have checkpoints for the v4 runs.
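
in case it helps, here's a rough sketch of how the two protocols map onto dopamine's environment creation (illustrative only; the exact signature may differ across versions, so check your installed copy):

```python
# illustrative sketch: dopamine picks the gym id based on the sticky_actions flag,
# so the same game name gives you either the v0 or the v4 protocol.
from dopamine.discrete_domains import atari_lib

# v0-style protocol (default): sticky actions with 25% repeat probability.
env_sticky = atari_lib.create_atari_environment(game_name='Pong', sticky_actions=True)

# v4-style protocol (what many older papers report): no sticky actions.
env_deterministic = atari_lib.create_atari_environment(game_name='Pong', sticky_actions=False)
```

you should also be able to set the same flag through gin in your config file rather than in code.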

if computation is a concern, lots of recent papers have been evaluating on just 100K frames. this can be very noisy, but we provide some guidance on how to report statistically reliable results in our recent paper https://arxiv.org/abs/2108.13264.
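
if you go down that route, the tooling from that paper is released as the rliable library (https://github.com/google-research/rliable); here's a minimal sketch of the kind of aggregate-metric computation it recommends (the scores and algorithm name below are placeholders, and the api may shift between versions):

```python
# illustrative sketch: interquartile mean (IQM) with stratified-bootstrap
# confidence intervals across runs and games.
import numpy as np
from rliable import library as rly
from rliable import metrics

# placeholder: normalized scores with shape [num_runs, num_games] per algorithm.
score_dict = {'my_algorithm': np.random.rand(5, 26)}

iqm = lambda scores: np.array([metrics.aggregate_iqm(scores)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, iqm, reps=2000)
print(point_estimates, interval_estimates)
```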

hope this helps, and good luck!

-psc

talium0713 commented 2 years ago

Thanks for the good advice and the great papers! Honestly, the Atari benchmark requires too many computational resources for an individual researcher like me. At the same time, many researchers seem to struggle because reviewers often ask for results across the full suite of Atari games.

In that regard, the evaluation methodology you proposed is likely to become even more important. I am still unsure whether evaluating on only 100K frames is reliable, but I will definitely read your paper and try to understand it.