tianheyu927 / mopo

Code for MOPO: Model-based Offline Policy Optimization
MIT License

What's the difference between MOPO-no_penalty and MBPO #6

Open zhaoyi11 opened 3 years ago

zhaoyi11 commented 3 years ago

Hi, thanks so much for your nice work and code. I have a question about the results in Table 3, specifically about MOPO-no_pen and MBPO. Could you clarify the difference between these two experiments? I noticed that, on page 19, the paper mentions: "For simplicity, we use MBPO, which is essentially MOPO without reward penalty, for this ablation study". So what causes the performance gap between the two in Table 3?
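For context, the mechanism the question refers to is MOPO's uncertainty-penalized reward, r̃(s, a) = r(s, a) − λ u(s, a), where u(s, a) is an uncertainty estimate from the learned dynamics ensemble (the paper uses the maximum norm of the per-model predicted standard deviation). A minimal sketch of this penalty is below; the function name `penalized_reward` and parameter `lam` are illustrative, not from the repo:

```python
import numpy as np

def penalized_reward(reward, ensemble_std, lam=1.0):
    """Sketch of a MOPO-style penalized reward.

    reward:       scalar model-predicted reward r(s, a)
    ensemble_std: array of shape (num_models, obs_dim), the predicted
                  std of each dynamics model in the ensemble
    lam:          penalty coefficient lambda

    u(s, a) is taken as the largest per-model std norm, and the
    penalized reward is r(s, a) - lam * u(s, a).
    """
    u = max(np.linalg.norm(std) for std in ensemble_std)
    return reward - lam * u
```

With `lam=0` (or a zero penalty) this reduces to plain MBPO-style rollouts, which is why the paper treats MBPO as "MOPO without reward penalty" in the ablation.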