What's the difference between MOPO-no_penalty and MBPO?

Hi, thanks so much for your nice work and code. I have a question about the results in Table 3, specifically MOPO-no_pen versus MBPO. Could you clarify the difference between these two experiments? I noticed that on page 19 the paper says: "For simplicity, we use MBPO, which [is] essentially MOPO without reward penalty, for this ablation study". If MBPO is essentially MOPO without the reward penalty, what causes the performance gap between them in Table 3?
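For context, the MOPO paper trains the policy on a penalized model reward $\tilde r(s,a) = \hat r(s,a) - \lambda\, u(s,a)$, where $u(s,a)$ is an uncertainty estimate taken as the maximum over ensemble members of the norm of the predicted covariance. The minimal sketch below illustrates only why setting $\lambda = 0$ removes the penalty, which is the sense in which the quoted sentence equates MBPO with MOPO minus the penalty. It assumes, for illustration, that each ensemble member outputs a diagonal standard-deviation vector; the function names `uncertainty` and `penalized_reward` are hypothetical and are not taken from the `mopo` codebase.

```python
import math
from typing import Sequence


def uncertainty(ensemble_stds: Sequence[Sequence[float]]) -> float:
    """Max over ensemble members of the L2 norm of each member's predicted std.

    Assumption: each member predicts a diagonal Gaussian, so its per-member
    norm reduces to the L2 norm of the std vector.
    """
    return max(math.sqrt(sum(s * s for s in stds)) for stds in ensemble_stds)


def penalized_reward(r_hat: float,
                     ensemble_stds: Sequence[Sequence[float]],
                     lam: float) -> float:
    """MOPO-style penalized reward: r_tilde = r_hat - lam * u(s, a)."""
    return r_hat - lam * uncertainty(ensemble_stds)


if __name__ == "__main__":
    # Two hypothetical ensemble members; their std-vector norms are 0.5 and 1.0.
    stds = [[0.3, 0.4], [0.6, 0.8]]
    print(penalized_reward(2.0, stds, lam=1.0))  # penalty applied
    print(penalized_reward(2.0, stds, lam=0.0))  # lam = 0: raw model reward
```

With `lam=0.0` the penalized reward equals the raw model reward, so any remaining gap between the two runs in Table 3 would have to come from something other than the penalty term itself; the sketch does not say what that is.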

tianheyu927 / mopo, issue #6