MSRA-COLT opened this issue 4 years ago
Hi, I really appreciate your open-source code. My question is how the performance numbers in the paper were obtained.
For example, in Table 1, do you report the maximum evaluation return observed during training, or the evaluation return at the final iteration? The policy's return has large variance across iterations.
Thanks, Yue
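To make the ambiguity concrete, here is a minimal sketch of the two reporting conventions I mean (the `eval_returns` values are made up for illustration, not taken from the MOPO logs):

```python
import numpy as np

# Hypothetical per-epoch evaluation returns; real values would come
# from the training logs.
eval_returns = np.array([1200.0, 2500.0, 1800.0, 3100.0, 2200.0, 2600.0])

max_return = eval_returns.max()   # best evaluation seen during training
last_return = eval_returns[-1]    # evaluation at the final iteration

# Averaging the last k evaluations is a common compromise that is
# less sensitive to per-iteration variance.
k = 3
last_k_mean = eval_returns[-k:].mean()

print(f"max: {max_return:.1f}, last: {last_return:.1f}, "
      f"mean of last {k}: {last_k_mean:.1f}")
```

With high per-iteration variance, these three numbers can differ substantially, which is why the convention matters.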
I have the same problem. When I run the demo on datasets other than mixed, the results have large variance.
Also, which version of D4RL were you using (and in COMBO)? I ask because the buffer quality differs considerably across versions v0 to v2 (see the TD3+BC paper for details).
@HYDesmondLiu The config file in this repo says they used the '-v0' datasets for MOPO, but I'm still curious about the dataset version used in COMBO; has COMBO's source code even been released? I'm also having trouble stabilizing MOPO's performance: the variance across epochs is quite large.
@typoverflow AFAIK, COMBO's source code has not been released. As I recall, they used the D4RL v2 buffers; the performance difference between v0 and v2 is easy to spot. Some deep RL methods are notorious for being hard to reproduce; you could refer to this paper and other related research for more information.
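For anyone trying to pin down the version issue, a quick way to see the difference is to compare buffer statistics directly. A minimal sketch, assuming `gym` and `d4rl` are installed ('hopper-medium' is just an illustrative task name, any D4RL task works):

```python
import gym
import d4rl  # registers the D4RL offline environments with gym

# Compare buffer statistics across dataset versions to see how much
# the data quality shifts between v0 and v2.
for version in ("v0", "v2"):
    env = gym.make(f"hopper-medium-{version}")
    data = env.get_dataset()
    rewards = data["rewards"]
    print(f"{version}: {rewards.size} transitions, "
          f"reward mean {rewards.mean():.3f}, std {rewards.std():.3f}")
```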