rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License
2.43k stars 547 forks

AWAC doesn't profit from offline data #166

Open im-Kitsch opened 2 years ago

im-Kitsch commented 2 years ago

Hi,

@anair13, thanks for making the code available. Since you answer AWAC questions frequently, I'm tagging you directly.

In the AWAC paper, the main benefit is that there is no "dip" in performance when switching from offline training to online training. But when I run it on the MuJoCo Gym environments, it doesn't benefit from pre-training on the offline dataset.

I ran the code in the repo, examples/awac/mujoco/awac1.py, with all default settings, and pretraining on offline data doesn't seem to help in these experiments. I found this link in the issues (https://drive.google.com/file/d/1Qy5SYIGNwdeTHAGNjbRfuP5pSiRw8JzJ/view); in this file, too, the learning process doesn't seem to profit much from the offline learning.

Do I have to change any hyperparameters? It would be really nice if I could reproduce the paper's results.

Looking forward to your reply.

Best.

Winston-Gu commented 2 years ago

I ran into the same problem. In my case, I checked my results in "pretrain_q.csv" and found that the offline training procedure doesn't seem to have actually happened. I'm looking closely into the source code, and I think the default hyperparameters may need to be altered.
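One quick way to check whether the offline phase did anything is to compare the first and last logged values of a loss column in the pretraining log. Below is a minimal sketch using only the standard library. The helper `summarize_column` is made up for illustration, and the column name `QF1 Loss` in the usage note is a hypothetical placeholder, so check the header of your own `pretrain_q.csv` for the actual keys.

```python
import csv


def summarize_column(path, column):
    """Return (row_count, first_value, last_value) for one numeric CSV column.

    Rows where the column is empty or non-numeric are skipped.
    """
    values = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            raw = row.get(column, "")
            try:
                values.append(float(raw))
            except (TypeError, ValueError):
                continue  # skip blank or malformed cells
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return len(values), values[0], values[-1]
```

For example, if `summarize_column("pretrain_q.csv", "QF1 Loss")` (column name assumed) reports only a handful of rows, or nearly identical first and last values, that would suggest pretraining ran for very few steps or not at all.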

Winston-Gu commented 2 years ago

This is my result for HalfCheetah; as you noted, "it learned nothing". The result shown in the paper looks like this:

I noticed that when creating the HalfCheetah-v2 environment, gym raised a warning indicating that HalfCheetah-v2 is outdated. Is it possible that some changes in the environment caused this problem?

Roberto09 commented 2 years ago

Just wondering, is the general issue that after pretraining the average returns go to zero during the training phase? Or that the model learns nothing during pretraining (i.e. returns are always near 0 during the pretraining phase)?
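The two cases can be told apart from the logged evaluation returns. Here is a rough sketch, assuming you have a list of average returns (one per evaluation) and know how many of them belong to the pretraining phase. Both inputs and the helper `phase_means` are hypothetical, not something rlkit provides in this form.

```python
from statistics import mean


def phase_means(returns, n_pretrain):
    """Split a return curve at index n_pretrain and average each part."""
    if not 0 < n_pretrain < len(returns):
        raise ValueError("n_pretrain must leave at least one point in each phase")
    return mean(returns[:n_pretrain]), mean(returns[n_pretrain:])
```

A pretraining mean near zero points to the second case (nothing is learned offline). A clearly positive pretraining mean followed by a much lower online mean points to the first.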

linhlpv commented 1 year ago

Hi @Winston-Gu, I'm sorry that my question is not related to the problem discussed here. I'm trying to reproduce the AWAC results and am stuck on creating figures like those shown in the AWAC paper. It looks like you were able to create similar figures. Could you please help me with that? Thank you so much, and have a nice day.
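A possible starting point for this kind of learning curve (not the authors' plotting script) is to smooth the logged returns and mark where offline pretraining ends. The sketch below assumes you have already loaded step counts and average returns into plain lists. The function names, the `offline_steps` argument, and the axis labels are illustrative choices, and matplotlib is assumed to be installed.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def plot_returns(steps, returns, offline_steps, window=5, out_path="returns.png"):
    """Plot smoothed returns and mark where offline pretraining ends."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(steps, moving_average(returns, window))
    ax.axvline(offline_steps, linestyle="--", color="gray")  # offline/online boundary
    ax.set_xlabel("steps")
    ax.set_ylabel("average return")
    fig.savefig(out_path)
    plt.close(fig)
```

Averaging several seeds before plotting would need one such curve per run; that part is left out here.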