Closed JinGuang-cuhksz closed 7 months ago
What happens when you remove the weight decay and the mask passed into adamw? Does the issue persist? I haven't seen this before.
score_model = TrainState.create(apply_fn=actor_def.apply, params=actor_params, tx=optax.adamw(learning_rate=3e-4))
Also, make sure to train for the full 3 million steps to get above 80 on walker2d-med (there is some variance, of course).
Btw, I just pushed some bug fixes that will speed up the code a fair amount during evaluation. One thing I wanted to do was choose the checkpoint for the BC actor that achieved the lowest validation loss.
I also removed the masking on the weights for weight decay, since you had hit this bug.
Thank you so much for your kind and fast response. The value error disappears.
I will try to use your new code to train for the full 3 million steps.
Thanks for your suggestions, but I still can't reach the desired results. I get $78.17\pm 2.24$ (N=256) and $76.46\pm 0.82$ (N=64) on the walker2d-med task.
Could you please share your checkpoints so that I can get evaluation results similar to the papers'? Thank you so much.
These seem pretty much in line with what I got. I made some small changes to the code I recently pushed to make it run faster, such as reducing the batch size and adding layernorm to IQL to avoid some instability issues. That would probably explain the small 2-4 point deviation.
I don't have my old checkpoints unfortunately. If you really want to get past 80, you might want to revert the code to the prior commit and run it but it will take 2-3x runtime + evaluation.
Here are my learning curves for walker2d-medium with N=64 and N=256 from the prior commit. I'm guessing my small changes reduced performance slightly.
Thanks for your kind and fast response. Your performance looks good and stable. Since I need checkpoints with good performance, I will revert to your old version by increasing the batch size and removing the layernorm, and hope it works.
Hello. Do the line and shaded region in your figures show the average performance and the confidence interval over 10 runs, respectively?
I would appreciate your advice. I used the previous version's code and got an average above 80, but it is still slightly worse than yours. Could you share the random seed? I noticed the code doesn't specify one.
Your results look good. After re-checking the appendix of my paper, I used N=128 for the locomotion results and N=32 for antmaze.
Yes, the shaded region is the standard deviation across my runs.
If results from other papers are within 2 standard deviations of mine, I bold both; that interval is usually 4-10 points. Feel free to bold however you like, just report how you do it.
I believe I used totally random rngs, but you can fix it by setting the seed in the code to a fixed integer.
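For example, a minimal sketch of pinning everything to one integer seed (the variable names here are hypothetical, not the repo's):

```python
# Sketch: seed numpy and jax from one fixed integer for reproducibility.
import numpy as np
import jax

SEED = 42  # any fixed integer
np.random.seed(SEED)
rng = jax.random.PRNGKey(SEED)
# Derive per-network keys from the single root key.
rng, actor_key, critic_key = jax.random.split(rng, 3)
```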
Feel free to report the results from your run instead of from the paper!
Oh, hang on. Did you run for 3 million steps? You only need to input 1.5 million, because the actor takes two gradient steps per critic gradient step; I report the number of critic gradient steps. You can just take the results from the 1.5-million-step evaluation if you'd like.
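In other words, the step accounting is:

```python
# The reported "step" counts critic updates; the actor takes two
# gradient steps per critic gradient step.
critic_steps = 1_500_000
actor_steps_per_critic_step = 2
total_actor_steps = critic_steps * actor_steps_per_critic_step
print(total_actor_steps)  # -> 3000000
```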
OK. Thank you so much.
Hello, I met a ValueError when running the code directly. I extracted the relevant code in Section 1 and got the error in Section 2. It works after I add an extra line:
actor_params = freeze(actor_params)
However, the performance is less than 80 on the walker2d-med task. I'm not sure whether this addition is correct. Could you help me? Thanks a lot.

Code
Error
Env