Closed sufeidechabei closed 4 years ago
Can you share the code? One thing to try to verify is that the gradients are identical given the same model weights and batch of experiences.
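A minimal sketch of the parity check being suggested, using NumPy and a hypothetical one-layer linear model rather than the actual RLlib policies: load identical weights into both implementations, feed the same fixed batch, and compare the resulting gradients element-wise.

```python
import numpy as np

def linear_loss_grad(W, x, y):
    """Gradient of 0.5 * ||x @ W - y||^2 w.r.t. W for a toy linear model.
    Stands in for one framework's forward/backward pass."""
    residual = x @ W - y
    return x.T @ residual

# Same weights and same batch fed to "both" implementations.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))
x = np.ones((8, 4))          # fixed batch, analogous to np.ones((batch, c, h, w))
y = np.zeros((8, 2))

g1 = linear_loss_grad(W.copy(), x, y)   # "framework A"
g2 = linear_loss_grad(W.copy(), x, y)   # "framework B"

# With identical weights and inputs, gradients must match to numerical precision.
print("max gradient difference:", np.abs(g1 - g2).max())
```

For the real models, the same pattern applies: copy the TF weights into the PyTorch model (or vice versa), run one forward/backward on an identical batch, and `np.allclose` the per-parameter gradients.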
Here is my PyTorch implementation of IMPALA using Ray. I use the same configuration as the TensorFlow version. https://github.com/sufeidechabei/ray-pytorch-impala
@ericl
It has a performance gap compared with TensorFlow on the "BreakoutNoFrameskip-v4" environment with the default IMPALA configuration. Can you help me review the code? If possible, I would like to contribute the PyTorch IMPALA to the Ray repo. @ericl
I also tried using the same batch of input (initialized as np.ones((batch, c, h, w))) with the same weights. The outputs and gradients of PyTorch IMPALA and TensorFlow IMPALA are the same at first, but after 3 or more iterations the outputs and gradients differ. @ericl
If the gradient is the same, but the weights are different after the update, then could it be a different configuration of the gradient clipping or SGD optimizer?
The configurations (including the gradient clip) are the same. I use the Adam optimizer; I don't think the difference between the PyTorch and TensorFlow Adam optimizers could cause such a big gap in performance (episode reward mean and episode reward max).
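One documented difference worth ruling out before dismissing the optimizer: TF1's `AdamOptimizer` adds epsilon to the *uncorrected* `sqrt(v)` (its docs call this "epsilon hat"), while PyTorch's `Adam` adds epsilon after bias-correcting `v`. A NumPy sketch of both update rules (toy scalar weight, constant gradient; not RLlib code) shows the two produce slightly different steps, which compound over iterations:

```python
import numpy as np

def adam_step_pytorch(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # PyTorch-style: epsilon added after bias-correcting v.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def adam_step_tf1(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # TF1-style: bias correction folded into lr_t, epsilon added to raw sqrt(v).
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    lr_t = lr * np.sqrt(1 - b2 ** t) / (1 - b1 ** t)
    return w - lr_t * m / (np.sqrt(v) + eps), m, v

w_pt = w_tf = np.array([1.0])
m_pt = v_pt = m_tf = v_tf = np.zeros(1)
g = np.array([0.5])  # identical gradient fed to both every step

for t in range(1, 11):
    w_pt, m_pt, v_pt = adam_step_pytorch(w_pt, g, m_pt, v_pt, t)
    w_tf, m_tf, v_tf = adam_step_tf1(w_tf, g, m_tf, v_tf, t)

# The two rules coincide only in the eps -> 0 limit; with a real epsilon
# the weights drift apart a little more with every step.
print("weight difference after 10 steps:", abs(w_pt - w_tf)[0])
```

With the default `eps=1e-8` the per-step drift is tiny, but with a larger epsilon (some RL configs use values like 1e-4 or bigger) the gap between the two formulations grows accordingly, so it is worth checking which epsilon convention each framework's config actually maps to.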
@ericl
How big is the gap? Also, to confirm you say the gradients are identical but the weights are not after the step? Do the weights diverge after 1 step immediately?
I also ran the PyTorch A3C (to verify whether the problem is in my policy rather than the model; PyTorch A3C and PyTorch IMPALA share the same model, VisionNet). There is still a performance gap between PyTorch A3C and TensorFlow A3C. I will show the figure. @ericl
All of them use the default configuration. "pt" stands for PyTorch and "tf" for TensorFlow. @ericl
After 2 or more updates, the outputs, weights, and gradients of PyTorch IMPALA and TensorFlow IMPALA differ.
@ericl
Can you try to figure out if it is the gradient or the weights that start diverging first? It is hard to believe they are exactly the same for 1 step and then different for the rest. The update should be fully deterministic right?
By the way, by step I mean a single Adam step, not end-to-end in RLlib.
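One way to act on this suggestion is a lockstep harness that advances both training loops one step at a time and reports the first step and quantity (output, gradient, or weights) that stops matching. This is a generic sketch with toy stand-in loops, not RLlib code; `run_a`/`run_b` would wrap the real TF and PyTorch single-step updates:

```python
import numpy as np

def first_divergence(run_a, run_b, n_steps, atol=0.0):
    """Step two training loops in lockstep; return (step, quantity) at the
    first mismatch, or None if they agree for all n_steps."""
    for step in range(1, n_steps + 1):
        snap_a, snap_b = run_a(), run_b()   # each returns {"out", "grad", "w"}
        for key in ("out", "grad", "w"):
            if not np.allclose(snap_a[key], snap_b[key], atol=atol):
                return step, key
    return None

# Toy stand-ins for the two frameworks: identical SGD loops on f(w) = w^2,
# so they should never diverge.
def make_loop(w0):
    state = {"w": np.array(w0, dtype=np.float64)}
    def step():
        grad = 2.0 * state["w"]             # d/dw of w^2
        state["w"] = state["w"] - 0.1 * grad
        return {"out": state["w"] ** 2, "grad": grad, "w": state["w"].copy()}
    return step

print(first_divergence(make_loop([1.0]), make_loop([1.0]), n_steps=5))  # -> None
```

Pointing this at the real implementations would pin down whether the weights or the gradients break first, and at which step.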
I have printed the output after 1 step; the two versions are the same, but after that step the outputs differ.
@ericl
Ok, can you try to figure out why this is? Maybe start with the simple a3c version.
I'm not sure whether it's caused by a difference between the PyTorch and TensorFlow models. You can see in my code that the model architecture and initialization match the TF version, but I don't know why there is still a performance gap. @ericl
By the way, I couldn't find any open-sourced PyTorch IMPALA; I think it would be an interesting project for the Ray community. @ericl
@sufeidechabei I agree this would be a great contribution. I don't have time to try to debug this right now though, but would be happy to shepherd such a PR if you can figure out why the pytorch optimizer step doesn't produce the same model update as the TF version.
I have checked the code many times, but I still can't pinpoint the difference between the TF version and the PT version. @ericl
I use the same optimizer, the same model architecture, and the same initialization, but the performance is still bad.