Closed sufeidechabei closed 4 years ago
Can you share the code? One thing to try to verify is that the gradients are identical given the same model weights and batch of experiences.
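A minimal sketch of the parity check being suggested, using NumPy and a hypothetical one-layer linear model rather than the actual RLlib policies: load identical weights into both implementations, feed the same fixed batch, and compare the resulting gradients element-wise.

```python
import numpy as np

def linear_loss_grad(W, x, y):
    """Gradient of 0.5 * ||x @ W - y||^2 w.r.t. W for a toy linear model.
    Stands in for one framework's forward/backward pass."""
    residual = x @ W - y
    return x.T @ residual

# Same weights and same batch fed to "both" implementations.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))
x = np.ones((8, 4))          # fixed batch, analogous to np.ones((batch, c, h, w))
y = np.zeros((8, 2))

g1 = linear_loss_grad(W.copy(), x, y)   # "framework A"
g2 = linear_loss_grad(W.copy(), x, y)   # "framework B"

# With identical weights and inputs, gradients must match to numerical precision.
print("max gradient difference:", np.abs(g1 - g2).max())
```

For the real models, the same pattern applies: copy the TF weights into the PyTorch model (or vice versa), run one forward/backward on an identical batch, and `np.allclose` the per-parameter gradients.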
Here is my PyTorch implementation of IMPALA using Ray. I use the same configuration as the TensorFlow version. https://github.com/sufeidechabei/ray-pytorch-impala
@ericl
It has a performance gap compared with TensorFlow on the "BreakoutNoFrameskip-v4" environment with the default IMPALA configuration. Can you help me review the code? If possible, I would like to contribute the PyTorch IMPALA to the Ray repo. @ericl
I also tried using the same batch of input (initialized as np.ones((batch, c, h, w))) with the same weights. The outputs and gradients of PyTorch IMPALA and TensorFlow IMPALA are the same at first, but after 3 or more iterations the outputs and gradients differ. @ericl
If the gradient is the same, but the weights are different after the update, then could it be a different configuration of the gradient clipping or SGD optimizer?
The configurations (including the gradient clip) are the same. I use the Adam optimizer; I don't think the difference between the PyTorch and TensorFlow Adam optimizers could cause such a big gap in performance (episode reward mean and episode reward max).
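One documented difference worth ruling out before dismissing the optimizer: TF1's `AdamOptimizer` adds epsilon to the *uncorrected* `sqrt(v)` (its docs call this "epsilon hat"), while PyTorch's `Adam` adds epsilon after bias-correcting `v`. A NumPy sketch of both update rules (toy scalar weight, constant gradient; not RLlib code) shows the two produce slightly different steps, which compound over iterations:

```python
import numpy as np

def adam_step_pytorch(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # PyTorch-style: epsilon added after bias-correcting v.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def adam_step_tf1(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # TF1-style: bias correction folded into lr_t, epsilon added to raw sqrt(v).
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    lr_t = lr * np.sqrt(1 - b2 ** t) / (1 - b1 ** t)
    return w - lr_t * m / (np.sqrt(v) + eps), m, v

w_pt = w_tf = np.array([1.0])
m_pt = v_pt = m_tf = v_tf = np.zeros(1)
g = np.array([0.5])  # identical gradient fed to both every step

for t in range(1, 11):
    w_pt, m_pt, v_pt = adam_step_pytorch(w_pt, g, m_pt, v_pt, t)
    w_tf, m_tf, v_tf = adam_step_tf1(w_tf, g, m_tf, v_tf, t)

# The two rules coincide only in the eps -> 0 limit; with a real epsilon
# the weights drift apart a little more with every step.
print("weight difference after 10 steps:", abs(w_pt - w_tf)[0])
```

With the default `eps=1e-8` the per-step drift is tiny, but with a larger epsilon (some RL configs use values like 1e-4 or bigger) the gap between the two formulations grows accordingly, so it is worth checking which epsilon convention each framework's config actually maps to.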
@ericl
How big is the gap? Also, to confirm you say the gradients are identical but the weights are not after the step? Do the weights diverge after 1 step immediately?
I also ran the PyTorch A3C (to verify whether the problem is in my policy rather than the model; PyTorch A3C and PyTorch IMPALA share the same model, VisionNet). There is still a performance gap between PyTorch A3C and TensorFlow A3C. I will show the figure. @ericl
All of them use the default configuration. "pt" stands for PyTorch and "tf" for TensorFlow. @ericl
After 2 or more updates, the outputs, weights, and gradients of PyTorch IMPALA and TensorFlow IMPALA differ.
@ericl
Can you try to figure out if it is the gradient or the weights that start diverging first? It is hard to believe they are exactly the same for 1 step and then different for the rest. The update should be fully deterministic right?
By the way, by step I mean a single Adam step, not end-to-end in RLlib.
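One way to act on this suggestion is a lockstep harness that advances both training loops one step at a time and reports the first step and quantity (output, gradient, or weights) that stops matching. This is a generic sketch with toy stand-in loops, not RLlib code; `run_a`/`run_b` would wrap the real TF and PyTorch single-step updates:

```python
import numpy as np

def first_divergence(run_a, run_b, n_steps, atol=0.0):
    """Step two training loops in lockstep; return (step, quantity) at the
    first mismatch, or None if they agree for all n_steps."""
    for step in range(1, n_steps + 1):
        snap_a, snap_b = run_a(), run_b()   # each returns {"out", "grad", "w"}
        for key in ("out", "grad", "w"):
            if not np.allclose(snap_a[key], snap_b[key], atol=atol):
                return step, key
    return None

# Toy stand-ins for the two frameworks: identical SGD loops on f(w) = w^2,
# so they should never diverge.
def make_loop(w0):
    state = {"w": np.array(w0, dtype=np.float64)}
    def step():
        grad = 2.0 * state["w"]             # d/dw of w^2
        state["w"] = state["w"] - 0.1 * grad
        return {"out": state["w"] ** 2, "grad": grad, "w": state["w"].copy()}
    return step

print(first_divergence(make_loop([1.0]), make_loop([1.0]), n_steps=5))  # -> None
```

Pointing this at the real implementations would pin down whether the weights or the gradients break first, and at which step.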
I have printed the output after 1 step; the two versions are the same, but after that step the outputs differ.
@ericl
Ok, can you try to figure out why this is? Maybe start with the simple a3c version.
I'm not sure whether it's caused by a difference between the PyTorch and TensorFlow models. You can see in my code that the model architecture and initialization match the TF version, but I don't know why there is still a performance gap. @ericl
By the way, I couldn't find any open-sourced PyTorch IMPALA; I think it would be an interesting project for the Ray community. @ericl
@sufeidechabei I agree this would be a great contribution. I don't have time to try to debug this right now though, but would be happy to shepherd such a PR if you can figure out why the pytorch optimizer step doesn't produce the same model update as the TF version.
I have checked the code many times, but I still can't pinpoint the difference between the TF version and the PT version. @ericl
I use the same optimizer, the same model architecture, and the same initialization, but the performance is still bad.