Open · theRealSuperMario opened 4 years ago
@theRealSuperMario Just to clarify: is the issue suspected to be due to the fact that the background is being set to -mean/std? And, in addition, should the tutorial be updated to include cluttered MNIST?
We could work on this together if you are still interested in the issue. I would be happy to take your input and run tests together.
Hi,
I am not actively pursuing this issue anymore, so I’ll have to leave it to the rest of you to decide what to do with it. Sorry about that.
/assigntome
This issue has been unassigned due to inactivity. If you are still planning to work on this, you can still send a PR referencing this issue.
/assigntome
Hi,
I am opening this issue because I noticed some odd behavior in the spatial transformer network implementation (https://github.com/pytorch/tutorials/blob/78e91c54dd0cd4fb0d02dfcc86fe94d16ab03df6/intermediate_source/spatial_transformer_tutorial.py#L57).
I summarized my findings here. In short, when the input is normalised and then fed to the STN, the F.grid_sample call adds zero padding; however, the normalisation changes the background value from 0 to -mean/std (https://github.com/pytorch/tutorials/blob/78e91c54dd0cd4fb0d02dfcc86fe94d16ab03df6/intermediate_source/spatial_transformer_tutorial.py#L127). This causes the STN to collapse very early and to never learn the correct transformation. You can actually see this in the example code already (https://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html), because the learnt transformation is zooming OUT instead of zooming IN on the digits. For the original 28 x 28 images this is not such a big problem. However, when you continue to cluttered MNIST, as in the original publication, the difference is huge. Once again, please have a look here.
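To make the mismatch concrete, here is a minimal sketch (the mean/std values 0.1307 and 0.3081 are the MNIST statistics used in the tutorial's Normalize transform). After normalisation, background pixels sit at (0 - mean)/std ≈ -0.4242, while F.grid_sample fills out-of-bounds regions with 0 by default, so the padded border ends up brighter than the actual background:

```python
import torch
import torch.nn.functional as F

mean, std = 0.1307, 0.3081  # MNIST stats from the tutorial's Normalize transform

# A normalised "image" that is all background: every pixel is (0 - mean) / std.
x = torch.full((1, 1, 28, 28), (0.0 - mean) / std)

# A zoom-out affine transform, so the sampling grid reaches outside the image.
theta = torch.tensor([[[1.5, 0.0, 0.0],
                       [0.0, 1.5, 0.0]]])
grid = F.affine_grid(theta, x.size(), align_corners=False)
y = F.grid_sample(x, grid, align_corners=False)  # default padding_mode='zeros'

print(x.min().item())  # ~ -0.4242, the normalised background value
print(y.max().item())  # 0.0 in the zero-padded border, brighter than background
```

One possible workaround (my suggestion, not something the tutorial currently does) is to shift the input so the background maps to 0 before sampling and shift it back afterwards, e.g. F.grid_sample(x - bg, grid) + bg with bg = -mean / std. Using padding_mode='border' would also avoid injecting a value that never occurs in the data.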
I think the STN tutorial should be updated and should also include the cluttered MNIST example, because that is what drives the point home. I would volunteer to do so if I get permission to go ahead.
Unfortunately, most other implementations I was able to find on the web also have this bug.
cc @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen