kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License

Dimension is not matching #1

Closed by jeong-tae 6 years ago

jeong-tae commented 6 years ago

I am currently working on implementing RAM in PyTorch. I found your code and am following it now.

https://github.com/kevinzakka/recurrent-visual-attention/blob/master/modules.py#L164 In this line, phi is a 4D tensor if the given image is color and a 3D tensor otherwise. Either way, it should be a 2D tensor before the linear operation is applied.

Is this a typo, or is a reshape missing?
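A minimal sketch of the kind of reshape being asked about, not the repository's actual code. The tensor sizes and the names fc and phi_flat are made up for illustration; the only assumption is that the first dimension of phi is the batch dimension.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only (not taken from the repository).
batch_size, channels, patch = 4, 3, 8

# A batch of color glimpse patches: (B, C, H, W), i.e. a 4D tensor.
# A grayscale batch without a channel axis would be (B, H, W), i.e. 3D.
phi = torch.randn(batch_size, channels, patch, patch)

# The linear layer expects one flat feature vector per sample.
fc = nn.Linear(channels * patch * patch, 128)

# Collapse everything except the batch dimension: (B, C*H*W).
# phi.view(phi.size(0), -1) does the same for a contiguous tensor.
phi_flat = phi.flatten(start_dim=1)

out = fc(phi_flat)
print(tuple(phi_flat.shape), tuple(out.shape))
```

Because flatten(start_dim=1) keeps only the batch axis, the same call handles the 3D grayscale case and the 4D color case, as long as the in_features of the linear layer matches the flattened size.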

kevinzakka commented 6 years ago

Hey @jeong-tae, my repo is still unfinished. It's missing full support for batched images (i.e. 4D tensors) and I still have not coded the REINFORCE algorithm. I made it public because I was hoping to get help with a bounding box issue that was occurring.

duygusar commented 6 years ago

I suppose this was a mistake of mine to start with, as I don't have experience with PyTorch. @kevinzakka, thank you for sharing your code, but could you add a note listing known issues or parts that are not yet supported? I have been trying to make it run on my custom RGB dataset, to no avail. If and when I have more experience with PyTorch I can work around it, but it gave me the impression that this is a complete implementation of the paper, so I was convinced the problem was my dataloader.

kevinzakka commented 6 years ago

@duygusar I fail to see how my README conveys the impression that this is a fully implemented version of the paper. There is an unfinished TODO list and the Results section clearly states that only the 28x28 MNIST was tackled. On top of that, the authors of the paper only worked with 28x28 and 60x60 square images. Strictly speaking then, any deviation from that would entail not being a "full-implementation" of the paper.

If I understand correctly, you want me to add a note in the README mentioning that rectangular images may throw a runtime error?

duygusar commented 6 years ago

First of all, I did say it was my mistake not to go through the code in full detail. I normally would, but I am new to PyTorch. I understand this is purely volunteer-based, and thank you for sharing it. But no, you don't understand me correctly, and obviously I am not the only person who thought it is a full implementation, as this issue and this one (https://github.com/kevinzakka/recurrent-visual-attention/issues/13) suggest. Adding a note about those issues could help people be aware (just like the random sampling part). I don't understand why this should be a touchy matter, especially when people are confused about performance and GPU support (still open issues, btw). And no, it is not just a problem of using a "rectangular" image. Yes, I did open an issue about that (I assume you are referring to it) and have been trying to fix it, but it is not fixed, which might be due to the unsupported batching or color channels as well. Seriously though??