naturomics / CapsNet-Tensorflow

A Tensorflow implementation of CapsNet(Capsules Net) in paper Dynamic Routing Between Capsules
Apache License 2.0
3.8k stars 1.17k forks

Questions about the weight matrix Wij between ui and vj #9

Closed jianyin2016 closed 6 years ago

jianyin2016 commented 6 years ago

First of all, thanks for your answer on Zhihu and for the implementation on GitHub; they helped me a lot in understanding the original paper.

I would like to share a doubt about the lines just below Figure 2 of the original paper, which say "each capsule in the [6,6] grid is sharing their weights with each other". By my understanding, this means that the capsule outputs (vectors ui) within a [6,6] grid share the same Wij, so only 32 W would need to be updated by Adam. But in your implementation I can't find any code that handles this weight-sharing mechanism.

Besides, I think the shape of Wij should be [16,8], since ui is a [1,8] or [8,1] vector; the shape given in the paper obviously conflicts with Eq. 2. Although this looks like a minor point, I raise it so that I can be corrected if I have misunderstood the paper or your implementation.
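The shape question can be checked numerically. The sketch below is only an illustration in NumPy, not code from this repository: it assumes Eq. 2 has the form û_j|i = Wij ui with ui as a column vector, and it assumes (without confirmation from the paper) that an [8,16] shape would correspond to the row-vector convention. The names `u_i`, `W_ij`, `u_hat_col` and `u_hat_row` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

u_i = rng.standard_normal((8, 1))     # one PrimaryCaps output as a column vector
W_ij = rng.standard_normal((16, 8))   # shape needed for W_ij @ u_i to be 16-dimensional

# Column-vector convention: [16, 8] @ [8, 1] -> [16, 1]
u_hat_col = W_ij @ u_i

# Row-vector convention (assumption): the same weights stored as [8, 16]
# [1, 8] @ [8, 16] -> [1, 16]
u_hat_row = u_i.T @ W_ij.T

print(u_hat_col.shape, u_hat_row.shape)       # (16, 1) (1, 16)
print(np.allclose(u_hat_col.T, u_hat_row))    # True
```

Under these assumptions the two shapes describe the same linear map, so [16,8] and [8,16] differ only in whether ui is treated as a column or a row.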

rohit-mehra commented 6 years ago

@jianyin2016 I had the same doubt. The statement in the paper (below) keeps things simple, but it also gets in the way of properly understanding what exactly a capsule unit is:

1. Wij is a weight matrix between each u_i, where i ranges over (1, 32x6x6) in PrimaryCapsules, and each v_j, where j ranges over (1, 10).

2. Wij = [8x16] is also mentioned in the diagram.
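To make the two readings concrete, here is a rough NumPy sketch that compares a separate Wij for every (i, j) pair with a hypothetical variant in which Wij is shared across the 36 grid positions of each capsule type. It is not taken from any released implementation; the variable names and the type-major ordering of the 1152 capsules are assumptions made for the example.

```python
import numpy as np

num_types, grid, in_dim = 32, 6 * 6, 8   # PrimaryCaps: 32 capsule types on a 6x6 grid
num_out, out_dim = 10, 16                # DigitCaps: 10 capsules of 16 dimensions
num_in = num_types * grid                # 1152 input capsules

rng = np.random.default_rng(0)
u = rng.standard_normal((num_in, in_dim))

# Reading 1: one W_ij per (i, j) pair, i in 1..1152, j in 1..10.
W_full = rng.standard_normal((num_in, num_out, out_dim, in_dim))
u_hat_full = np.einsum("ijdk,ik->ijd", W_full, u)          # [1152, 10, 16]

# Reading 2 (hypothetical): W shared by the 36 positions of each capsule type.
W_shared = rng.standard_normal((num_types, num_out, out_dim, in_dim))
u_grid = u.reshape(num_types, grid, in_dim)                # assumes type-major ordering
u_hat_shared = np.einsum("tjdk,tpk->tpjd", W_shared, u_grid)
u_hat_shared = u_hat_shared.reshape(num_in, num_out, out_dim)

print(u_hat_full.shape, u_hat_shared.shape)   # (1152, 10, 16) (1152, 10, 16)
print(W_full.size)                            # 1474560
print(W_shared.size)                          # 40960
```

Both readings produce prediction vectors of the same shape; they differ only in how many transformation weights are learned.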

It would be great if someone could explain this part in the context of a capsule layer and capsule units.

jianyin2016 commented 6 years ago

@rrqq Hello, man.

I have studied a couple of implementations over the last few days and found that what I previously thought was wrong.

Actually, I now tend to believe the "sharing weights" mentioned above means that every capsule in the [6,6] grid shares the same set of filters to produce its [1,8] vector. Generally speaking, this is the usual meaning of weight sharing in DL, and it makes sense.
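This convolutional reading can be sketched as follows. The example is a naive NumPy loop, not the TensorFlow code from this repository; it assumes the layer sizes described in the paper (a 20x20x256 Conv1 output, 9x9 kernels, stride 2, 32 types of 8-dimensional capsules) and leaves out biases and the squashing nonlinearity. The capsule ordering produced by the final reshape is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Conv1 output and a single filter bank for the PrimaryCaps layer.
x = rng.standard_normal((20, 20, 256)).astype(np.float32)
kernel = rng.standard_normal((9, 9, 256, 32 * 8)).astype(np.float32)

k, stride = 9, 2
out_size = (x.shape[0] - k) // stride + 1        # (20 - 9) // 2 + 1 = 6
out = np.empty((out_size, out_size, 32 * 8), dtype=np.float32)

for r in range(out_size):
    for c in range(out_size):
        patch = x[r * stride:r * stride + k, c * stride:c * stride + k, :]
        # The same `kernel` is applied at every grid position (r, c):
        # this reuse is the weight sharing of an ordinary convolution.
        out[r, c] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))

# Split the 256 channels into 32 capsule types of 8 dimensions each.
u = out.reshape(out_size, out_size, 32, 8).reshape(-1, 8)

print(out.shape, u.shape)   # (6, 6, 256) (1152, 8)
print(kernel.size)          # 5308416
```

In this sketch the number of filter weights does not depend on the 6x6 grid size, which is what sharing weights across grid positions means for a convolution.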

This issue should be closed, although I still have doubts about all the released implementations, because their authors do not seem very sure of them themselves.