Open yaxinshen opened 6 years ago
@yaxinshen +1. I tried to imagine how to share the weights that way. How about introducing a loop, `for _ in range(1152/36):`, around the `tf.matmul` involving `W` in the `routing` function? Or is there another idea that doesn't lose vectorization?
EDIT: Oh, I found that the commented-out `tf.scan` was for sharing the weights. The author preferred `tf.tile` to `tf.scan` for performance!
EDIT2: This issue seems to be a duplicate of a previous issue, "questions about the weight matrix Wij between ui and vj", which cleared things up for me.
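For anyone wondering whether weights can be shared across the 6 x 6 grid without a Python loop, here is a minimal NumPy sketch. It is not the repository's code. It assumes (my assumptions, not confirmed by the thread) that the 1152 capsules are flattened so that the capsule type varies fastest, and that `W` holds one 16 x 8 matrix per (capsule type, output capsule) pair. The names `predict_shared` and `predict_tiled` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

BATCH, GRID, TYPES = 2, 36, 32          # 36 = 6 x 6 grid positions, 32 capsule types
IN_DIM, OUT_CAPS, OUT_DIM = 8, 10, 16   # 8-D input capsules, 10 output capsules of 16-D

# Primary capsule outputs u_i, flattened to 1152 capsules (assumed: type varies fastest).
u = rng.standard_normal((BATCH, GRID * TYPES, IN_DIM))

# Hypothetical shared weights: one matrix per (type, output capsule), reused at every grid cell.
W_shared = rng.standard_normal((TYPES, OUT_CAPS, OUT_DIM, IN_DIM))


def predict_shared(u, W):
    """Compute prediction vectors with W broadcast over the grid axis (no loop, no tiling)."""
    u_grid = u.reshape(u.shape[0], GRID, TYPES, IN_DIM)
    u_hat = np.einsum("tjok,bgtk->bgtjo", W, u_grid)
    return u_hat.reshape(u.shape[0], GRID * TYPES, OUT_CAPS, OUT_DIM)


def predict_tiled(u, W):
    """Same result, but with W explicitly copied 36 times to a per-capsule tensor."""
    W_full = np.tile(W, (GRID, 1, 1, 1))          # shape (1152, 10, 16, 8)
    return np.einsum("ijok,bik->bijo", W_full, u)


u_hat_shared = predict_shared(u, W_shared)
u_hat_tiled = predict_tiled(u, W_shared)
```

Both functions return the same `(batch, 1152, 10, 16)` tensor. The shared version stores 32 x 10 x 16 x 8 weights instead of 1152 x 10 x 16 x 8 and stays fully vectorized, so a loop over `range(1152/36)` should not be necessary under these assumptions.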
@yaxinshen,
Version 1 (i.e., the computationally expensive approach) does have 8 distinct sets of weights, one for each 6 x 6 x 32 tensor. This is what the paper does.
Version 2 technically has 1 distinct set of weights for the entire 6 x 6 x 256 block and then reshapes the output to the correct shape.
I don't know whether this actually matters in practice: the network should eventually learn suitable weights, whether there are 8 distinct sets or 1.
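One way to examine the comparison is to look at a single receptive field, where a convolution reduces to a matrix product with the flattened input patch. The NumPy sketch below is not taken from the repository. It assumes Version 1 stacks the outputs of 8 separate 32-filter convolutions along the capsule dimension, and Version 2 applies one 256-filter convolution and reshapes to 32 capsules of 8 dimensions. The patch size is shrunk for speed and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH = 3 * 3 * 4          # flattened receptive field (assumed size, shrunk for illustration)
TYPES, CAP_DIM = 32, 8     # 32 capsule types, 8 dimensions per capsule

x = rng.standard_normal(PATCH)   # one input patch at one grid position

# "Version 1" (as assumed here): 8 separate filter banks with 32 filters each.
banks = [rng.standard_normal((PATCH, TYPES)) for _ in range(CAP_DIM)]
caps_v1 = np.stack([x @ K for K in banks], axis=-1)          # shape (32, 8)

# "Version 2" (as assumed here): a single bank of 256 filters, then a reshape.
# The columns are arranged so that filter t * 8 + d equals banks[d][:, t].
K_big = np.stack(banks, axis=-1).reshape(PATCH, TYPES * CAP_DIM)
caps_v2 = (x @ K_big).reshape(TYPES, CAP_DIM)                # shape (32, 8)
```

With this column arrangement the two versions produce identical capsules and hold the same number of weights, which would suggest the difference is mainly one of channel ordering and compute cost. Whether that carries over exactly to the repository's two code paths depends on how it orders channels before reshaping.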
In the paper, each capsule in the [6 × 6] grid shares its weights with the others. Does your code miss this point?