Closed by akanimax 6 years ago
I think the current implementation is equivalent to a matmul. Don't you agree? The main issue here is that CNTK does not currently support a multidimensional matmul, which would be more computationally efficient.
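To make the equivalence concrete, here is a minimal NumPy sketch (the shapes and names are purely illustrative, not the ones used in CapsLayer.py) showing that a broadcasted elementwise product followed by a reduce-sum over the input dimension gives the same result as the matrix-vector product a matmul would compute:

```python
import numpy as np

dim_in, dim_out = 8, 16                  # hypothetical capsule dimensions
W = np.random.randn(dim_out, dim_in)     # one transformation matrix W_ij
u = np.random.randn(dim_in)              # one lower-level capsule u_i

u_hat_matmul = W @ u                                   # explicit linear projection
u_hat_reduce = np.sum(W * u[np.newaxis, :], axis=1)    # Hadamard product + reduce-sum

assert np.allclose(u_hat_matmul, u_hat_reduce)         # both give the same u_hat
```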
I originally tried to use the numpy matmul via a CNTK custom operator, but it was significantly slower on GPU than the current implementation. Although it is faster on CPU, I opted for the GPU path, mainly because of the training time.
If you have any suggestions for improvement, don't hesitate to comment.
In the module CapsLayer.py, inside the function
DigitCaps(input, num_capsules, dim_out_vector, routings=3, name='DigitCaps'):
for computing the predictions 'u_hat_ij' for the subsequent layer from the previous layer's capsules 'u', the implementation does an elementwise multiplication (Hadamard product) and then performs a reduce. Aren't we supposed to project every 'u' into 'u_hat_ij' with a linear matrix projection?
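For clarity, this is what I mean by the linear projection, written as a small NumPy/einsum sketch; the shapes follow the original CapsNet paper and the variable names are just illustrative, not the ones in CapsLayer.py:

```python
import numpy as np

num_in, num_out, dim_in, dim_out = 1152, 10, 8, 16       # CapsNet paper sizes
W = np.random.randn(num_in, num_out, dim_out, dim_in)    # one W_ij per capsule pair (i, j)
u = np.random.randn(num_in, dim_in)                       # lower-level capsules u_i

# u_hat[i, j] = W[i, j] @ u[i] -- a linear projection of every u_i for every output capsule j
u_hat = np.einsum('ijdk,ik->ijd', W, u)
print(u_hat.shape)   # (1152, 10, 16)
```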