www0wwwjs1 / Matrix-Capsules-EM-Tensorflow

A Tensorflow implementation of CapsNet based on paper Matrix Capsules with EM Routing
Apache License 2.0
218 stars 83 forks

Formation of Pose matrix and then votes #25

Open maomran opened 6 years ago

maomran commented 6 years ago

Hi, I would like to know: (1) What is the intuition behind the pose matrix, and how is it formed for each capsule from the ReLU output feature maps? (2) I am trying to evaluate the expensive operations in CapsNet. Is the reshaping after each output necessary for the next stage? Thanks,
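For context on question (1): in the Matrix Capsules with EM Routing paper, each primary capsule's 4x4 pose is a learned linear transformation of the ReLU outputs at that location, and its activation is a sigmoid of a weighted sum of the same ReLUs. The following is a minimal NumPy sketch of that idea, not this repo's code. The function name `primary_capsules`, the weight names, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def primary_capsules(relu_maps, w_pose, w_act):
    """Hypothetical helper: form poses and activations from ReLU feature maps.

    relu_maps: (batch, H, W, A)  ReLU output feature maps
    w_pose:    (A, B * 16)       1x1-conv weights for B capsule types
    w_act:     (A, B)            1x1-conv weights for the activations
    """
    batch, h, w, _ = relu_maps.shape
    n_caps = w_act.shape[1]
    # A 1x1 convolution is a matmul over the channel axis at each position.
    pose = relu_maps @ w_pose                      # (batch, H, W, B*16)
    # Each group of 16 numbers is viewed as one 4x4 pose matrix.
    pose = pose.reshape(batch, h, w, n_caps, 4, 4)
    # Activations: sigmoid of a weighted sum of the same ReLU outputs.
    act = 1.0 / (1.0 + np.exp(-(relu_maps @ w_act)))
    return pose, act

rng = np.random.default_rng(0)
relu_maps = np.maximum(rng.standard_normal((2, 12, 12, 32)), 0.0)  # illustrative sizes
w_pose = rng.standard_normal((32, 32 * 16)) * 0.1
w_act = rng.standard_normal((32, 32)) * 0.1
pose, act = primary_capsules(relu_maps, w_pose, w_act)
print(pose.shape, act.shape)
```

On question (2), the reshape here only reinterprets the 16 output channels per capsule as a 4x4 matrix so that later matrix multiplications can be applied to it. In the profile in this thread, Reshape accounts for 166us (0.12% of execution time), so it does not look like the expensive part.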

maomran commented 6 years ago

Profile:
node name | requested bytes | total execution time | accelerator execution time | cpu execution time
Tile | 10.19MB (100.00%, 25.86%) | 50.61ms (100.00%, 36.02%) | 0us (0.00%, 0.00%) | 50.61ms (100.00%, 36.02%)
Mul | 15.69MB (74.14%, 39.82%) | 28.38ms (63.98%, 20.20%) | 0us (0.00%, 0.00%) | 28.38ms (63.98%, 20.20%)
Conv2D | 131.71KB (34.32%, 0.33%) | 16.82ms (43.78%, 11.98%) | 0us (0.00%, 0.00%) | 16.82ms (43.78%, 11.98%)
Sub | 5.11MB (33.99%, 12.95%) | 15.30ms (31.81%, 10.89%) | 0us (0.00%, 0.00%) | 15.30ms (31.81%, 10.89%)
RealDiv | 0B (0.00%, 0.00%) | 9.09ms (20.92%, 6.47%) | 0us (0.00%, 0.00%) | 9.09ms (20.92%, 6.47%)
Transpose | 332.93KB (21.03%, 0.84%) | 5.78ms (14.45%, 4.11%) | 0us (0.00%, 0.00%) | 5.78ms (14.45%, 4.11%)
BatchMatMul | 5.10MB (20.19%, 12.93%) | 5.58ms (10.34%, 3.97%) | 0us (0.00%, 0.00%) | 5.58ms (10.34%, 3.97%)
Sum | 627.26KB (7.26%, 1.59%) | 2.29ms (6.36%, 1.63%) | 0us (0.00%, 0.00%) | 2.29ms (6.36%, 1.63%)
DepthwiseConv2dNative | 597.31KB (5.67%, 1.52%) | 1.67ms (4.74%, 1.19%) | 0us (0.00%, 0.00%) | 1.67ms (4.74%, 1.19%)
StridedSlice | 332.93KB (4.15%, 0.84%) | 950us (3.55%, 0.68%) | 0us (0.00%, 0.00%) | 950us (3.55%, 0.68%)
QueueDequeueManyV2 | 4.10KB (3.31%, 0.01%) | 887us (2.87%, 0.63%) | 0us (0.00%, 0.00%) | 887us (2.87%, 0.63%)
Square | 0B (0.00%, 0.00%) | 518us (2.24%, 0.37%) | 0us (0.00%, 0.00%) | 518us (2.24%, 0.37%)
Add | 6.98KB (3.29%, 0.02%) | 326us (1.87%, 0.23%) | 0us (0.00%, 0.00%) | 326us (1.87%, 0.23%)
Max | 20.61KB (3.28%, 0.05%) | 261us (1.64%, 0.19%) | 0us (0.00%, 0.00%) | 261us (1.64%, 0.19%)
ConcatV2 | 237.97KB (3.22%, 0.60%) | 250us (1.45%, 0.18%) | 0us (0.00%, 0.00%) | 250us (1.45%, 0.18%)
VariableV2 | 250.15KB (2.62%, 0.63%) | 244us (1.27%, 0.17%) | 0us (0.00%, 0.00%) | 244us (1.27%, 0.17%)
Softmax | 1.34KB (1.99%, 0.00%) | 237us (1.10%, 0.17%) | 0us (0.00%, 0.00%) | 237us (1.10%, 0.17%)
Exp | 0B (0.00%, 0.00%) | 204us (0.93%, 0.15%) | 0us (0.00%, 0.00%) | 204us (0.93%, 0.15%)
Reshape | 0B (0.00%, 0.00%) | 166us (0.79%, 0.12%) | 0us (0.00%, 0.00%) | 166us (0.79%, 0.12%)
FusedBatchNorm | 4.11KB (1.98%, 0.01%) | 163us (0.67%, 0.12%) | 0us (0.00%, 0.00%) | 163us (0.67%, 0.12%)
Const | 461.60KB (1.97%, 1.17%) | 155us (0.55%, 0.11%) | 0us (0.00%, 0.00%) | 155us (0.55%, 0.11%)
BiasAdd | 0B (0.00%, 0.00%) | 120us (0.44%, 0.09%) | 0us (0.00%, 0.00%) | 120us (0.44%, 0.09%)
Log | 4B (0.80%, 0.00%) | 117us (0.36%, 0.08%) | 0us (0.00%, 0.00%) | 117us (0.36%, 0.08%)
Sqrt | 53.25KB (0.80%, 0.14%) | 90us (0.27%, 0.06%) | 0us (0.00%, 0.00%) | 90us (0.27%, 0.06%)
ArgMax | 8B (0.67%, 0.00%) | 53us (0.21%, 0.04%) | 0us (0.00%, 0.00%) | 53us (0.21%, 0.04%)
Identity | 0B (0.00%, 0.00%) | 50us (0.17%, 0.04%) | 0us (0.00%, 0.00%) | 50us (0.17%, 0.04%)
RandomShuffleQueueV2 | 262.44KB (0.67%, 0.67%) | 43us (0.14%, 0.03%) | 0us (0.00%, 0.00%) | 43us (0.14%, 0.03%)
Neg | 0B (0.00%, 0.00%) | 33us (0.10%, 0.02%) | 0us (0.00%, 0.00%) | 33us (0.10%, 0.02%)
Sigmoid | 0B (0.00%, 0.00%) | 24us (0.08%, 0.02%) | 0us (0.00%, 0.00%) | 24us (0.08%, 0.02%)
ScalarSummary | 8B (0.00%, 0.00%) | 16us (0.06%, 0.01%) | 0us (0.00%, 0.00%) | 16us (0.06%, 0.01%)
Cast | 8B (0.00%, 0.00%) | 15us (0.05%, 0.01%) | 0us (0.00%, 0.00%) | 15us (0.05%, 0.01%)
MergeSummary | 8B (0.00%, 0.00%) | 15us (0.04%, 0.01%) | 0us (0.00%, 0.00%) | 15us (0.04%, 0.01%)
Equal | 1B (0.00%, 0.00%) | 15us (0.03%, 0.01%) | 0us (0.00%, 0.00%) | 15us (0.03%, 0.01%)
AvgPool | 20B (0.00%, 0.00%) | 14us (0.02%, 0.01%) | 0us (0.00%, 0.00%) | 14us (0.02%, 0.01%)
Relu | 0B (0.00%, 0.00%) | 10us (0.01%, 0.01%) | 0us (0.00%, 0.00%) | 10us (0.01%, 0.01%)
_retval_truediv_0_1 | 0B (0.00%, 0.00%) | 3us (0.00%, 0.00%) | 0us (0.00%, 0.00%) | 3us (0.00%, 0.00%)
ConstantFolding/truediv_recip | 4B (0.00%, 0.00%) | 2us (0.00%, 0.00%) | 0us (0.00%, 0.00%) | 2us (0.00%, 0.00

My profiling shows that Tile takes the most execution time (50.61ms, 36.02% of the total). Since the tiling is applied to the transformation matrix, is it really necessary?
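To make the question concrete, here is a minimal NumPy sketch (not the repo's TensorFlow code) comparing two ways of computing votes as pose times transformation matrix: one that materializes tiled copies first, and one that relies on broadcasting inside the batched matmul. The function names, shapes, and the 4x4 pose size are assumptions for illustration. Whether the same rewrite applies in this repo depends on whether the TensorFlow version in use broadcasts batch dimensions in its matmul, which is also an assumption here.

```python
import numpy as np

def votes_with_tile(poses, weights):
    """poses: (batch, n_in, 4, 4); weights: (n_in, n_out, 4, 4)."""
    batch = poses.shape[0]
    n_out = weights.shape[1]
    # Materialize a full copy of each pose for every output capsule ...
    p = np.tile(poses[:, :, None], (1, 1, n_out, 1, 1))
    # ... and a full copy of the transformation matrices for every example.
    w = np.tile(weights[None], (batch, 1, 1, 1, 1))
    return np.matmul(p, w)          # (batch, n_in, n_out, 4, 4)

def votes_with_broadcast(poses, weights):
    """Same result, but the size-1 axes are broadcast instead of copied."""
    return np.matmul(poses[:, :, None], weights[None])

rng = np.random.default_rng(0)
poses = rng.standard_normal((8, 16, 4, 4))      # illustrative sizes
weights = rng.standard_normal((16, 10, 4, 4))
v_tile = votes_with_tile(poses, weights)
v_bcast = votes_with_broadcast(poses, weights)
print(v_tile.shape, np.allclose(v_tile, v_bcast))
```

Both functions give the same votes. The difference is that the tiled version allocates and fills the replicated tensors before multiplying, which would be consistent with Tile showing up with large requested bytes and execution time in the profile above.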