Hi! To answer your question:
1) Use sparse_scatter during training; use sparse_scatter_var during inference. The reason is that the TensorFlow C++ API seems to have a limitation on modifying tensors in place, so sparse_scatter has to create an unnecessary copy of the entire base tensor, which reduces speedup opportunities.
2) Another issue with the snippet above is that convert_mask_to_indices_custom is not the right function to use with sbnet_module.sparse_gather/sparse_scatter APIs. Those functions accept an index list in a format produced by sbnet_module.reduce_mask().
3) reduce_mask currently doesn't have a gradient implementation, so for now the mask has to be precomputed before the trainable portion, with a stop_gradient between the mask subnetwork and the trainable subnetwork. This is a limitation that needs to be addressed on our end. The sketch below shows how these pieces fit together.
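To make the intended usage concrete, here is a rough training-time sketch adapted from the repository's README example. Treat it as an illustration rather than a reference: the keyword arguments (dynamic_bsize/dynamic_boffset/dynamic_bstride, tol, avgpool), the output field names on the reduce_mask result, and the tile/crop arithmetic should all be double-checked against your build, and the constant mask is just a placeholder standing in for a real mask subnetwork.

```python
import tensorflow as tf

sbnet_module = tf.load_op_library('libsbnet.so')

def divup(a, b):
    return (a + b - 1) // b

# Toy sizes; block_size > block_stride leaves a 1-pixel halo per side so a
# 3x3 'VALID' convolution on each tile matches the dense result.
batch, hw, channels = 4, 64, 32
block_size, block_stride, block_offset = [16, 16], [14, 14], [0, 0]
block_count = [divup(hw, block_stride[0]), divup(hw, block_stride[1])]

in_block_params = {"dynamic_bsize": block_size,
                   "dynamic_boffset": block_offset,
                   "dynamic_bstride": block_stride}
out_block_params = {"dynamic_bsize": [block_size[0] - 2, block_size[1] - 2],
                    "dynamic_boffset": block_offset,
                    "dynamic_bstride": block_stride}

x = tf.placeholder(tf.float32, [batch, hw, hw, channels])
w = tf.get_variable("w", [3, 3, channels, channels])

# The mask would normally come from an upstream subnetwork at full resolution.
# Since reduce_mask has no gradient yet, cut the graph with stop_gradient here.
mask = tf.stop_gradient(tf.ones([batch, hw, hw, 1]))  # placeholder mask

# Fused pooling + indexing: turns the dense mask into a list of active blocks.
indices = sbnet_module.reduce_mask(mask, block_count, tol=0.5, avgpool=True,
                                   **in_block_params)

# Stack the active (overlapping) tiles along the batch dimension (NCHW).
block_stack = sbnet_module.sparse_gather(
    x, indices.bin_counts, indices.active_block_indices,
    transpose=True, **in_block_params)

# Ordinary dense convolution on the stack of tiles.
conv_blocks = tf.nn.conv2d(block_stack, w, strides=[1, 1, 1, 1],
                           padding='VALID', data_format='NCHW')

# Scatter the tiles back onto the base tensor. 'VALID' conv shrinks each tile
# by 1 pixel per side, hence the cropped base tensor and smaller output blocks.
# Use sparse_scatter during training (it copies the base tensor but has a
# gradient); sparse_scatter_var writes into a tf.Variable in place and is
# intended for inference only.
base = x[:, 1:hw - 1, 1:hw - 1, :]
y = sbnet_module.sparse_scatter(
    conv_blocks, indices.bin_counts, indices.active_block_indices,
    base, transpose=True, add=False, atomic=False, **out_block_params)
```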
Thank you for the nice explanation, I was finally able to train a classification model with sbnet_module.
When the input tensor is large, the sbnet_module ConvNet is faster than the dense ConvNet.
However, when the input tensor is small, as in MNIST classification, it was slower than the dense ConvNet. May I ask if this is the expected result, and why?
Below a certain tensor size the GPU utilization will be less than 100% (meaning there isn't enough total computation to fill the entire GPU with work), so there will be no speedup from reducing computation. Also, because the scatter and gather kernels are not currently fused with the convolutions, they introduce a fixed overhead, so there's a break-even point below which that overhead isn't offset by any saved computation. Without fusion this is expected; even with a fused kernel there would still be some overhead from reading the mask, so some fixed cost remains. This is also the reason why this approach does better for ResNet blocks than for single convolutions, as we show in the benchmarks in the paper.
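As a back-of-the-envelope illustration of that break-even point (the numbers below are made up, not measurements), a simple fixed-overhead model already shows why a tiny MNIST-sized input ends up slower:

```python
# Toy model: sparse_time ~= fixed overhead + active_fraction * dense_time.
dense_time_ms = 0.8       # hypothetical cost of the dense convolution
overhead_ms = 0.15        # hypothetical fixed reduce_mask + gather/scatter cost
active_fraction = 0.3     # fraction of blocks that are active

sparse_time_ms = overhead_ms + active_fraction * dense_time_ms
print(dense_time_ms / sparse_time_ms)   # ~2.05x speedup for a large input

# For a tiny input the dense convolution itself is already cheap, so the
# fixed overhead dominates and the "speedup" drops below 1x:
dense_time_ms = 0.05
sparse_time_ms = overhead_ms + active_fraction * dense_time_ms
print(dense_time_ms / sparse_time_ms)   # ~0.3x, i.e. slower than dense
```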
I tried to train a model on the MNIST dataset using sbnet_module, but I get:
LookupError: No gradient defined for operation 'conv2/SparseScatterVar' (op type: SparseScatterVar)
How can I get gradients through sbnet_module? I don't know how to use @ops.RegisterGradient("SparseGather") and @ops.RegisterGradient("SparseScatter"). Below is the sbnet_module conv2d function I use for training.