uber-research / sbnet

Sparse Blocks Networks

How can I train the model with sbnet_module? #4

Closed · JunhyeonPark closed this issue 6 years ago

JunhyeonPark commented 6 years ago

I am trying to train a model on the MNIST dataset using sbnet_module, but I get:

LookupError: No gradient defined for operation 'conv2/SparseScatterVar' (op type: SparseScatterVar)

How can I get gradients through sbnet_module? I don't know how to use @ops.RegisterGradient("SparseGather") and @ops.RegisterGradient("SparseScatter"). Below is the sbnet_module-based conv2d function I use for training.

import tensorflow as tf

# sbnet_module is loaded elsewhere in my script via tf.load_op_library on the
# compiled sbnet ops library
from sparse_conv_lib import calc_block_params, convert_mask_to_indices_custom

def sparse_conv2d(x, W, hw):
    # batch, bsize_, ksize_, strides and generate_top_left_mask are defined
    # elsewhere in my script
    xsize_ = [batch, hw, hw, 1]

    # build a block-sparsity mask and the matching block parameters
    mask = generate_top_left_mask(xsize_, 0.90)
    block_params = calc_block_params(xsize_,
                                     bsize_,
                                     ksize_,
                                     strides,
                                     padding='VALID')
    ind = convert_mask_to_indices_custom(mask, block_params, 0.0, True)

    x_ = tf.Variable(x)

    # gather the active blocks into a stack of tiles (NCHW, transpose=True)
    p = sbnet_module.sparse_gather(
        x_,
        ind.bin_counts,
        ind.active_block_indices,
        bsize=block_params.bsize,
        boffset=block_params.boffset,
        bstride=block_params.bstrides,
        transpose=True)

    # dense convolution over the gathered tiles
    q = tf.nn.conv2d(p, W, strides, 'VALID', data_format='NCHW', use_cudnn_on_gpu=True)

    # scatter the convolved tiles back onto the base tensor
    y = sbnet_module.sparse_scatter_var(
        q,
        ind.bin_counts,
        ind.active_block_indices,
        x_,
        bsize=block_params.bsize_out,
        boffset=[0, 0],
        bstride=block_params.bstrides,
        add=False,
        transpose=True,
        atomic=False)
    return y
andrei-pokrovsky commented 6 years ago

Hi! To answer your question:

1) Use sparse_scatter during training and sparse_scatter_var during inference. sparse_scatter_var is the op your snippet calls and it has no gradient registered, which is exactly what the LookupError is telling you; sparse_scatter does have one. The reason both variants exist is that the TensorFlow C++ API seems to have a limitation on modifying ordinary tensors in place, so sparse_scatter has to make an unnecessary copy of the entire base tensor, reducing speedup opportunities; sparse_scatter_var writes into a Variable in place and is therefore preferred at inference time.

2) Another issue with the snippet above is that convert_mask_to_indices_custom is not the right function to use with the sbnet_module.sparse_gather/sparse_scatter APIs. Those ops expect an index list in the format produced by sbnet_module.reduce_mask().

3) reduce_mask currently doesn't have a gradient implementation, so for now the mask has to be precomputed before the training portion, with a tf.stop_gradient between the mask subnetwork and the trainable subnetwork. This is a limitation that still needs to be addressed on our end; see the sketch below.
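
This is a rough sketch of how the three points fit together in a training-time version of the snippet above. It is not code from the repo: the reduce_mask call (the positional block count and the tol keyword) is written by analogy with the sparse_gather/sparse_scatter arguments in the snippet, and block_params.bcount is an assumed field, so double-check the exact signatures against sparse_conv_lib in your checkout.

import tensorflow as tf
# sbnet_module and calc_block_params come from the same place as in the snippet above

def sparse_conv2d_train(x, W, mask, block_params, strides):
    # Point 3: reduce_mask has no gradient, so cut the gradient path at the
    # mask and treat the resulting indices as precomputed.
    mask = tf.stop_gradient(mask)

    # Point 2: build the index list in the format expected by
    # sparse_gather / sparse_scatter. Argument names here are assumptions.
    ind = sbnet_module.reduce_mask(
        mask,
        block_params.bcount,          # assumed field: number of blocks per dim
        bsize=block_params.bsize,
        boffset=block_params.boffset,
        bstride=block_params.bstrides,
        tol=0.0)

    # Gather the active tiles into a stack (NCHW because transpose=True).
    p = sbnet_module.sparse_gather(
        x,
        ind.bin_counts,
        ind.active_block_indices,
        bsize=block_params.bsize,
        boffset=block_params.boffset,
        bstride=block_params.bstrides,
        transpose=True)

    # Dense convolution on the gathered tiles.
    q = tf.nn.conv2d(p, W, strides, 'VALID', data_format='NCHW')

    # Point 1: use sparse_scatter (which has a gradient), not sparse_scatter_var.
    # It writes the tiles onto a copy of the base tensor; here the base is a
    # zero tensor, assuming the conv output has the same shape as x (as in the
    # 1-in/1-out channel MNIST snippet above).
    ybase = tf.zeros_like(x)
    y = sbnet_module.sparse_scatter(
        q,
        ind.bin_counts,
        ind.active_block_indices,
        ybase,
        bsize=block_params.bsize_out,
        boffset=[0, 0],
        bstride=block_params.bstrides,
        add=False,
        transpose=True,
        atomic=False)
    return y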

JunhyeonPark commented 6 years ago

Thank you for the nice explanation; I was finally able to train a classification model with sbnet_module.

When the input tensor is large, the sbnet_module ConvNet is faster than the dense ConvNet.

However, when the input tensor is small, as in MNIST classification, it was slower than the dense ConvNet. May I ask whether this is the expected result, and why?

andrei-pokrovsky commented 6 years ago

Below a certain tensor size the GPU utilization will be less than 100% (there isn't enough total computation to fill the entire GPU with work), so reducing computation yields no speedup. Also, because the scatter and gather kernels are not currently fused with the convolutions, they introduce a fixed overhead, so there is a break-even point below which that overhead isn't paid for by the saved computation. Without fusion this is generally expected, and even with fused kernels some fixed overhead remains from reading the mask. This is also why the approach does better for ResNet blocks than for single convolutions, as the benchmarks in the paper show.
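
If it helps, a quick way to find the break-even point on a particular GPU is to time the dense and sparse outputs side by side. This is just a hand-rolled timing sketch, not the benchmark code from the repo; dense_y and sparse_y are assumed to be the outputs of a plain tf.nn.conv2d and of your sparse path, built on the same input.

import time
import tensorflow as tf

def time_op(sess, op, n_warmup=10, n_iter=50):
    # Warm up first so one-time costs (allocation, autotuning) are excluded,
    # then report the average wall-clock time per run.
    for _ in range(n_warmup):
        sess.run(op)
    start = time.time()
    for _ in range(n_iter):
        sess.run(op)
    return (time.time() - start) / n_iter

# dense_y / sparse_y: outputs of the dense and sparse conv graphs (assumed built above)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('dense :', time_op(sess, dense_y))
    print('sparse:', time_op(sess, sparse_y))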