cuda convnet2 insisting on #filters to be divisible by 32

The following assert exists in cuda-convnet2: https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/img_acts.cu#L1208

This causes failure in some cases that cuda-convnet2 should support, for example:

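require 'cunn'
require 'ccn2'
model = nn.Sequential()
model:add(ccn2.SpatialConvolutionLocal(16, 16, 63, 9))
model:add(nn.ReLU())
model:cuda()
input = torch.rand(16, 63, 63, 128):cuda()  -- ccn2 layout: depth x height x width x batch
model:backward(input, model:forward(input))  -- forward output reused as a stand-in gradOutput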

will fail, because the number of filters is 16, which is not divisible by 32 and so does not pass the assert check. However, the documentation here says that this number only needs to be a multiple of 16.
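
To make the mismatch concrete, here is a minimal sketch in plain Lua. The helper isMultipleOf is hypothetical and not part of ccn2. The divisor 32 is what the assert appears to require and 16 is what the documentation states; neither value is copied from the source file.

```lua
-- Hypothetical helper, not part of ccn2:
-- returns true when numFilters is a multiple of divisor.
local function isMultipleOf(numFilters, divisor)
   return numFilters % divisor == 0
end

local numFilters = 16
print(isMultipleOf(numFilters, 32))  -- false: the rule the assert appears to enforce
print(isMultipleOf(numFilters, 16))  -- true: the rule stated in the documentation
```

So a layer with 16 filters satisfies the documented constraint but not the asserted one.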

Is this check really required?
In some cases, such as this one, the code path uses a filter cache size of 16: https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/img_acts.cu#L1345. Does this mean that removing the assert would make the code above work correctly?