tsinghua-rll / VoxelNet-tensorflow

A 3D object detection system for autonomous driving.
MIT License

Runtime issue #3

Closed ZiningWang closed 6 years ago

ZiningWang commented 6 years ago

I believe there is one difference between this implementation and the original paper: the VFE is not done by extracting the non-zero points. You should run the FCN on the whole [K, T, 7] tensor at once. If you use map_fn, the network will very likely run slowly.
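The point about map_fn can be sketched in plain numpy: a fully connected layer applied to the whole [K, T, 7] tensor (by flattening the voxel and point axes into one batch axis) is mathematically identical to looping over voxels, but runs as a single large matmul instead of K small ones. The sizes and the toy weight matrix below are illustrative, not taken from the repository.

```python
import numpy as np

K, T, C_in, C_out = 4, 5, 7, 16  # illustrative sizes, not the repo's config
rng = np.random.default_rng(0)
points = rng.standard_normal((K, T, C_in))   # all voxels stacked in one tensor
W = rng.standard_normal((C_in, C_out))       # toy FCN weights

# Batched version: one matmul over every point of every voxel.
batched = (points.reshape(-1, C_in) @ W).reshape(K, T, C_out)

# Loop version (what map_fn effectively does): one matmul per voxel.
looped = np.stack([points[k] @ W for k in range(K)])

assert np.allclose(batched, looped)
```

On a GPU the batched form amortizes kernel-launch overhead across all voxels, which is where the speedup reported later in this thread comes from.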

ring00 commented 6 years ago

Due to time constraints, we didn't implement the method described in section 2.3 of the original paper. But we do something similar during data preprocessing, so only non-empty voxels are fed into the network.

As for map_fn: since the number of non-empty voxels varies from one point cloud to another, we decided to simply loop over all non-empty voxels. Of course there are faster ways, but we were running out of time ~for a course project~ when writing the code.

ZiningWang commented 6 years ago

I was not asking about your first point but about the second. The runtime of your VFE layer is about 5 s on my TITAN X. If you'd like to update your code, I think this is the part to work on. Something like the snippet below reduces the VFE time to 19 ms.

```python
class VFELayer(object):

    def __init__(self, out_channels, name):
        super(VFELayer, self).__init__()
        self.units = out_channels // 2  # integer division: half the channels are point-wise
        self.out_channels = out_channels
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE) as scope:
            self.dense = tf.layers.Dense(
                self.units, tf.nn.relu, name='dense',
                _reuse=tf.AUTO_REUSE, _scope=scope)
            self.batch_norm = tf.layers.BatchNormalization(
                name='batch_norm', fused=True,
                _reuse=tf.AUTO_REUSE, _scope=scope)

    def apply(self, inputs, training):
        # point-wise features: one FCN call over every point of every voxel
        pointwise = tf.reshape(
            self.batch_norm.apply(self.dense.apply(inputs), training),
            [-1, cfg.VOXEL_POINT_COUNT, self.units])

        # element-wise max over the points of each voxel
        aggregated = tf.reduce_max(pointwise, axis=1, keep_dims=True)

        # broadcast the voxel-wise feature back to every point
        repeated = tf.tile(aggregated, [1, cfg.VOXEL_POINT_COUNT, 1])

        # concatenate point-wise and voxel-wise features
        concatenated = tf.reshape(
            tf.concat([pointwise, repeated], axis=2),
            [-1, self.out_channels])

        return concatenated
```
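Separately from the TF plumbing, the shape bookkeeping in `apply` is easy to check in numpy: max-pool over the point axis, tile the result back, and concatenate along channels. A sketch with illustrative sizes (not the repo's config):

```python
import numpy as np

K, T, U = 2, 3, 4  # voxels, points per voxel, units; illustrative sizes
pointwise = np.random.default_rng(2).standard_normal((K, T, U))

aggregated = pointwise.max(axis=1, keepdims=True)   # [K, 1, U] voxel feature
repeated = np.tile(aggregated, (1, T, 1))           # [K, T, U] broadcast back
concatenated = np.concatenate([pointwise, repeated], axis=2)  # [K, T, 2U]

assert concatenated.shape == (K, T, 2 * U)
# every point carries the voxel-wise max in its second half
assert np.array_equal(concatenated[:, 0, U:], pointwise.max(axis=1))
```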
ring00 commented 6 years ago

Correct me if I am wrong. The input of VFELayer is now a [K, T, 7] tensor, and we can call apply once on all voxels without using map_fn on individual voxels.

@jeasinema I think we should change the VFELayer as suggested by @ZiningWang. Also, the paper says that,

> Note that, after concatenation operations in VFE, we reset the features corresponding to empty points to zero such that they do not affect the computed voxel features.

Any idea on how to do it in an elegant way?

Maybe we can do it with a boolean mask, I mean something like

```python
mask = tf.not_equal(inputs, tf.constant(0.0))
mask = tf.tile(mask, [1, 1, 2])
concatenated = tf.multiply(concatenated, tf.cast(mask, tf.float32))
```

However, this is not completely correct, because the feature vector of a non-empty point could contain 0 as well.
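One common workaround (I'm not certain it matches this repo's data layout, so treat it as a sketch) is to derive the mask from the raw [K, T, 7] input rather than from the computed features: a padded point is an all-zero 7-dim row, so reducing over the last axis of the *input* distinguishes empty points even when a real point's feature happens to be 0. In numpy, with illustrative shapes:

```python
import numpy as np

K, T, F = 3, 4, 8
rng = np.random.default_rng(1)
raw = rng.standard_normal((K, T, 7))
raw[0, 2:] = 0.0          # padded (empty) points are all-zero rows
raw[1, 3:] = 0.0

features = rng.standard_normal((K, T, F))  # stand-in for the concatenated output

# A point is empty iff its entire 7-dim raw vector is zero, so reduce over
# the last axis of the input instead of testing the computed features.
mask = np.any(raw != 0.0, axis=2, keepdims=True)   # [K, T, 1]
masked = features * mask                            # broadcast over channels

assert np.all(masked[0, 2:] == 0.0)
```

In TF 1.x the same reduction should be expressible with `tf.reduce_max(tf.abs(inputs), axis=2, keep_dims=True)` followed by `tf.not_equal` and `tf.cast`, though I haven't verified it against this codebase.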

Another option is ignoring this part and praying to our TITANs. And hopefully, the network will learn to ignore empty points itself.

ZiningWang commented 6 years ago

One suggestion is to use `tf.scatter_nd`.
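For readers following along: `tf.scatter_nd(indices, updates, shape)` writes per-voxel feature vectors into a zero-initialized dense grid, which both places the non-empty voxels and leaves empty ones at zero in one op. The numpy equivalent below uses fancy indexing; the grid sizes and coordinates are made up for illustration.

```python
import numpy as np

# Dense grid of D x H x W voxels with C feature channels; sizes are illustrative.
D, H, W, C = 2, 3, 4, 5
K = 3  # number of non-empty voxels
voxel_features = np.arange(K * C, dtype=np.float64).reshape(K, C)
coords = np.array([[0, 1, 2],
                   [1, 0, 0],
                   [1, 2, 3]])  # (d, h, w) index of each non-empty voxel

# numpy equivalent of tf.scatter_nd(coords, voxel_features, [D, H, W, C]):
dense = np.zeros((D, H, W, C))
dense[coords[:, 0], coords[:, 1], coords[:, 2]] = voxel_features

assert np.array_equal(dense[0, 1, 2], voxel_features[0])
```

Because the destination starts as zeros, everything outside `coords` stays zero, which is exactly the "reset empty voxels to zero" behavior the paper asks for.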