Closed · ma7555 · 4 years ago
Issue explained:
The file network.py hardcodes batch_size into the correct_batch_indices function:
https://github.com/tyagi-iiitv/PointPillars/blob/cc0c4be0ca0bdd481c809673305a69ef116b02c4/network.py#L27
This results in wrong dimensionality during distributed training, because the global batch_size is divided by the number of GPUs (replicas) during .fit().
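The batch splitting is easy to observe directly. This minimal, self-contained sketch (standard tf.distribute API only, nothing repo-specific) prints the per-replica batch size that correct_batch_indices would actually see:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    print("replicas in sync:", strategy.num_replicas_in_sync)

    global_batch = 4
    ds = tf.data.Dataset.range(16).batch(global_batch)
    dist_ds = strategy.experimental_distribute_dataset(ds)

    @tf.function
    def step(x):
        # With N replicas, each call sees global_batch / N examples, so any
        # function that hardcodes global_batch mis-sizes its tensors here.
        tf.print("per-replica batch size:", tf.shape(x)[0])

    for batch in dist_ds:
        strategy.run(step, args=(batch,))

On a 2-GPU machine this prints 2, not 4, which is exactly why the hardcoded constant breaks.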
I have been thinking for a while about how to change this function, but nothing has worked. This is what I tried:
    def correct_batch_indices(tensor):
        seq = tf.range(tf.shape(tensor)[0])
        array = tf.Variable(lambda: tf.zeros_like(tensor))
        array = array[seq, :, 0].assign(seq)
        return tf.math.add(tensor, array)
Creating a tf.Variable from a lambda initializer inside this function is a bad idea; if you can suggest something better, let me know.
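For reference, here is a variable-free sketch of the same idea. It assumes, as in the current function, that the input has shape (batch, pillars, 3) and that the batch index belongs in channel 0; I have not tested it against the repo, so treat it as an illustration:

    import tensorflow as tf

    def correct_batch_indices(tensor):
        # Read the batch size dynamically so the function also works when
        # MirroredStrategy hands each replica a fraction of the global batch.
        batch_size = tf.shape(tensor)[0]
        num_pillars = tf.shape(tensor)[1]
        # batch_ids[i, :] == i for every pillar, shape (batch, pillars)
        batch_ids = tf.tile(tf.range(batch_size, dtype=tensor.dtype)[:, tf.newaxis],
                            [1, num_pillars])
        zeros = tf.zeros_like(batch_ids)
        # Only channel 0 receives the batch index; channels 1 and 2 stay zero.
        offset = tf.stack([batch_ids, zeros, zeros], axis=-1)
        return tensor + offset

Because everything is derived from tf.shape(tensor) at graph-execution time, no tf.Variable is needed and the function traces cleanly inside a tf.function or a distribution strategy.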
Fixed for network.py; will need to look at the generator tomorrow too.
Using MirroredStrategy for distributed training currently results in an error because of this hardcoded batch_size.
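For completeness, here is a quick way to check a dynamic version under the strategy, using toy shapes (8 samples, 5 pillars, 3 channels are arbitrary) and the correct_batch_indices sketch from above:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    ds = tf.data.Dataset.from_tensor_slices(tf.zeros((8, 5, 3), tf.int32)).batch(4)
    dist_ds = strategy.experimental_distribute_dataset(ds)

    @tf.function
    def step(batch):
        # The dynamic version adapts to whatever batch size this replica got.
        fixed = correct_batch_indices(batch)
        tf.print("per-replica output shape:", tf.shape(fixed))

    for batch in dist_ds:
        strategy.run(step, args=(batch,))

If this prints consistent shapes on every replica, the generator is the remaining place where the global batch size could still leak in.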