mit-han-lab / spvnas

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
http://spvnas.mit.edu/
MIT License

Reproducing the results #40

Closed nuneslu closed 3 years ago

nuneslu commented 3 years ago

Hi, I have been working with your models (without NAS) in my own framework and I have been struggling to reproduce your results with MinkUNet. I would just like to confirm some hyperparameters to see if I'm missing something.

So for the best test so far:

My main question is: the paper says that after the first 15 epochs with lr = 0.24, a second training of another 15 epochs with lr = 0.096 is done. Is this second training only for the NAS version, or is a second 15-epoch training also done for the bare models?

I'm asking because after 20 epochs of training in my framework I got only 38% mIoU on the validation set (seq 08) and ~50% mIoU on the training set.
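For reference, this is roughly how I set up the two-stage schedule on my side (a minimal sketch with dummy stand-ins so it runs on its own; in my framework the model is MinkUNet and the loader is the SemanticKITTI train split, and the momentum value is my own default, not taken from your repo):

```python
import torch
import torch.nn as nn

# Dummy stand-ins so the schedule itself is runnable; replace with the
# real model and data loader.
model = nn.Linear(4, 19)
train_loader = [(torch.randn(8, 4), torch.randint(0, 19, (8,))) for _ in range(10)]
criterion = nn.CrossEntropyLoss()

# Two-stage schedule as I understood it from the paper:
# 15 epochs at lr = 0.24, then another 15 epochs at lr = 0.096.
stages = [(15, 0.24), (15, 0.096)]

optimizer = torch.optim.SGD(model.parameters(), lr=stages[0][1],
                            momentum=0.9)  # momentum is my own default

for num_epochs, lr in stages:
    for group in optimizer.param_groups:
        group['lr'] = lr
    for _ in range(num_epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
```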

nuneslu commented 3 years ago

One more question that I forgot. In the spvcnn model, the voxel_to_point method is called before the classifier module, which means the classifier runs over the points, not the voxels. However, in the minkunet model the voxel_to_point method is not called, so I assume that for minkunet the classifier runs over the voxels and, therefore, the loss is computed over the voxels and not over the points. Is that correct?
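To make the question concrete, this is how I read the two forward passes (a simplified toy paraphrase, not your actual code: the tensors are random, and a plain gather stands in for the real voxel_to_point interpolation):

```python
import torch
import torch.nn as nn

# Toy shapes: N points fall into M voxels; point_to_voxel_idx[i] is the
# voxel each point belongs to (placeholders for the real sparse tensors).
N, M, C, num_classes = 1000, 300, 64, 19
voxel_feats = torch.randn(M, C)
point_to_voxel_idx = torch.randint(0, M, (N,))
point_labels = torch.randint(0, num_classes, (N,))
voxel_labels = torch.randint(0, num_classes, (M,))

classifier = nn.Linear(C, num_classes)
criterion = nn.CrossEntropyLoss()

# spvcnn, as I read it: voxel features are mapped back to the points
# (voxel_to_point) before the classifier, so the loss is per point.
point_feats = voxel_feats[point_to_voxel_idx]
loss_spvcnn = criterion(classifier(point_feats), point_labels)

# minkunet, as I read it: no voxel_to_point before the classifier,
# so the classifier and the loss run directly on the voxels.
loss_minkunet = criterion(classifier(voxel_feats), voxel_labels)
```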

nuneslu commented 3 years ago

A quick update: I was basically able to get close results, but I've noticed some "overfitting". On the training set the motorcyclist class achieves 90% IoU, but on the validation set it's 0%. Do you have any insights?

kentang-mit commented 3 years ago

One more question that I forgot. In the spvcnn model, the voxel_to_point method is called before the classifier module, which means the classifier runs over the points, not the voxels. However, in the minkunet model the voxel_to_point method is not called, so I assume that for minkunet the classifier runs over the voxels and, therefore, the loss is computed over the voxels and not over the points. Is that correct?

Actually, the points in SPVCNN are the same as the voxels at the finest granularity in MinkowskiNet.

kentang-mit commented 3 years ago

A quick update: I was basically able to get close results, but I've noticed some "overfitting". On the training set the motorcyclist class achieves 90% IoU, but on the validation set it's 0%. Do you have any insights?

I think that might be because there is a limited amount of training data for small objects and the network just learns to memorize the training set. However, we do empirically find that SPVCNN / SPVNAS can outperform MinkowskiNet on small object categories due to the supplementary fine-grained information from the point-based branch.

nuneslu commented 3 years ago

@kentangSJTU many thanks! I was able to reproduce the results now!

CCInc commented 3 years ago

Hi @nuneslu, I'm curious how you were able to increase your results to over 60% mIoU, as right now I'm similarly only getting ~30% mIoU after 15 epochs using the default configuration.

nuneslu commented 3 years ago

Hi @CCInc, as far as I remember, the biggest problem I had was that I was trying to use SemanticKITTI class weights but was using them incorrectly. Once I dropped them and used plain CrossEntropyLoss as in the paper, it worked just fine. There was also a fix in my framework to handle SparseTensors properly.
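Concretely, the change was just going from a weighted loss back to the plain one (a minimal sketch; the ignore label value is whatever your framework maps the unlabeled class to, not necessarily 255):

```python
import torch.nn as nn

ignore_label = 255  # my framework's label for the unlabeled/ignore class

# What I had before (and was using incorrectly): per-class weights
# computed from SemanticKITTI frequencies.
# criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=ignore_label)

# What worked for me: plain cross-entropy, as in the paper.
criterion = nn.CrossEntropyLoss(ignore_index=ignore_label)
```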

ksoy0128 commented 2 years ago

Hi @CCInc, I also faced the same problem. I trained spvcnn with the default options, but the loss does not decrease and the result is 20-30% mIoU. Also, when evaluating with a pretrained model, the average mIoU is almost 0. Do you know of any solution?