zhulf0804 / PointPillars

A Simple PointPillars PyTorch Implementation for 3D LiDAR(KITTI) Detection.

Bad inference result on sample after overfitting on same sample #48

Open jonasdieker opened 1 year ago

jonasdieker commented 1 year ago

Hi @zhulf0804 ,

I wanted to ensure the model can memorise a single training example. To do this I set the __len__() method in the Dataset to return 1. When training I printed the data_dict to ensure that the same sample was used for each iteration. Since the dataset length was set to 1, each epoch consisted of a single training step.
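For reference, this is essentially the change I mean (a sketch rather than my exact diff; in practice I just hard-coded `__len__` in the repo's dataset class, but a wrapper like this does the same thing):

```python
from torch.utils.data import Dataset

class SingleSampleDataset(Dataset):
    """Hypothetical wrapper for the overfitting sanity check: it exposes
    exactly one item of an existing dataset, so every epoch is one step
    on the same data_dict."""

    def __init__(self, base_dataset, index=0):
        self.base_dataset = base_dataset
        self.index = index

    def __len__(self):
        return 1                              # the dataset "contains" a single sample

    def __getitem__(self, i):
        return self.base_dataset[self.index]  # always return the same sample
```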

I visualised the train curves in tensorboard and, as expected, all three losses eventually decreased to 0. Then I wanted to visualise the model's predictions, for which I used the test.py script. However, when running on the same sample used for training (000000.bin), the model produces zero predictions.

If I set the score_thr in pointpillars.py to 0, then I get a lot of predictions, but they obviously all have very low confidence.
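(My understanding of score_thr is that it is just the final confidence cutoff in the post-processing, so with a threshold of 0 every surviving box is returned no matter how unconfident the model is. Conceptually something like the snippet below, which is my own paraphrase rather than the repo's actual code.)

```python
import torch

def filter_by_score(bboxes, labels, scores, score_thr):
    # Hypothetical helper mirroring the last post-processing step:
    # with score_thr = 0 nothing is filtered out, so all of the
    # low-confidence boxes end up in the output.
    keep = scores > score_thr
    return bboxes[keep], labels[keep], scores[keep]
```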

Any idea where I am going wrong?

zhulf0804 commented 1 year ago

Hi @jonasdieker, that's strange. Could you post the visualized predictions when setting score_thr to 0? By the way, did you load the pretrained weights successfully?

jonasdieker commented 1 year ago

Hi, thank you for your very fast reply!

Sorry, maybe I should have made it clearer that I wanted to train from scratch on a single KITTI sample to see if I could get decent predictions by overfitting. So no pretrained weights were loaded; instead I loaded the model weights saved from my overfitting run, produced as described above.

The reason: I tried to do the same for NuScenes to test whether the model can memorise the new data when overfitting. In that case the model also predicts nothing; however, I am not able to get a zero loss even after playing with the parameters, so there is likely more parameter tuning I still need to do ...

Here is the visualisation you asked for. (Note: I am using a different visualisation function because yours did not work for me over SSH.)

White is pedestrian, green is cyclist and blue is car.

image

Here are the confidences:

[0.0112691  0.01061759 0.01054672 0.01012148 0.01011159 0.00997026
 0.00983873 0.00945836 0.00936741 0.00894571 0.00888245 0.00886574
 0.00883586 0.00870235 0.00864896 0.00861476 0.00859446 0.00854981
 0.00853697 0.00851393 0.00847296 0.00834575 0.00832187 0.00829636
 0.00829282 0.00826259 0.00825665 0.00825058 0.00824824 0.00824112
 0.00823086 0.00821262 0.00817523 0.00817244 0.00815322 0.00815221
 0.00809674 0.00809228 0.00809175 0.00807787 0.00805884 0.00801394
 0.00799607 0.00798928 0.00394109 0.00385207 0.00380854 0.00376242
 0.00368402 0.00364244]

And the class counts:

[44, 4, 2]

Hope this is somewhat helpful for you!

jonasdieker commented 1 year ago

One more comment worth making: in the KITTI dataloader I actually commented out the data_augment call.

I did this in order to consistently get the same data for overfitting; I only apply point_range_filter, even for split="train".
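So the only remaining preprocessing is a crop of the cloud to the detection range, which makes the sample fed to the network fully deterministic. A self-contained sketch of that kind of filter (my own simplified version, not the repo's exact code):

```python
import numpy as np

def point_range_filter(pts, point_range=(0, -39.68, -3, 69.12, 39.68, 1)):
    """Keep only points inside [x_min, y_min, z_min, x_max, y_max, z_max].
    The range here is the usual KITTI pillar range, given purely as an example."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_range
    mask = (
        (pts[:, 0] > x_min) & (pts[:, 0] < x_max)
        & (pts[:, 1] > y_min) & (pts[:, 1] < y_max)
        & (pts[:, 2] > z_min) & (pts[:, 2] < z_max)
    )
    return pts[mask]
```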

zhulf0804 commented 1 year ago

Hello @jonasdieker, did you also visualize the ground-truth result and the prediction using the weights provided by this repo on 000000.bin? Are they reasonable?

jonasdieker commented 1 year ago

Yes, I did, and they were fine. That is why I am confused by my experiment's outcome!

Edit: I will send a visualisation of that when I have access to the machine again!

zhulf0804 commented 1 year ago

Ok. One more thing: could you verify again that the single training example is 000000.bin?

jonasdieker commented 1 year ago

So I tried it again and verified I was overfitting on the same sample as I was testing on. I tried it with 000000.bin and then with 000001.bin individually; both times the loss was practically zero, yet test.py returned no bounding boxes at all with the default settings defined here:

https://github.com/zhulf0804/PointPillars/blob/b9948e73505c8d6bfa631ffdf76c7148e82c5942/model/pointpillars.py#L262-L266

Could you try to repeat this experiment? It should only take a few minutes.

Edit:

When setting the train_dataloader to split="val", still with the dataset length set to 1, I can perform training and validation on exactly the same 000001.bin sample. The weird thing is that if I look at tensorboard I get the following plots:

image

So now I am even more confused, but it confirms that val/test performs really badly in this specific scenario. In particular, the class loss actually diverges, which explains why the confidence is so low and all boxes are filtered out by the get_predicted_bboxes_single method with the default params linked above.

jonasdieker commented 1 year ago

@zhulf0804 Ok, I think this is kind of interesting:

The only difference between train and val in train.py is that model.eval() is called (which of course you should be calling). But if I comment out that line, I get the following plots:

image

Doing the same in test.py I get:

image

which is perfect! So overfitting works exactly as expected with this change. However, I do not understand how this impacts performance, as switching from train mode to eval mode does the following:

image

I think I need to give this some more thought. Let me know if you have an explanation!
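A guess after a first pass: as far as I can tell, the only module whose behaviour changes between train() and eval() here should be BatchNorm. In train mode it normalises with the statistics of the current batch, while in eval mode it uses its running estimates; if those running estimates do not match the activations of the single overfitted sample, the features fed to the detection head shift and the confidences collapse. A minimal, repo-independent sketch of that gap:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 5 + 3   # stand-in for intermediate features

bn.train()
y_train = bn(x)                 # normalised with the batch mean/var

bn.eval()
y_eval = bn(x)                  # normalised with running_mean/running_var

# The outputs differ until the running estimates have converged to the
# statistics of this one batch; this is the train/eval gap that removing
# model.eval() hides.
print((y_train - y_eval).abs().max())
```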

zhulf0804 commented 1 year ago

Hello, @jonasdieker. So both the validation cls loss and the visualized predictions (using test.py) become good just by removing model.eval(), i.e. the following line? https://github.com/zhulf0804/PointPillars/blob/b9948e73505c8d6bfa631ffdf76c7148e82c5942/train.py#L139

jonasdieker commented 1 year ago

Hello @zhulf0804, yes that is exactly right!

zhulf0804 commented 1 year ago

Ok, I'm also confused by this result. I'll test it when I have access to the machine. Besides, I'm looking forward to your explanation of this question. Best.

mdane-x commented 11 months ago

Do you have any updates on this? @jonasdieker, did you find out the issue? I am getting the same problem: when overfitting on one (or a few) samples the loss goes to 0, but then I get 0 predictions using test.py. Even worse, when I run test.py multiple times with NO changes, I get different results (sometimes a few bboxes, most of the time zero: [] [] []).

jonasdieker commented 11 months ago

Hi @mdane-x, as far as I remember, overfitting on one (or a few) sample(s) didn't work; I ended up commenting out model.eval(). I believe the issue was due to the normalisation. If you have a good explanation of what is going on, please add it here!

mdane-x commented 11 months ago

Hi @jonasdieker, thanks for the answer. I haven't managed to make it work, even after removing the eval() line. I am getting empty predictions with any model trained (on a few samples).

jonasdieker commented 11 months ago

@mdane-x, hmmm that is very strange. I am not sure how to help you. In my experience it helps to visualise as much as you can. What does your validation loss look like? Is it also going to zero?