luis-gonzales / pointnet_own

Personal implementation of PointNet in TF 2.0

Unbalanced training data #1

Closed: mpdroid closed this issue 4 years ago

mpdroid commented 4 years ago

Kudos for the architecture and source code!

I'm trying to apply it to the Lyft 3D Object Detection competition on Kaggle (even though it's over) and am running into the following challenges:

  1. The training data annotations are over 90% cars, so the model quickly converges to predicting everything as a car. Is it better to (a) rebalance the training data (stratified sampling?) or (b) weight the losses? If (b), what would be a good weighting mechanism?

I have been trying several variations of a) and b) with no success.

  2. The Lyft dataset point clouds are relatively sparse, with most annotations containing 150 to 300 points, though some have extreme values (30,000+). From your documentation, the recommended minimum seems to be 512. Will the architecture work for sparse point clouds?

  3. Memory. The mini-batch size recommendation seems to be 32, but anything above 10 blows up the Kaggle kernel's memory. Will reducing the mini-batch size have a significant impact on training?

Any suggestions would be helpful.

Thank you

luis-gonzales commented 4 years ago

@mpdroid, thanks for the note. I'm glad you're able to use the repository, but please cite/give credit as appropriate.

  1. This question isn't specific to the repository or algorithm. As you seem to already be aware, class imbalance can be handled in many different ways, including hard rebalancing of the dataset, per-class loss weighting, or loss functions designed for imbalance. Your best bet is experimentation and searching online; a minimal class-weighting sketch is included after this list.

  2. Where do you see that the minimum number of points is 512? The whole point of point clouds is that they're unordered sets of points, so the number of points is unstructured/unconstrained. Line 151 in src/model.py shows this (N by 3 input). For batching clouds of very different sizes, see the resampling sketch after this list.

  3. Batch size is often found experimentally, so it's hard to say whether decreasing it would have a "significant impact" (that's ambiguous). I'd suggest getting on a standalone machine rather than Kaggle. Admittedly, I haven't done Kaggle competitions, so I'm not sure what options exist there. If memory is the hard limit, the gradient-accumulation sketch after this list is one way to trade compute for a larger effective batch.
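
For the weighting question specifically, a common starting point is inverse-frequency class weights passed to Keras. This is only a sketch under the assumption that you train with `model.fit` on integer class labels; the label distribution below is made up to mirror the imbalance you describe, and `model`/`train_ds` are placeholders, not names from this repo.

```python
import numpy as np

# Toy label distribution mimicking the imbalance described above:
# class 0 ("car") dominates, classes 1 and 2 are rare.
train_labels = np.array([0] * 900 + [1] * 60 + [2] * 40)

# Inverse-frequency weights, normalized so the average weight is roughly 1.
counts = np.bincount(train_labels)
num_classes = len(counts)
class_weight = {i: len(train_labels) / (num_classes * counts[i])
                for i in range(num_classes)}
print(class_weight)  # {0: ~0.37, 1: ~5.56, 2: ~8.33}

# Keras applies the per-class weights to the loss during training, e.g.:
# model.fit(train_ds, epochs=20, class_weight=class_weight)
```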
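
On the point-count question: while the network itself takes an N x 3 set, batching in practice usually means resampling each cloud to a fixed size, and a sparse cloud of 150 to 300 points can be upsampled by sampling with replacement. A minimal NumPy sketch (the target of 256 points is an arbitrary choice for illustration, not a requirement of this repo):

```python
import numpy as np

def resample_points(points: np.ndarray, num_points: int = 256) -> np.ndarray:
    """Return exactly `num_points` rows of an (N, 3) point cloud.

    Dense clouds are randomly subsampled; sparse clouds are upsampled
    by sampling with replacement.
    """
    n = points.shape[0]
    replace = n < num_points  # only duplicate points when the cloud is too small
    idx = np.random.choice(n, size=num_points, replace=replace)
    return points[idx]

# Example: a sparse 180-point annotation padded up to 256 points.
cloud = np.random.randn(180, 3)
print(resample_points(cloud).shape)  # (256, 3)
```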
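
On memory: if the concern with small batches is noisier gradients, gradient accumulation lets you keep mini-batches of 10 while applying updates as if the batch were larger. A rough TF 2 sketch, assuming a standard single-output classifier; `model`, `optimizer`, and `dataset` are placeholders, not part of this repo's training loop.

```python
import tensorflow as tf

def train_epoch(model, optimizer, dataset, accum_steps=4):
    """Accumulate gradients over `accum_steps` mini-batches before applying,
    approximating a batch of accum_steps * mini_batch_size."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    accum = [tf.zeros_like(v) for v in model.trainable_variables]

    for step, (points, labels) in enumerate(dataset):
        with tf.GradientTape() as tape:
            logits = model(points, training=True)
            # Scale so the accumulated gradient matches one large batch.
            loss = loss_fn(labels, logits) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]

        if (step + 1) % accum_steps == 0:
            optimizer.apply_gradients(zip(accum, model.trainable_variables))
            accum = [tf.zeros_like(v) for v in model.trainable_variables]
```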

mpdroid commented 4 years ago

OK, thank you. Yes, I will make sure to cite your work if I am able to make a submission.