Test on unclassified data sets

tatarchm / tangent_conv

Tangent Convolutions for Dense Prediction in 3D

Test on unclassified data sets #15

Open sgiraudot opened 5 years ago

sgiraudot commented 5 years ago

Hello,

Once a model has been trained, is there a way to apply it to unclassified data? As far as I can tell, the configuration file does not differentiate the validation set (which needs a valid labeling) from the test set (which, in real-life applications, could be unclassified). I have tried to include test sets with all labels equal to 0 (unclassified), but in that case precomputing the validation batches does not work.

Did I miss something or is it simply not possible, with the current framework, to classify data sets with unknown labeling?

tatarchm commented 5 years ago

Hi,

In the current version of the framework there is no 'proper' way to test on unlabeled data, but I think the solution you describe should work. Could you please specify exactly what error you get when you set all labels to 0? I can also suggest setting them to 1 instead, because by default points with label 0 correspond to the background class and may be ignored.
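A minimal sketch of that workaround, assuming the labels of a scan can be represented as a NumPy integer array with one entry per point. The helper name `make_placeholder_labels` is hypothetical and not part of the framework, and how the array is written to disk depends on the dataset format you use:

```python
import numpy as np

def make_placeholder_labels(num_points, placeholder=1):
    """Build a dummy label array for an unlabeled scan.

    Hypothetical helper: every point gets the same non-zero label so that
    it is not treated as background (label 0) and skipped.
    """
    if placeholder == 0:
        # Label 0 is the background class in this discussion and may be ignored.
        raise ValueError("placeholder must be non-zero")
    return np.full(num_points, placeholder, dtype=np.int32)

labels = make_placeholder_labels(5)
```

These labels carry no information, so any accuracy reported on such a scan is meaningless; only the predicted labels are of interest.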

sgiraudot commented 5 years ago

I don't really get any error when I set all labels to 0; the software just gets stuck for a very long time in the function precompute_validation_batches(). If I understand correctly, at some point there is a search for a random point, and the random point is discarded if its label is 0. So with all labels set to 0, I imagine it either enters an infinite loop or runs a very long search that never finds anything.
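To illustrate what I suspect is happening, here is a minimal sketch of rejection sampling over labels. It is not the actual code of precompute_validation_batches(); the function name `sample_labeled_point` and its parameters are made up for illustration. Without the upfront check and the retry cap, the loop would never terminate when every label is 0:

```python
import numpy as np

def sample_labeled_point(labels, rng, max_tries=1000):
    """Pick the index of a random point whose label is non-zero.

    Illustrative only: mimics a 'draw, reject if label == 0' search,
    but fails fast instead of spinning when no valid point exists.
    """
    labels = np.asarray(labels)
    if not np.any(labels != 0):
        # With all labels equal to 0, plain rejection sampling never succeeds.
        raise ValueError("no point with a non-zero label")
    for _ in range(max_tries):
        idx = int(rng.integers(len(labels)))
        if labels[idx] != 0:
            return idx
    # Valid points exist but are rare: choose directly among them.
    return int(rng.choice(np.flatnonzero(labels != 0)))

rng = np.random.default_rng(0)
idx = sample_labeled_point([0, 0, 3, 0], rng)
```

In this example only index 2 has a non-zero label, so that is the only index the function can return.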

If I set all labels to 1, am I correct that I should then not use these labels in the validation set? Otherwise, the training would wrongly treat these points as ground truth for label 1. Or does it work differently?

tatarchm commented 5 years ago

I see. I will update the code to support proper testing.

Sure, you should not use those labels in the validation set. Using unlabeled data for validation would not make sense anyway - you need ground truth there.