loicland / superpoint_graph

Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs
MIT License

Train on custom point cloud data #97

Closed. therealanshuman closed this issue 5 years ago.

therealanshuman commented 5 years ago

Hi Loic, I am trying to implement your awesome work on a custom point cloud dataset that has about 35 million labelled 3D points (xyz values plus a label) for training, all in a single csv file. Following the readme's instructions, I have written a read_custom_format method in provider.py that reads the csv file and returns numpy arrays of xyz values (Nx3) and labels (Nx1).
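
For reference, a minimal sketch of such a reader (the csv layout of one `x,y,z,label` row per point with no header is an assumption):

```python
import numpy as np

def read_custom_format(filename):
    """Read a labelled csv point cloud; returns xyz (Nx3 float32) and labels (N,)."""
    # np.loadtxt is simple but slow for ~35M rows; pandas.read_csv would be faster
    data = np.loadtxt(filename, delimiter=',', dtype=np.float32)
    xyz = np.ascontiguousarray(data[:, :3], dtype=np.float32)
    labels = data[:, 3].astype(np.uint8)  # assumes label ids fit in uint8
    return xyz, labels
```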

Based on this data I have an implementation question about partition.py - do I need to subsample the training data into fixed-size point clusters and store them in several files before calling the read_custom_format method in partition.py? I don't really understand how the method libply_c.prune works, although I think it does some sub-sampling based on a voxel size argument.

loicland commented 5 years ago

Hi,

You don't need to do the pruning yourself, the function in ply_c will do it for you.

35 million points is probably too much for cut pursuit and especially the Delaunay triangulation, depending on your hardware. Just select an appropriate voxel_width; it usually helps by decreasing noise as well.

I would aim for 5 million points, but depending on your hardware you can probably prune less.
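
For intuition, the pruning is a voxel-grid subsampling: points that fall in the same voxel of side voxel_width are merged and a representative label is kept. A pure-numpy illustration of the idea (not the actual libply_c.prune call or its signature) might look like this:

```python
import numpy as np

def voxel_prune(xyz, labels, voxel_width):
    """Illustrative voxel-grid pruning: one representative point per occupied voxel."""
    # assign each point to a voxel by flooring its (shifted) coordinates
    voxel = np.floor((xyz - xyz.min(axis=0)) / voxel_width).astype(np.int64)
    # group points that share a voxel
    _, first_idx, inverse, counts = np.unique(
        voxel, axis=0, return_index=True, return_inverse=True, return_counts=True)
    # one pruned point per voxel: the centroid of the points inside it
    pruned_xyz = np.zeros((counts.shape[0], 3), dtype=np.float64)
    np.add.at(pruned_xyz, inverse, xyz)
    pruned_xyz = (pruned_xyz / counts[:, None]).astype(np.float32)
    # keep the label of one point per voxel (the real prune keeps the majority label)
    pruned_labels = labels[first_idx]
    return pruned_xyz, pruned_labels
```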

therealanshuman commented 5 years ago

Oh okay! So does that mean I can crop the training set of 35 million points into multiple regions of roughly 5 million dense points each, store them in separate csv files, and then execute the modified partition.py to get superpoint graphs for each of them? How many such random crops of the training data would you suggest I go for, so that the learning is really good on that particular dataset?

loicland commented 5 years ago

What is your train/test split like? You could probably split your data into 5 sets and do 5-fold cross validation.

The performance will depend on how homogeneous your dataset is. If some structures only exist in one of the splits, the algorithm will naturally struggle.

therealanshuman commented 5 years ago

Great, thank you very much! I was planning to split the 35 million points into 7 uniform sets of about 5 million each, train on 5 of them and validate on the rest, but a 5-fold cross validation with homogeneously and uniformly cropped regions seems like a better idea.
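
One simple way to cut the cloud into such sets (a sketch; the slab-along-x strategy, file names, and read_custom_format usage are assumptions for illustration):

```python
import numpy as np

def split_into_tiles(xyz, labels, n_tiles=7):
    """Cut the cloud into n_tiles slabs of roughly equal point count along x."""
    order = np.argsort(xyz[:, 0])  # sort points by x coordinate
    return [(xyz[c], labels[c]) for c in np.array_split(order, n_tiles)]

# hypothetical usage: one csv per slab so partition.py processes each area separately
# xyz, labels = read_custom_format('train_full.csv')
# for i, (txyz, tlab) in enumerate(split_into_tiles(xyz, labels, 7)):
#     np.savetxt('area_%d.csv' % i, np.hstack([txyz, tlab[:, None]]), delimiter=',')
```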

therealanshuman commented 5 years ago

What sort of changes do I need to make to the preprocess_pointclouds method in custom_dataset.py? I can see that the rgb values are rescaled to [-0.5, 0.5] in the template code, but I don't have any rgb values in my custom dataset. Therefore I am returning an array of zeros for the rgb values from the read_custom_format method in provider.py, as instructed in the readme. So, should I leave that line as it is or omit the rescaling operation?
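
For context, the rescaling step in question is essentially this mapping of uint8 colors into a zero-centred range (paraphrased, not verbatim from custom_dataset.py):

```python
# uint8 colors in [0, 255] mapped to floats in [-0.5, 0.5]
rgb = rgb.astype('float32') / 255.0 - 0.5
```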

loicland commented 5 years ago

Hi,

You do not need to return an array of zeros from read_my_custom_format. However, you need to feed the prune function such an array, as in the example at line 143 of /partition/partition.py.

Then do not append the rgb values to the parsed files, at the equivalent of line 94 in /learning/custom_dataset.

You then need to change the indices accordingly at line 215 of learning/spg.py.

Alternatively, just put 0 in place of the RGB values in the parsed files and leave the indices unchanged, but that would waste memory.

Does that clear things up?
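
A rough picture of the two edits described above; the (xyz, rgb, elpsv) layout and the helpers below are assumptions for illustration, not the repository's exact code:

```python
import numpy as np

def assemble_features(xyz, elpsv, rgb=None):
    """Per-point feature array written to the parsed files (hypothetical helper)."""
    if rgb is not None:
        # original template: 3 + 3 + 5 = 11 columns (xyzrgbelpsv)
        return np.concatenate([xyz, rgb, elpsv], axis=1)
    # no color: 3 + 5 = 8 columns (xyzelpsv), the recommended option
    return np.concatenate([xyz, elpsv], axis=1)

def elpsv_columns(has_color):
    """Column slice of the elpsv block, for the index change in learning/spg.py."""
    # with rgb the block starts at column 6; dropping rgb shifts it left by 3
    start = 6 if has_color else 3
    return slice(start, start + 5)
```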

therealanshuman commented 5 years ago

Hi Loic, thanks for the detailed instructions. I have made the changes; below are some screenshots of the relevant pieces of code:

  1. Line 143 of partition.py (screenshot: spg2)
  2. Line 94 of custom_dataset.py (screenshot: spg3)
  3. Line 215 of spg.py (screenshot: spg1)

But on executing main.py I am getting the following error (screenshot: spg4). This, I suppose, means that the feature length is still 11 (xyzrgbelpsv), although I am only working with 8 (xyzelpsv). What might be going wrong here? Are there any other places that might need modifications?
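
One quick way to check which layout the already-parsed files actually contain is to print their dataset shapes; an 11-column point array would mean stale files from a run before the rgb changes. The path below is hypothetical, and treating the parsed files as plain h5 datasets is an assumption:

```python
import h5py

parsed_file = 'parsed/custom_dataset/area_0.h5'  # hypothetical path to one parsed file

def show(name, obj):
    # print every dataset's shape; a width of 11 indicates the old xyzrgbelpsv layout
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape)

with h5py.File(parsed_file, 'r') as f:
    f.visititems(show)
```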

therealanshuman commented 5 years ago

The training is working fine now; I had to run the partitioning code again. Thank you!