Closed by therealanshuman 5 years ago
Hi,
You don't need to do the pruning yourself, the function in ply_c will do it for you.
35 million points is probably too much for cut pursuit and especially the Delaunay triangulation, depending on your hardware. Just select an appropriate voxel_width; it usually helps by decreasing noise as well.
I would aim for 5 million, but depending on your hardware you can probably prune less.
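If it helps to see what the pruning amounts to, here is a minimal numpy sketch of voxel-grid subsampling (just a conceptual illustration of keeping one point per voxel, not the libply_c code):

```python
# Conceptual numpy sketch of voxel-grid subsampling: keep one point per
# occupied voxel of side voxel_width. Only an illustration of the idea,
# not the libply_c.prune implementation.
import numpy as np

def voxel_subsample(xyz, labels, voxel_width=0.1):
    # integer voxel coordinates for every point
    grid = np.floor((xyz - xyz.min(axis=0)) / voxel_width).astype(np.int64)
    # index of one representative point per unique voxel
    _, keep = np.unique(grid, axis=0, return_index=True)
    return xyz[keep], labels[keep]

# xyz: (N, 3) float array, labels: (N,) int array
# xyz_sub, labels_sub = voxel_subsample(xyz, labels, voxel_width=0.1)
```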
Oh okay! So does that mean I can crop the training set of 35 million points into multiple regions of roughly 5 million dense points each, store them in separate CSV files, and then execute the modified partition.py to get superpoint graphs for each of them? How many such random crops of the training data would you suggest I go for so that the learning really works well on this particular dataset?
What is your train/test split like? You could probably split your data into 5 sets and do 5-fold cross-validation.
The performance will depend on how homogeneous your dataset is. If some structures only exist in one of the splits, the algorithm will naturally struggle.
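For instance, a crude way to carve the single cloud into roughly equal, spatially contiguous blocks and write one CSV per block could look like this (the split along x and the column layout are assumptions, not code from the repo):

```python
# Hypothetical sketch: split one large labelled cloud into k spatially
# contiguous blocks along x (quantile edges give roughly equal point counts)
# and write each block as x, y, z, label to its own CSV file.
import numpy as np

def split_into_folds(xyz, labels, k=5, out_prefix="fold"):
    edges = np.quantile(xyz[:, 0], np.linspace(0.0, 1.0, k + 1))
    fold = np.digitize(xyz[:, 0], edges[1:-1])  # fold index in [0, k-1]
    for i in range(k):
        mask = fold == i
        block = np.hstack([xyz[mask], labels[mask].reshape(-1, 1)])
        np.savetxt("%s_%d.csv" % (out_prefix, i + 1), block, delimiter=",")
```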
Great, thank you very much! I was planning to split the 35 million points into 7 uniform sets of about 5 million each, train on 5 of them and validate on the rest. But a 5-fold cross-validation with homogeneously and uniformly cropped regions seems like a better idea.
What sort of changes do I need to make in the preprocess_pointclouds method in custom_dataset.py? I can see that the RGB values are being rescaled to [-0.5, 0.5] in the template code. But I don't have any RGB values in my custom dataset. Therefore, I am returning an array of zeros for the RGB values from the read_custom_format method in provider.py, as clearly instructed in the readme. So, should I leave that line as it is or omit the rescaling operation?
Hi,
You do not need to return an array of 0 from read_my_custom_format. However, you need to feed the function prune such an array, like in the example I gave at line 143 of /partition/partition.py.
Then do not append the RGB values in the parsed files, at the equivalent of line 94 in /learning/custom_dataset.
You then need to change the indices accordingly at line 215 of learning/spg.py.
Alternatively, just put 0 in place of RGB in the parsed files and do not change the indices, but that would waste memory.
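Roughly, the no-RGB path in the partition step would look like this (a sketch only; the exact prune call is left as a comment and should be copied from line 143 of /partition/partition.py):

```python
# Sketch of the no-RGB case in the partition step (hypothetical helper;
# copy the exact libply_c.prune call from /partition/partition.py).
import numpy as np
from provider import read_custom_format  # your reader: xyz (N, 3), labels (N, 1)

def load_and_prune_without_rgb(csv_path, voxel_width):
    xyz, labels = read_custom_format(csv_path)
    # prune still expects a color array, so feed it zeros here only;
    # later, do NOT append these zeros to the parsed features, and drop
    # the 3 RGB columns from the feature indices in learning/spg.py.
    rgb = np.zeros(xyz.shape, dtype='uint8')
    # xyz, rgb, labels, ... = libply_c.prune(xyz, voxel_width, rgb, labels, ...)
    return xyz, rgb, labels
```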
Does that clear things up?
Hi Loic, thanks for the detailed instructions. I have made the changes, and here are some screenshots of the relevant pieces of code: partition.py, custom_dataset.py, spg.py.
But on executing main.py I am getting the following error (see screenshot):
This, I suppose, means that the feature length is still 11 (xyzrgbelpsv) although I am only working on 8 (xyzelpsv). What might be going wrong here? Are there any other places that might need modifications?
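Is there a quick way to check what the parsed files actually contain? Something like this, perhaps (the path below is just a placeholder):

```python
# List the datasets stored in one of the parsed files and their shapes
# (path is a placeholder; adjust to wherever your parsed .h5 files live).
import h5py

with h5py.File("features/my_cloud.h5", "r") as f:
    for name, dset in f.items():
        print(name, getattr(dset, "shape", dset))  # the per-point features should be 8-wide, not 11
```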
The training is working fine now, I had to run the partitioning code again. Thank you!
Hi Loic, I am trying to implement your awesome work on a custom point cloud dataset that has about 35 million labelled 3D points (xyz values and a label) for training, stored in a single CSV file. I have written a method read_custom_format in the file provider.py, as clearly instructed in the readme, which returns numpy arrays of xyz values (Nx3) and labels (Nx1) after fetching them from the CSV file. Based on this data I have an implementation question for partition.py: do I need to subsample the training data into fixed-size point clusters and store them in several files before calling the read_custom_format method in partition.py? I don't really understand how the libply_c.prune method works, although I think it does some subsampling based on a voxel size argument!
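For reference, the reader is essentially along these lines (a simplified sketch, assuming a headerless x, y, z, label CSV):

```python
# Simplified sketch of read_custom_format: a headerless CSV with
# columns x, y, z, label is assumed.
import numpy as np

def read_custom_format(csv_path):
    data = np.loadtxt(csv_path, delimiter=",")
    xyz = data[:, :3].astype(np.float32)      # (N, 3) point coordinates
    labels = data[:, 3:4].astype(np.uint32)   # (N, 1) semantic labels
    return xyz, labels
```

np.loadtxt is slow on 35 million rows; pandas.read_csv with the same column split is a faster drop-in if loading time becomes an issue.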