maxjaritz / mvpnet

Implementation of the paper "Multi-view PointNet for 3D Scene Understanding"

Multi-view feature augmented point cloud #3

Open dcy0577 opened 3 years ago

dcy0577 commented 3 years ago

Hi @maxjaritz, first of all, thanks for your wonderful work! I plan to use KPConv to replace PointNet++ to do some experiments. I want to know whether the multi-view feature augmented point cloud (used as input to PointNet++) can be visualized or stored in .ply format?

To avoid changing a lot of code, I want to do some preprocessing on top of the multi-view feature augmented point cloud so that it can be used as the input to KPConv. Maybe you can give me some advice on this? Thanks!

maxjaritz commented 3 years ago

Hi,

I plan to use KPConv to replace PointNet++ to do some experiments.

Good idea, it should be rather straightforward.

I want to know whether the multi-view feature augmented point cloud (used as input to PointNet++) can be visualized or stored in .ply format?

I never did this, but it might be possible to save the feature augmented point cloud (xyz values + the 64 feature channels) using a .ply Python library such as PyntCloud. It is not possible with Open3D, because I think it only supports colors and normals. It would also take twice as much space, because Open3D only supports float64, even though the features are float32. I think it would be easier to store the data with numpy, i.e. as .npy or .npz files. For visualization, you could compute the norm of the 64-dimensional feature vector to reduce it to a single value per point, then apply a color map. I like Open3D for visualization.
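For reference, a minimal sketch of both ideas, assuming points is an (N, 3) array and features the (N, 64) lifted features; the helper names are made up and not part of the repo:

```python
# Minimal sketch (helper names are made up): store xyz + 64 feature channels
# compactly with numpy, and visualize by mapping the feature norm to a color map.
import numpy as np
import matplotlib.pyplot as plt
import open3d as o3d

def save_feature_cloud(path, points, features):
    # points: (N, 3) xyz, features: (N, 64) multi-view features
    np.savez_compressed(path,
                        points=points.astype(np.float32),
                        features=features.astype(np.float32))

def visualize_feature_cloud(points, features):
    # Reduce the 64 channels to one scalar per point (L2 norm), normalize to
    # [0, 1], and turn it into RGB colors with a matplotlib color map.
    norm = np.linalg.norm(features, axis=1)
    norm = (norm - norm.min()) / (norm.max() - norm.min() + 1e-8)
    colors = plt.get_cmap("viridis")(norm)[:, :3]
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points.astype(np.float64))
    pcd.colors = o3d.utility.Vector3dVector(colors)
    o3d.visualization.draw_geometries([pcd])
```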

To avoid changing a lot of code, I want to do some preprocessing on top of the multi-view feature augmented point cloud so that it can be used as the input to KPConv. Maybe you can give me some advice on this? Thanks!

I think you should use the 2D CNN to predict the features from all the images, lift them to 3D, and save the feature augmented point clouds to disk. Then you can use them to train a KPConv network. For this, you need to slightly modify the KPConv architecture to accept input features of size 64 instead of size 3.
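A rough, purely illustrative sketch of that workflow; lift_scene_features stands in for a call into the pretrained 2D network plus the 2D-3D lifting, and the data keys are assumptions rather than the repo's actual API:

```python
# Purely illustrative sketch: precompute feature augmented point clouds per
# scene with the frozen 2D network + 2D-3D lifting, and cache them on disk.
import os
import numpy as np
import torch

@torch.no_grad()
def cache_scene_features(lift_scene_features, scene_loader, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for scene in scene_loader:
        # lift_scene_features is assumed to return (N, 3) scene coordinates
        # and (N, 64) lifted 2D features for the whole scene.
        points, feats = lift_scene_features(scene)
        np.savez_compressed(
            os.path.join(out_dir, f"{scene['scene_id']}.npz"),
            points=points.cpu().numpy().astype(np.float32),
            features=feats.cpu().numpy().astype(np.float32),
        )
```

On the KPConv side, the change should then mostly come down to the input feature dimension of the first layer (in some KPConv implementations this is a single config value such as in_features_dim), which becomes 64 instead of 3.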

Hope that helps, Max

dcy0577 commented 3 years ago

I think you should use the 2D CNN to predict the features from all the images, lift them to 3D, and save the feature augmented point clouds to disk. Then you can use them to train a KPConv network. For this, you need to slightly modify the KPConv architecture to accept input features of size 64 instead of size 3.

I see. This is exactly what I want to do. So in this case, I think I should save one feature augmented point cloud for each full scene, rather than doing the 2D-3D lifting per chunk, right?

maxjaritz commented 3 years ago

Exactly. Whole scene processing is simpler. For the 2D-3D lifting, you should use a sufficient number of 2D views for the whole scene. You can try ~10 to 30.
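For instance, views could simply be sampled evenly along the scene's frame sequence; a minimal sketch (the helper below is hypothetical, not the repo's view-selection code):

```python
# Hypothetical helper: pick k views spread evenly over a scene's frame
# sequence for whole-scene 2D-3D lifting (k roughly in the 10-30 range).
import numpy as np

def sample_view_indices(num_frames, k=20):
    k = min(k, num_frames)
    return np.linspace(0, num_frames - 1, num=k).round().astype(int)
```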

dcy0577 commented 3 years ago

Hi Max, thanks again for your advice. I have some questions regarding the labels that came up while going deeper into your code.

  1. In the scannet_2d3d dataset, you use seg_label = self.nyu40_to_scannet[seg_label].astype(np.int64) to map the labels from the pkl file (cache) to ScanNet labels. If I understand correctly, the labels in the cache file are already NYU40 ids, not the raw ids, right?

  2. During debugging, I noticed that some ids like 19 or 17 appeared in seg_label, which do not belong to the ScanNet label ids. I found out that this was because of the code below: https://github.com/maxjaritz/mvpnet/blob/cadf636749b5ee6e73e96ff68e4b32728088decd/mvpnet/data/scannet_2d3d.py#L123 Could you please explain this? I would rather do: self.nyu40_to_scannet[list(self.scannet_mapping.keys())] = np.array(list(self.scannet_mapping.keys()))

  3. I also saw that you set ignore_value = -100. Is there a reason behind this? Can I set it to 0?

  4. I used the structure below to get the multi-view features and save them to disk: https://github.com/maxjaritz/mvpnet/blob/cadf636749b5ee6e73e96ff68e4b32728088decd/mvpnet/models/mvpnet_3d.py#L90-L113 I load the pretrained 2D network and also freeze its parameters. However, I noticed that for the same input scene, the feature2d3d obtained by running the code is different each time. Every time I run the code, I get a new feature2d3d result, even though the input is the same. That makes me quite confused.

Your help is greatly appreciated! Changyu

maxjaritz commented 3 years ago

Hi Changyu,

  1. It seems like it. Unfortunately, I don't remember this detail. Maybe you can check it yourself.
  2. Same as 1.
  3. You should leave it as -100, because it is the default ignore label in the cross entropy loss (see the snippet after this list).
  4. This seems weird indeed. Have you made sure that the input data is always the same? Maybe you can write a unit test with some dummy data which is always the same.
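To illustrate point 3 with dummy tensors: PyTorch's cross-entropy loss ignores targets equal to its ignore_index, and -100 is its default, so points labeled -100 contribute nothing to the loss or the gradients.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-100)  # -100 is also the default
logits = torch.randn(8, 20)                         # (num_points, num_classes), dummy data
labels = torch.full((8,), -100, dtype=torch.long)   # start with every point ignored
labels[:4] = torch.randint(0, 20, (4,))             # label only the first 4 points
loss = criterion(logits, labels)                    # averaged over the 4 labeled points only
```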

Hope it helps, Max

dcy0577 commented 3 years ago

This seems weird indeed. Have you made sure that the input data is always the same? Maybe you can write a unit test with some dummy data which is always the same.

Yes indeed. I'm very sure I used the same input data. I guess it might have something to do with parameter initialization. Every time I call the FeatureAggregation function, the parameters of those layers inside it are reinitialized. Please correct me if I am wrong...
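(For reference: if the layers inside FeatureAggregation are indeed re-created on every run, they receive fresh random weights each time, which would produce different features. One way around it is to build the module once, load pretrained weights for all of its layers, and freeze it; a minimal sketch, with the checkpoint key layout and constructor as assumptions:)

```python
# Sketch of a fix (checkpoint key layout and constructor are assumptions):
# build the module once, load pretrained weights for all of its layers,
# and freeze it so repeated runs produce identical features.
import torch

def build_frozen_extractor(make_module, checkpoint_path):
    module = make_module()                                 # construct once, not per run
    state = torch.load(checkpoint_path, map_location="cpu")
    state_dict = state.get("model", state)                 # the "model" key is an assumption
    module.load_state_dict(state_dict, strict=False)
    module.eval()                                          # fix batchnorm/dropout behavior
    for p in module.parameters():
        p.requires_grad_(False)
    return module

# usage (names are placeholders):
# extractor = build_frozen_extractor(lambda: FeatureAggregation(...), "checkpoint.pth")
# with torch.no_grad():
#     feats_2d3d = extractor(batch)
```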