hehefan / Point-Spatio-Temporal-Convolution

Implementation of the "PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences" paper.
MIT License

PSTNet on SemanticKITTI/nuScenes datasets #2

Closed: sandeepnmenon closed this issue 3 years ago

sandeepnmenon commented 3 years ago

Great work on the Point Tubes. I am particularly interested in the 4D semantic segmentation applications. I was wondering whether you have tried PSTNet on benchmark datasets like SemanticKITTI or nuScenes. These point cloud sequences are much sparser than the SYNTHIA dataset.

Thank you

hehefan commented 3 years ago

Hi @sandeepnmenon,

Apologies for my late response, and thanks for the suggestion. However, we currently have no plan to apply our method to the SemanticKITTI or nuScenes datasets.

Our method focuses on spatio-temporal modeling, and especially on temporal modeling. There are already many excellent static point cloud approaches, so we would like to pay attention to the different temporal structures. The sparsity issue is mainly a matter of spatial modeling.

PSTNet is essentially a prototype that models point cloud sequences/videos in a decomposed manner. The spatial modeling component can be directly replaced with other static point cloud approaches that are good at sparse point cloud modeling. We may try these two datasets in the future with different PSTNet variants or extensions.
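To make the decomposition idea concrete, here is a rough sketch. The `spatial_encoder` slot, shapes, and tensor layout are illustrative assumptions, not the actual PSTConv implementation (which does not require point correspondence across frames):

```python
import torch
import torch.nn as nn

class DecomposedSTBlock(nn.Module):
    """Illustrative decomposition sketch (not the actual PSTConv implementation).

    Any static point cloud encoder, e.g. one suited to sparse LiDAR scans,
    can fill the `spatial_encoder` slot; temporal mixing then happens with a
    1D convolution over the frame axis. For simplicity this sketch assumes
    point correspondence across frames, which PSTNet does not require.
    """
    def __init__(self, spatial_encoder, spatial_out_channels, out_channels,
                 temporal_kernel=3):
        super().__init__()
        self.spatial_encoder = spatial_encoder  # per-frame point feature extractor
        self.temporal_conv = nn.Conv1d(spatial_out_channels, out_channels,
                                       kernel_size=temporal_kernel,
                                       padding=temporal_kernel // 2)

    def forward(self, xyzs, feats):
        # xyzs: [B, T, N, 3] coordinates, feats: [B, T, N, C] point features
        B, T, N, _ = xyzs.shape
        # 1) Spatial modeling: encode each frame independently -> [B, T, N, C'].
        x = torch.stack([self.spatial_encoder(xyzs[:, t], feats[:, t])
                         for t in range(T)], dim=1)
        # 2) Temporal modeling: convolve each point's features over time.
        x = x.permute(0, 2, 3, 1).reshape(B * N, -1, T)     # [B*N, C', T]
        x = self.temporal_conv(x)                           # [B*N, C_out, T]
        return x.reshape(B, N, -1, T).permute(0, 3, 1, 2)   # [B, T, N, C_out]
```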

Thank you.

sandeepnmenon commented 3 years ago

Thank you @hehefan for the insights. I would like to try sequence classification on those two datasets using PSTConv. In the given model for sequence classification, https://github.com/hehefan/Point-Spatio-Temporal-Convolution/blob/3214fe383e40cb933531dd246432c91a32db948c/models/sequence_classification.py#L15, I see that all the layers are PSTConv. As per your suggestion:

> The spatial modeling component can be directly replaced with other static point cloud approaches that are good at sparse point cloud modeling.

I would like to use my static point cloud model along with the temporal modeling described in this paper. Which part of the MSRAction model is the spatial modeling method? Could you give a small example of how PSTConv can be added to a static point cloud classification pipeline so that it also models temporal information?

Thank you

hehefan commented 3 years ago

Hi @sandeepnmenon,

You might want to modify the following section: https://github.com/hehefan/Point-Spatio-Temporal-Convolution/blob/main/modules/pst_convolutions.py#L168-L193. This section searches for spatial neighbours and then encodes the local structure. You can replace the encoding logic with your own code.
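Schematically, that section gathers spatial neighbours around each anchor point and encodes displacement-augmented features. A sketch of where a custom encoder would slot in (the function and argument names here are illustrative, not the repo's actual API):

```python
import torch

def encode_spatial_neighborhood(anchor_xyz, neighbor_xyz, neighbor_feats, encoder):
    """Schematic of the spatial-encoding step (illustrative only).

    anchor_xyz:     [B, M, 3]      subsampled anchor points for one frame
    neighbor_xyz:   [B, M, K, 3]   K spatial neighbours per anchor (e.g. from a ball query)
    neighbor_feats: [B, M, K, C]   features of those neighbours
    encoder:        any per-neighborhood module; this is the slot to replace
                    with a custom static point cloud encoder.
    """
    # Relative displacements localize the neighbourhood around each anchor,
    # mirroring the displacement encoding in the referenced code section.
    displacements = neighbor_xyz - anchor_xyz.unsqueeze(2)            # [B, M, K, 3]
    local_input = torch.cat([displacements, neighbor_feats], dim=-1)  # [B, M, K, 3 + C]
    # Custom spatial modeling goes here, e.g. a shared MLP with max pooling,
    # or a sparse-convolution encoder better suited to LiDAR scans.
    return encoder(local_input)                                       # e.g. [B, M, C']
```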

Best regards.

sandeepnmenon commented 3 years ago

Hi @hehefan, the paper mentions an experiment on 4D semantic segmentation. Will that model be released? Since my semantic segmentation spatial model also follows a U-Net architecture, I am not sure how to incorporate just the spatial features into the above-mentioned section of PSTConv.

hehefan commented 3 years ago

Hi @sandeepnmenon,

PSTConv is a basic module that captures the spatio-temporal local structure of point cloud sequences or videos. It is independent of the specific PSTNet architectures for 3D action recognition or 4D semantic segmentation. For the segmentation architecture, please refer to point_segmentation.py; it may provide insights into how to build U-Net-style frameworks.
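Schematically, such a U-Net-style arrangement looks like the skeleton below. The per-point MLPs stand in for the real encoder/decoder stages, and point subsampling/interpolation is omitted; see point_segmentation.py for the actual architecture:

```python
import torch
import torch.nn as nn

class SegUNetSketch(nn.Module):
    """U-Net-shaped skeleton for 4D segmentation (hypothetical stand-in modules,
    not the repo's actual point_segmentation.py architecture)."""
    def __init__(self, in_ch=3, num_classes=12):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(in_ch, 64), nn.ReLU())  # stand-in for a PSTConv stage
        self.enc2 = nn.Sequential(nn.Linear(64, 128), nn.ReLU())    # stand-in for a deeper stage
        # Decoder fuses deep features with the skip connection from stage 1.
        self.dec1 = nn.Sequential(nn.Linear(128 + 64, 64), nn.ReLU())
        self.head = nn.Linear(64, num_classes)                      # per-point class scores

    def forward(self, feats):
        # feats: [B, T, N, C] per-point features for a point cloud sequence
        f1 = self.enc1(feats)                        # encoder stage 1
        f2 = self.enc2(f1)                           # bottleneck
        f = self.dec1(torch.cat([f2, f1], dim=-1))   # skip-connection fusion
        return self.head(f)                          # [B, T, N, num_classes]
```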

BTW, for segmentation, the transformer-based network ("Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos") seems to work better than point spatio-temporal convolution. This is probably because convolution is rigid at the edges or borders of objects, while the transformer is more flexible.

Best regards.

sandeepnmenon commented 3 years ago

Thank you @hehefan. The code and that paper are really helpful. Closing this issue.

PS: Is it possible to release the code for 4D semantic segmentation using the P4Transformer? I have started a thread in that repo (https://github.com/hehefan/P4Transformer/issues/4).

Thank you again.