keyu-tian / SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
https://arxiv.org/abs/2301.03580
MIT License
1.41k stars, 82 forks

Using 3D convolution #55

Closed: chinmay5 closed this issue 11 months ago

chinmay5 commented 11 months ago

Thank you so much for sharing the codebase. I was wondering how to apply 3D convolution with this setup. I think we need to update this section:

```python
def sp_conv_forward(self, x: torch.Tensor):
    x = super(type(self), self).forward(x)
    x *= _get_active_ex_or_ii(H=x.shape[2], W=x.shape[3], returning_active_ex=True)    # (BCHW) *= (B1HW), mask the output of conv
    return x
```

However, I am not certain. It would be great to hear your opinion, and any suggestions on what to keep in mind when implementing 3D convolution would be highly appreciated.
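For context, here is a minimal sketch of what the masking step in the quoted snippet does in 2D. It assumes (as the `(BCHW) *= (B1HW)` comment suggests) that the current mask is a low-resolution `(B, 1, h, w)` binary tensor that is upsampled to the feature-map size with `repeat_interleave`, and that `H` and `W` are integer multiples of `h` and `w`. The names `expand_mask_2d` and `cur_active` are hypothetical and are not SparK's actual API.

```python
import torch
import torch.nn as nn


def expand_mask_2d(cur_active: torch.Tensor, H: int, W: int) -> torch.Tensor:
    """Hypothetical helper: upsample a (B, 1, h, w) binary mask to (B, 1, H, W).

    Each mask cell is repeated along height and width (nearest-neighbour style).
    Assumes H % h == 0 and W % w == 0.
    """
    h_rep, w_rep = H // cur_active.shape[-2], W // cur_active.shape[-1]
    return cur_active.repeat_interleave(h_rep, dim=2).repeat_interleave(w_rep, dim=3)


# Toy example: one sample, a 2x2 mask where only the top-left cell is visible (1 = active).
cur_active = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])   # (B=1, 1, h=2, w=2)
x = torch.randn(1, 3, 8, 8)                               # (B, C, H, W)
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)

y = conv(x)                                               # (1, 4, 8, 8)
y = y * expand_mask_2d(cur_active, H=y.shape[2], W=y.shape[3])  # (BCHW) * (B1HW)
print(y.shape)  # torch.Size([1, 4, 8, 8])
# Every position outside the top-left 4x4 block is now exactly zero.
```

The 3D case only needs one more repeated axis; the rest of the pattern (run the dense conv, then zero out inactive positions) stays the same.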

keyu-tian commented 11 months ago

@chinmay5 If you're using 3D convolution for video, you can write a 3D version of our _get_active_ex_or_ii function, e.g., by calling repeat_interleave three times (once per axis). _cur_active would then be a BCHWT-like binary tensor, which means you should modify https://github.com/keyu-tian/SparK/blob/main/pretrain/spark.py#L81 to create a BCHWT-like binary mask.
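A minimal sketch of that suggestion follows. It assumes PyTorch's `nn.Conv3d` layout `(B, C, T, H, W)` (time before height and width, rather than last as in "BCHWT") and a mask stored as `(B, 1, t, h, w)`, with `T, H, W` integer multiples of `t, h, w`. The helpers `random_mask_3d`, `expand_mask_3d`, `active_indices_3d`, and `SparseConv3d` are hypothetical names for illustration, not part of SparK. `random_mask_3d` only stands in for the 2D mask creation linked above, and its per-sample random-permutation scheme is an assumption.

```python
import torch
import torch.nn as nn


def random_mask_3d(B: int, t: int, h: int, w: int, mask_ratio: float) -> torch.Tensor:
    """Hypothetical: (B, 1, t, h, w) float mask with round(t*h*w*(1-mask_ratio)) ones per sample."""
    n = t * h * w
    n_keep = round(n * (1 - mask_ratio))
    # Random permutation per sample; the first n_keep cells stay visible (1).
    keep_idx = torch.rand(B, n).argsort(dim=1)[:, :n_keep]
    mask = torch.zeros(B, n)
    mask.scatter_(1, keep_idx, 1.0)
    return mask.view(B, 1, t, h, w)


def expand_mask_3d(cur_active: torch.Tensor, T: int, H: int, W: int) -> torch.Tensor:
    """Hypothetical: upsample (B, 1, t, h, w) -> (B, 1, T, H, W), one repeat_interleave per axis."""
    t, h, w = cur_active.shape[-3:]
    return (cur_active.repeat_interleave(T // t, dim=2)
                      .repeat_interleave(H // h, dim=3)
                      .repeat_interleave(W // w, dim=4))


def active_indices_3d(cur_active: torch.Tensor, T: int, H: int, W: int):
    """Hypothetical index form: tuple (b, t, h, w) of active positions, e.g. for
    applying normalisation only to visible cells (assumed analogue of returning_active_ex=False)."""
    return expand_mask_3d(cur_active, T, H, W).squeeze(1).nonzero(as_tuple=True)


class SparseConv3d(nn.Conv3d):
    """Hypothetical Conv3d that zeroes its output wherever the upsampled mask is inactive.

    Mirrors the 2D sp_conv_forward pattern, but takes the mask as an argument
    instead of reading a module-level global.
    """

    def forward(self, x: torch.Tensor, cur_active: torch.Tensor) -> torch.Tensor:
        x = super().forward(x)
        # (B, C, T, H, W) * (B, 1, T, H, W): mask the output of the conv
        return x * expand_mask_3d(cur_active, *x.shape[2:])


# Usage: 8-frame 32x32 clips, mask cells of 4 frames x 8 x 8 pixels, 60% masked.
torch.manual_seed(0)
B, C, T, H, W = 2, 3, 8, 32, 32
cur_active = random_mask_3d(B, t=2, h=4, w=4, mask_ratio=0.6)   # (2, 1, 2, 4, 4)
x = torch.randn(B, C, T, H, W)
x = x * expand_mask_3d(cur_active, T, H, W)                     # zero out masked input cells
conv = SparseConv3d(C, 16, kernel_size=3, padding=1)
y = conv(x, cur_active)
print(y.shape)  # torch.Size([2, 16, 8, 32, 32])
```

Passing the mask explicitly keeps the sketch self-contained; SparK's own code keeps the current mask in a global (_cur_active), which avoids changing every layer's forward signature.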

If it is for 3D point clouds or sparse 3D voxel processing, I would recommend using the sparse convolution from https://github.com/NVIDIA/MinkowskiEngine rather than ours, for efficiency.

chinmay5 commented 11 months ago

Thank you so much for your input.