OctreeAttention Behavior with Patch Partitioning During Training

octree-nn / octformer

OctFormer: Octree-based Transformers for 3D Point Clouds

MIT License

259 stars 18 forks source link

OctreeAttention Behavior with Patch Partitioning During Training #5

Closed qixuema closed 1 year ago

qixuema commented 1 year ago

Hi Professor Wang,

When the model is training, assuming batch_size = 32, would it be possible for OctreeAttention to partition a patch at a specified depth using nodes from two adjacent batches? If this situation occurs, how should the self-attention be interpreted in this case?

I would greatly appreciate your insights on this matter. I look forward to hearing from you at your earliest convenience.

wang-ps commented 1 year ago

In current implementation, it is not possible for OctreeAttention to partition a patch at a specified depth using nodes from two adjacent batches.

qixuema commented 1 year ago

I apologize for my unclear English expression. Please forgive me. I would like to inquire about the behavior of the nodes at the same depth for two adjacent training samples within the same batch during the training process. Would they exhibit the connected pattern depicted in the diagram below?

convAsset 1@4x

Or is it the case that during training, each training sample undergoes its own independent padding (blue)?

convAsset 2@4x

wang-ps commented 1 year ago

Thanks for your clarification. The Attention runs according to the following figure.

But I build a mask to erase the overlapped attention values to ensure the consecutive samples in the batch processed seperately. https://github.com/octree-nn/octformer/blob/d727ba08fb801c7340784440ef5e49ec4c53218a/models/octformer.py#L55

qixuema commented 1 year ago

I understand now, creating a mask is indeed a logical and practical solution. I am sincerely grateful for your clear explanation. Thank you immensely!