lucidrains / magvit2-pytorch

Implementation of MagViT2 Tokenizer in Pytorch
MIT License

About GroupNorm described in the MAGVIT V2 paper #41

Closed by sen-ye 3 months ago

sen-ye commented 5 months ago

Hello, thanks for your nice work. I noticed some differences between your implementation and the original paper. One notable difference is the use of group normalization in the original paper. From my understanding, directly applying group normalization to a 5D video tensor (B, C, T, H, W) can result in non-causal behavior. Your implementation does not include group normalization. Could you please explain the reasoning behind this choice? Is it related to the issue I mentioned?
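
To make the concern concrete, here is a minimal sketch (not taken from the repository or the paper) of why plain group norm on a 5D tensor is non-causal: `torch.nn.GroupNorm` computes its mean and variance over every non-batch dimension within a channel group, which includes the time axis T. The tensor sizes, group count, and the perturbation of the last frame are arbitrary choices made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Arbitrary illustrative sizes: 4 channels split into 2 groups.
norm = nn.GroupNorm(num_groups=2, num_channels=4)
video = torch.randn(1, 4, 5, 8, 8)  # (B, C, T, H, W)

# Change only the last (future) frame.
perturbed = video.clone()
perturbed[:, :, -1] += 10.0

out = norm(video)
out_perturbed = norm(perturbed)

# The first frame's output changes even though its input did not,
# because the group statistics are pooled over T as well as H and W.
first_frame_changed = not torch.allclose(out[:, :, 0], out_perturbed[:, :, 0])
print(first_frame_changed)
```

If the first frame's output depends on a later frame, the layer leaks information from the future into the past, which is what breaks causality here.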

Jason3900 commented 3 months ago

Typically you just pack B and T into one dimension, apply the usual group norm, and then rearrange the result back to the original shape.
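
A minimal sketch of that suggestion, assuming a (B, C, T, H, W) input. The `FramewiseGroupNorm` name is hypothetical and is not a class from magvit2-pytorch; it only wraps the standard `torch.nn.GroupNorm` so that each frame is normalized with its own statistics.

```python
import torch
from torch import nn


class FramewiseGroupNorm(nn.Module):
    """Hypothetical wrapper: GroupNorm applied to each frame independently."""

    def __init__(self, num_groups: int, num_channels: int):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups, num_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = x.shape
        # Pack B and T: (B, C, T, H, W) -> (B, T, C, H, W) -> (B*T, C, H, W)
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        # Statistics are now computed per frame, over (C/groups, H, W) only.
        x = self.norm(x)
        # Unpack back to the original layout: (B, C, T, H, W)
        return x.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)


# Usage with arbitrary illustrative sizes.
layer = FramewiseGroupNorm(num_groups=2, num_channels=4)
video = torch.randn(2, 4, 5, 8, 8)
out = layer(video)
```

Because no statistic is shared across time steps, the output for a given frame depends only on that frame, so later frames cannot influence earlier ones.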

sen-ye commented 3 months ago

I see, thank you