kenziyuliu / MS-G3D

[CVPR 2020 Oral] PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition"
https://arxiv.org/abs/2003.14111
MIT License
424 stars, 96 forks

What is the detailed operation of "collapse window reshape and fc"? #24

Closed. nianniana closed this issue 3 years ago

nianniana commented 3 years ago

I am interested in your great work. However, in the network architecture figure in the paper, I am confused about the detailed operation of "collapse window reshape and fc". Would you please explain the procedure in detail? Thanks a lot!

kenziyuliu commented 3 years ago

Hi there,

Thanks for your interest! Have a look at the following code pointers:

  1. For each MS-G3D block, we first expand the joint features by "unfolding" them with a sliding temporal window, (N,C,T,V) --> (N,C,T,window_size*V): https://github.com/kenziyuliu/MS-G3D/blob/master/model/msg3d.py#L43
  2. We perform multi-scale graph convolutions on the resulting spatial-temporal graphs (effectively larger graphs, with window_size*V nodes each):
  3. The expanded features from unfolding the temporal windows are projected back to each frame with the FC layer (implemented with a 1x1 conv), (N,C,T,window_size*V) --> (N,C,T,V):

(N=batch size, T=number of frames / number of windows, C=number of feature channels, V=number of nodes) This would be easier to understand if you try tracing the dimensions of the tensors. Hope this helps.
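
To make the dimension tracing concrete, the three steps above can be sketched in a few lines. This is a minimal illustration, not the repository's code: it uses NumPy rather than PyTorch, and it assumes stride 1, no dilation, an odd window size, and zero padding in time. The function names, the identity adjacency, and the random weights are hypothetical placeholders. In particular, the shape chosen for the "FC" weight (mixing channels and window positions while keeping joints separate) is one plausible reading of the description above.

```python
import numpy as np

def unfold_temporal_windows(x, window_size):
    """Step 1: (N, C, T, V) -> (N, C, T, window_size * V).

    Assumes stride 1, no dilation and an odd window_size (illustrative only).
    """
    n, c, t, v = x.shape
    pad = (window_size - 1) // 2
    # Zero-pad along time so every frame gets a full window centred on it.
    x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (0, 0)))
    # Stack the shifted copies: (N, C, T, window_size, V).
    windows = np.stack(
        [x_padded[:, :, i:i + t, :] for i in range(window_size)], axis=3
    )
    # Merge window and joint axes into one "larger graph" node axis.
    return windows.reshape(n, c, t, window_size * v)

def graph_conv(x_unf, adjacency, weight):
    """Step 2 (placeholder): aggregate over window_size*V nodes, then mix channels."""
    aggregated = np.einsum("nctu,uw->nctw", x_unf, adjacency)
    return np.einsum("oc,nctw->notw", weight, aggregated)

def collapse_window(x_unf, window_size, weight):
    """Step 3: (N, C, T, window_size * V) -> (N, C_out, T, V).

    weight has shape (C_out, C, window_size): a linear map over channels and
    window positions applied per joint (an assumption about the FC layer).
    """
    n, c, t, wv = x_unf.shape
    v = wv // window_size
    x5 = x_unf.reshape(n, c, t, window_size, v)
    return np.einsum("ocw,nctwv->notv", weight, x5)

rng = np.random.default_rng(0)
N, C, T, V, window_size, C_out = 2, 3, 10, 25, 3, 4

x = rng.standard_normal((N, C, T, V))
x_unf = unfold_temporal_windows(x, window_size)
adjacency = np.eye(window_size * V)  # placeholder for the real graph
h = graph_conv(x_unf, adjacency, rng.standard_normal((C_out, C)))
out = collapse_window(h, window_size, rng.standard_normal((C_out, C_out, window_size)))

print(x.shape, x_unf.shape, h.shape, out.shape)
# (2, 3, 10, 25) (2, 3, 10, 75) (2, 4, 10, 75) (2, 4, 10, 25)
```

With these sizes, each frame's 25 joints grow into 75 nodes after unfolding. The collapse step then returns them to 25 joints per frame, so the block's output has the same (T, V) layout as its input.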