Closed by tracer9 4 years ago
Hi @tracer9, thanks a lot for your interest!
Yes, you are correct. I would say this is more of an empirically (rather than theoretically) driven decision: experiments seem to suggest that adding residuals to the graph layers degrades performance a little bit.
I don't have a perfect explanation for this, but it could be related to the importance of spatial aggregation over temporal aggregation. In our ablations, we found that 3 blocks of [1 MS-GCN layer + 3 MS-TCN layers] (Table 2, row 3, see supplementary) can outperform the usual setup of, say, 10 blocks of [1 GCN + 1 TCN] (e.g. as used by 2s-AGCN). To some extent, this suggests that GCN layers need not be deep and that multi-scale aggregation could be important (also suggested by, e.g., [1][2][3]). Since the final model only has 3 MS-GCN / MS-G3D layers without residuals (in parallel), gradient flow might not be a huge problem. Also, since most of the model weights are in the GCN/G3D layers (>2.5M in the 3.2M model, see [4]), removing identity skips could force the model to learn more useful aggregation layers.
[1] https://arxiv.org/pdf/1902.07153.pdf
[2] https://arxiv.org/pdf/1904.12659.pdf
[3] https://arxiv.org/pdf/1905.00067.pdf
[4] Quick code snippet to check # params for GCN/G3D layers (add at the end of msg3d.py):

    for n, p in model.named_modules():
        if ('sgcn' in n or 'gcn3d' in n) and n.count('.') < 1:
            print(n, sum(pa.numel() for pa in p.parameters() if pa.requires_grad))
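For readers wondering what the `n.count('.') < 1` condition does: as far as I can tell, it keeps only top-level modules, so nested submodules (whose parameters `p.parameters()` already includes recursively) are not counted twice. Here is a minimal sketch of just the name filter in plain Python. The module names in the list are hypothetical examples of what `named_modules()` might yield, not a dump from the actual model.

```python
def is_top_level_graph_module(name):
    """Same condition as the snippet above: a GCN/G3D module with no dot in its name."""
    return ('sgcn' in name or 'gcn3d' in name) and name.count('.') < 1

# Hypothetical names; '' stands for the root module, dotted names are nested submodules.
names = ['', 'gcn3d1', 'gcn3d1.gcn3d', 'sgcn1', 'sgcn1.0', 'tcn1', 'fc']

selected = [n for n in names if is_top_level_graph_module(n)]
print(selected)
```

Only the undotted `sgcn`/`gcn3d` entries pass the filter, so each graph module's parameters are summed once.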
Hope this helps!
Hi Ziyu! I recently read your paper on skeleton-based action recognition. It is really solid work! However, when I dug deeper into the model, I found it weird that there are no residual connections in either MS-G3D or MS-GCN.
I notice that there IS a residual path in MS-TCN, implemented by a 1x1 conv. However, after a careful check, I found no residual paths in the other modules, which means: 1. the low-level skeleton data has to pass through three heavy STGC blocks to reach the final result; 2. the gradient may not flow back via a residual link.
Also in vanilla ST-GCN, a residual link exists in every GCN-TCN block.
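To make the gradient concern concrete, here is a toy sketch in plain Python. It is an illustration under strong assumptions, not a model of MS-G3D: every layer is treated as a scalar function with the same local derivative (the value 0.5 is arbitrary), and the function names are made up for this example. By the chain rule, a plain stack multiplies the local derivatives, while a stack of residual layers `y = x + f(x)` multiplies `1 + f'` terms.

```python
import math

def plain_stack_grad(local_grads):
    """d(out)/d(in) for y = f_n(...f_1(x)): product of the local derivatives."""
    return math.prod(local_grads)

def residual_stack_grad(local_grads):
    """d(out)/d(in) when every layer is y = x + f(x): product of (1 + f')."""
    return math.prod(1.0 + g for g in local_grads)

for depth in (3, 10):
    grads = [0.5] * depth  # assumed local derivative per layer, purely illustrative
    print(depth, plain_stack_grad(grads), residual_stack_grad(grads))
```

In this toy setting the plain product shrinks geometrically with depth, so the attenuation at depth 3 is far milder than at depth 10. That is the usual argument for residual links in deep stacks.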
However, the experimental results ARE not only stable but also satisfying. Could you share your thoughts on this model design? Thanks a lot :)