kenziyuliu / MS-G3D

[CVPR 2020 Oral] PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition"
https://arxiv.org/abs/2003.14111
MIT License

About over-smoothing #53

Closed: 15762260991 closed this issue 1 year ago

15762260991 commented 2 years ago

Thanks for the great work, @kenziyuliu! Deep graph convolutional networks are known to suffer from over-smoothing. Can your proposed MS-GCN effectively overcome this problem? I sincerely look forward to your reply!

15762260991 commented 2 years ago

I would be grateful if you could take some time out of your busy schedule to help! @kenziyuliu

kenziyuliu commented 2 years ago

Hi @15762260991,

Thanks for your interest! I think over-smoothing can occur whenever many GCN layers are stacked sequentially: excessive neighborhood aggregation makes the features of neighboring nodes too similar [1]. This could be an issue for MS-G3D if you stack too many layers together.
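To make the effect concrete, here is a minimal NumPy sketch of over-smoothing on a toy graph. It is not code from the MS-G3D repository: the 5-node path graph, the row normalization, and the helper names (`path_graph`, `row_normalized_adjacency`, `feature_spread`) are all assumptions chosen for illustration, and the learned weights and nonlinearities of a real GCN layer are left out so that only the aggregation step is visible.

```python
import numpy as np

def path_graph(n):
    """Adjacency matrix of an n-node chain (hypothetical toy skeleton)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1
    A[idx + 1, idx] = 1
    return A

def row_normalized_adjacency(A):
    """D^-1 (A + I): each node averages itself and its 1-hop neighbors."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)

def feature_spread(X):
    """Mean over channels of (max - min) across nodes; 0 means all nodes are identical."""
    return float((X.max(axis=0) - X.min(axis=0)).mean())

rng = np.random.default_rng(0)
A_norm = row_normalized_adjacency(path_graph(5))
X = rng.normal(size=(5, 4))          # 5 nodes, 4 feature channels
initial_spread = feature_spread(X)

# Apply the same aggregation repeatedly, as if stacking many
# weight-free graph convolution layers back to back.
spreads = {}
H = X.copy()
for layer in range(1, 65):
    H = A_norm @ H
    if layer in (1, 3, 16, 64):
        spreads[layer] = feature_spread(H)

print("initial:", initial_spread)
for layer, s in spreads.items():
    print(f"after {layer:2d} aggregations:", s)
```

The spread across nodes shrinks with every aggregation step and is close to zero after a few dozen steps, which is the behavior described in [1]. With only a handful of aggregations the node features remain clearly distinguishable.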

However, I think (1) MS-G3D doesn't actually have all that many graph convolutions in the default architecture (3 of them in any forward pathway), and (2) there are interleaving temporal modeling layers (MS-TCN) so that new features from other time steps are incorporated into each node in between the graph convolutions (i.e. we're not aggregating the "same" neighbors at each GCN layer per se). One could also think of having parallel branches of multi-scale graph convolutions in MS-G3D as a way to mitigate the over-smoothing problem (in a similar spirit to [2]).

Hope this helps!

[1] https://arxiv.org/pdf/1801.07606.pdf

[2] https://arxiv.org/abs/1905.00067

15762260991 commented 2 years ago

Thank you very much for your answer; it has helped me a lot! My understanding is that the MS-GCN module in MS-G3D can avoid over-smoothing. Is that correct? @kenziyuliu

15762260991 commented 1 year ago

I have one more question: can MS-GCN aggregate global information? I would appreciate any guidance. Thank you very much! @kenziyuliu

kenziyuliu commented 1 year ago

Hi @15762260991,

Thanks again for your interest -- I think the term "global information" might be a bit ambiguous. Could you clarify what you meant?

15762260991 commented 1 year ago

@kenziyuliu Thank you very much for your patient guidance! My understanding is that each GCN aggregates information from the neighbor nodes at a given hop, and that when K equals the diameter of the graph, global information becomes available because the aggregates from the different hops are combined. Is my understanding correct? I would be grateful for your advice!

kenziyuliu commented 1 year ago

Yes, I think that is correct: you can set num_scales (https://github.com/kenziyuliu/MS-G3D/blob/master/model/ms_gcn.py#L16) to cover the entire graph in the spatial dimension. However, Table 1 of the paper suggests that setting K to be large might not always be the best idea (it also increases computation). Hope this helps!
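As a rough illustration of the coverage argument, the sketch below builds one mask per hop distance on a toy graph and checks which values of K let every node see every other node. It is an assumption-laden sketch, not the repository's implementation: the 5-joint tree, the helpers `hop_distance_matrix`, `k_hop_mask`, and `covers_whole_graph`, and the convention that scales run from hop 0 to hop K are all hypothetical, so how K relates to `num_scales` should be checked against ms_gcn.py.

```python
import numpy as np

def hop_distance_matrix(A):
    """Shortest-path hop distance between all node pairs (inf if unreachable)."""
    n = A.shape[0]
    dist = np.full((n, n), np.inf)
    reach = np.eye(n, dtype=bool)            # pairs reachable within 0 hops
    dist[reach] = 0
    step = ((A + np.eye(n)) > 0).astype(int)  # one hop, or stay in place
    for k in range(1, n):
        new_reach = (reach.astype(int) @ step) > 0
        dist[new_reach & ~reach] = k         # pairs first reached at hop k
        reach = new_reach
    return dist

def k_hop_mask(dist, k):
    """1 where two nodes are exactly k hops apart (k = 0 gives the identity)."""
    return (dist == k).astype(float)

def covers_whole_graph(dist, K):
    """True if the union of hop masks 0..K connects every pair of nodes."""
    union = sum(k_hop_mask(dist, k) for k in range(K + 1))
    return bool((union > 0).all())

# Hypothetical 5-joint tree: 0-1, 1-2, 2-3, 1-4
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1

dist = hop_distance_matrix(A)
diameter = int(dist.max())                   # assumes the graph is connected

for K in range(diameter + 1):
    print(f"K = {K}: covers whole graph = {covers_whole_graph(dist, K)}")
```

On this toy graph, full coverage is reached only once K equals the diameter. Each extra scale adds another aggregation branch, which is in line with the computation cost mentioned above.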

15762260991 commented 1 year ago

@kenziyuliu This answers my question, thank you very much!