Question about training with a hand dataset?

I'm wondering whether you got the results in the paper by training METRO on the FreiHAND dataset alone. I tried to reimplement it, and it fits FreiHAND well but lacks generalization ability. The training loss decreases steadily (which I think means the model fits the training set well), but MPJPE, MPVPE and PA-MPJPE just won't improve: they stay almost the same from the first epoch onward. So am I stuck with an overfitting issue? Was extra training data used that isn't mentioned in the paper? I'd also be more than glad to hear from anyone who has successfully reimplemented this paper.
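For reference, a minimal NumPy sketch of how MPJPE and PA-MPJPE are commonly defined. This is not the MeshTransformer repository's evaluation code; the function names are hypothetical, and the Procrustes step assumes the standard similarity alignment (scale, rotation, translation) via SVD. MPVPE is the same `mpjpe` computation applied to mesh vertices instead of joints. Computing these numbers on a subset of the training data as well as on the evaluation split is one way to tell overfitting apart from an evaluation problem.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (N, J, 3) in the same units (e.g. millimetres).
    Returns the Euclidean distance per point, averaged over all N * J points.
    For MPVPE, pass mesh vertices instead of joints.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align one (J, 3) prediction to gt with the least-squares similarity
    transform (scale, rotation, translation)."""
    mu_p = pred.mean(axis=0)
    mu_g = gt.mean(axis=0)
    p = pred - mu_p
    g = gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    flip = np.diag([1.0, 1.0, d])
    rot = u @ flip @ vt  # applied to row vectors as p @ rot
    scale = (s * np.diag(flip)).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes-aligning each sample to its ground truth."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)
```

Because PA-MPJPE removes global scale, rotation and translation, a prediction that differs from the ground truth only by such a transform scores near zero on PA-MPJPE while its plain MPJPE stays large.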

microsoft / MeshTransformer #77