EvenGu closed this issue 10 months ago.
I really appreciate your work, as it addresses a problem that has bothered me for a while: the inconsistency of skeleton formats across different datasets. Would it be possible to freeze some of your model weights (perhaps the autoencoder?) so that it directly takes one set of 3D estimates as input (e.g. from Kinect v2) and outputs a different set of 3D estimates (e.g. smpl_24)? Looking forward to your reply!

This is not really possible in general, since some body parts may not be covered by one of the skeleton formats. That is why we framed the problem not as conversion between formats but as checking consistency between them: we pass their concatenation through the affine-combining autoencoder, so that the bottleneck can learn what is redundant among the different formats. Without image information, direct conversion is a highly underdetermined problem.
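To make the structural idea of the affine-combining autoencoder concrete, here is a minimal numpy sketch. It is not the trained model: the joint counts (`J_IN`, `J_LATENT`) are made-up numbers and the weights are random rather than learned. It only illustrates the constraint that each latent "virtual" joint is an affine combination of input joints (rows summing to 1), which makes the whole encode-decode map equivariant to translating the skeleton.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: total joints after concatenating two skeleton
# formats, and the number of latent "virtual" joints in the bottleneck.
J_IN = 56
J_LATENT = 32

def affine_weights(rows, cols):
    """Random nonnegative weights whose rows sum to 1, so each output
    joint is an affine (here convex) combination of input joints."""
    w = np.abs(rng.normal(size=(rows, cols)))
    return w / w.sum(axis=1, keepdims=True)

W_enc = affine_weights(J_LATENT, J_IN)  # encoder: J_IN -> J_LATENT joints
W_dec = affine_weights(J_IN, J_LATENT)  # decoder: J_LATENT -> J_IN joints

def autoencode(points):
    """points: (J_IN, 3) concatenated keypoints -> reconstructed (J_IN, 3)."""
    latent = W_enc @ points   # (J_LATENT, 3) bottleneck joints
    return W_dec @ latent

# Translation equivariance: shifting the input skeleton shifts the output
# by the same amount, because every weight row sums to 1.
x = rng.normal(size=(J_IN, 3))
t = np.array([1.0, -2.0, 0.5])
assert np.allclose(autoencode(x + t), autoencode(x) + t)
```

In the actual method the weight matrices are learned so that the reconstruction is accurate across datasets; the bottleneck then captures exactly the information shared by the concatenated formats, which is what makes the consistency check possible.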