amyeroberts opened this issue 1 year ago
Glad you've got something different to work on 🚀 👀 🎉
Hi @amyeroberts, I don't know if you are working on this, but if not, I would be more than happy to take it up.
Oh, this is the issue page, not the PR page!
@shauray8 You're very welcome to take this up! :)
This model presents a new task for the library, so there might be some iterations and discussions on what the inputs and outputs should look like. The model translation should be fairly straightforward though, so I'd suggest starting with a PR that implements that and then on the PR we can figure out what works best.
Model description
ViTPose is used for 2D human pose estimation, a subset of the keypoint detection task (#24044).
It provides a simple baseline for vision transformer-based human pose estimation. It utilises a pretrained vision transformer backbone to extract features and a simple decoder head to process the extracted features. Despite having no elaborate design components, ViTPose achieved state-of-the-art (SOTA) performance of 80.9 AP on the MS COCO Keypoint test-dev set.
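To make the backbone-plus-decoder pipeline a bit more concrete, here is a minimal, dependency-free sketch of the shape bookkeeping and output decoding. It assumes the top-down, heatmap-based setting described in the ViTPose paper (a 256x192 person crop, 16x16 patches, and a decoder that upsamples the backbone feature map 4x into one heatmap per keypoint). All names here (`feature_map_size`, `heatmap_size`, `decode_keypoints`) are hypothetical illustrations, not the original repository's code and not a `transformers` API. The decoding step is a plain argmax without the sub-pixel refinement real implementations usually add.

```python
from typing import List, Tuple

# Assumed ViTPose-style defaults (illustrative, not read from the released configs):
# 256x192 input crop, 16x16 patches, decoder upsampling the feature map 4x.
INPUT_H, INPUT_W = 256, 192
PATCH = 16
DECODER_UPSAMPLE = 4

Heatmap = List[List[float]]


def feature_map_size(h: int = INPUT_H, w: int = INPUT_W, patch: int = PATCH) -> Tuple[int, int]:
    """Grid size obtained by reshaping the ViT patch tokens back into a 2D feature map."""
    return h // patch, w // patch


def heatmap_size() -> Tuple[int, int]:
    """Heatmap resolution after the (assumed) 4x upsampling decoder head."""
    fh, fw = feature_map_size()
    return fh * DECODER_UPSAMPLE, fw * DECODER_UPSAMPLE


def decode_keypoints(
    heatmaps: List[Heatmap], input_hw: Tuple[int, int] = (INPUT_H, INPUT_W)
) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: argmax of each per-keypoint heatmap, rescaled to
    input-crop pixel coordinates. Returns one (x, y, score) tuple per keypoint."""
    in_h, in_w = input_hw
    results = []
    for hm in heatmaps:
        hm_h, hm_w = len(hm), len(hm[0])
        # Find the highest-scoring cell; tuples compare by score first.
        best_score, best_y, best_x = max(
            (value, y, x) for y, row in enumerate(hm) for x, value in enumerate(row)
        )
        # Map the heatmap cell back to the input resolution (no sub-pixel refinement).
        x = best_x * in_w / hm_w
        y = best_y * in_h / hm_h
        results.append((x, y, best_score))
    return results


if __name__ == "__main__":
    fh, fw = feature_map_size()
    print("feature map:", (fh, fw), "tokens:", fh * fw)
    hm_h, hm_w = heatmap_size()
    print("heatmap:", (hm_h, hm_w))
    # One synthetic heatmap with a single peak at row 10, column 20.
    hm = [[0.0] * hm_w for _ in range(hm_h)]
    hm[10][20] = 0.9
    print(decode_keypoints([hm]))
```

With these assumed sizes, the backbone yields a 16x12 grid (192 tokens), the decoder produces 64x48 heatmaps, and the synthetic peak maps back to `(80.0, 40.0, 0.9)` in crop coordinates. Keeping the decoding separate from the model is one option for the input/output discussion above, since the same post-processing could be shared across keypoint models.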
Open source status
Provide useful links for the implementation
Code and weights: https://github.com/ViTAE-Transformer/ViTPose
Paper: https://arxiv.org/abs/2204.12484
@Annbless