Is there code on just using the models for inference? I want to feed the model images (each image may have multiple humans), and I would expect the outputs to be the body part estimations (in JSON or whatever format).
As I am reading the issues, especially the one on making inference with a web camera, I "think" that using these models is not trivial. For example, it seems to be I have to detect "people" (use an object detection model, for example), and then supply the center of the bounding box to the model to predict the pose components. Is this correct? This approach would seem like a multi-step process.
Is there code on just using the models for inference? I want to feed the model images (each image may have multiple humans), and I would expect the outputs to be the body part estimations (in JSON or whatever format).
As I am reading the issues, especially the one on making inference with a web camera, I "think" that using these models is not trivial. For example, it seems to be I have to detect "people" (use an object detection model, for example), and then supply the center of the bounding box to the model to predict the pose components. Is this correct? This approach would seem like a multi-step process.