wolterlw / hand_tracking

Minimal Python interface for Google's Mediapipe HandTracking pipeline
Apache License 2.0
148 stars 37 forks

Removing rotation estimation decreases accuracy #18

Closed AmitMY closed 4 years ago

AmitMY commented 4 years ago

I used the code in the current master branch to predict the bbox and joints for an image: [image]

Works as expected.

Then I used the hand_tracker2 code from the other branch, which supports multi-hand detection and joint prediction, and got this: [image]

Notice that the second hand from the right (the one detected in both tests) seems to get a better joint prediction when the rotation is predicted.

I'm not sure if something else is going on here, or if this is just a bad sample (a sample size of 1), but it should be taken into consideration.

Code:

import cv2
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

from hand_tracker import HandTracker

palm_model_path = "./models/palm_detection.tflite"
landmark_model_path = "./models/hand_landmark.tflite"
anchors_path = "data/anchors.csv"

# OpenCV loads BGR; reverse the channel axis to get RGB
img = cv2.imread('data/hands.jpg')[:, :, ::-1]

# box_shift shifts the detected palm box; box_enlarge scales it up
detector = HandTracker(palm_model_path, landmark_model_path, anchors_path,
                       box_shift=0.2, box_enlarge=1.3)
hands = detector(img)

f, ax = plt.subplots(1, 1, figsize=(10, 10))
ax.imshow(img)
for hand in hands:
    ax.scatter(hand["joints"][:, 0], hand["joints"][:, 1])
    ax.add_patch(Polygon(hand["bbox"], color="#00ff00", fill=False))

f.savefig("data/hands_out.png")
wolterlw commented 4 years ago

What do you mean by "when the rotation is predicted"? It's predicted in both cases, which is evident from the bounding boxes. I've changed the anchor selection mechanism a bit to deal with exp overflow, but that's about it.
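For context on the exp-overflow issue mentioned above: raw detector logits are typically squashed with a sigmoid before anchor selection, and a naive `1 / (1 + exp(-x))` overflows for large negative logits. A minimal sketch of the standard numerically stable variant (illustrative only; the actual fix in the repo may differ):

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that avoids exp overflow for large-magnitude inputs
    by only ever exponentiating non-positive values."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))      # exp(-x) <= 1 here
    exp_x = np.exp(x[~pos])                        # exp(x) <= 1 here
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

# Extreme logits produce saturated scores with no overflow warning
scores = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
```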

AmitMY commented 4 years ago

Ah, I expected the title:

HAND TRACKER WITHOUT ROTATION ESTIMATION AND JOINT DETECTION

to mean there was no rotation, but you are right: there obviously is rotation.

What do you think is the reason the rotation is predicted differently, then? [image] [image]
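For readers following along: in this pipeline the rotation is derived from the palm detector's keypoints, so a different detection picked for the same hand yields a different angle. A rough sketch of the usual computation (function name and sign convention are illustrative, not the repo's exact code): the angle of the wrist-to-middle-finger-base vector, measured against the upward vertical.

```python
import numpy as np

def rotation_from_palm(wrist, middle_base):
    """Angle (radians) of the wrist -> middle-finger-base vector,
    measured from the upward vertical. Image y grows downward,
    hence the negated dy."""
    dx = middle_base[0] - wrist[0]
    dy = middle_base[1] - wrist[1]
    return np.arctan2(dx, -dy)

# A hand pointing straight up in image coordinates gives ~0 rotation
theta = rotation_from_palm((0.0, 1.0), (0.0, 0.0))
```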

wolterlw commented 4 years ago

So if you look here, you'll notice that I pick the detection with the largest bounding box.
In the case of multiple hands, I pick the detection with the largest score.
I guess the first option worked better for that one hand, but one would have to test the model on data with compatible ground truth to know which approach works best in general.
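The two selection heuristics described above can be sketched in a few lines. The arrays below are made-up illustrative data (the real pipeline works on normalized anchor boxes), just to show that the two criteria can pick different detections:

```python
import numpy as np

# Hypothetical detections: confidence scores and bbox side lengths
scores = np.array([0.81, 0.93, 0.88])
box_sides = np.array([120.0, 90.0, 140.0])

best_by_score = int(np.argmax(scores))        # heuristic: highest confidence
best_by_area = int(np.argmax(box_sides ** 2)) # heuristic: largest box

# Here the heuristics disagree: index 1 wins on score, index 2 on area
```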

wolterlw commented 4 years ago

Check out the new multihand branch. It's a bit cleaner than optical_flow. Accuracy could be better in some cases, but I believe that's due to the simplistic aggregation.

wolterlw commented 4 years ago

I've opened a clean new issue for this problem.