snavas / GECCO

Gesture-Enabled Remote Communication and Collaboration
MIT License

Use a Neural Network for hand recognition & tracking #2

Open snavas opened 4 years ago

snavas commented 4 years ago

Maybe try a lightweight neural network (like YOLO?) with the color image, depth image & fingertip features. Use the current approach to generate training data.

PaulaScharf commented 3 years ago

One easy out-of-the-box solution for feature detection of hands is this. However, it doesn't detect hands in gloves.

An implementation of this can be found in the branch issue/neuralnet

Important side note: MediaPipe can detect hands with a wide variety of skin tones (tattoos are problematic, though). [image: annotated_image8]
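For reference, the out-of-the-box MediaPipe Hands usage could be sketched roughly like this. This is a sketch, not the code in the issue/neuralnet branch; the helper name `landmark_pixels` and the confidence value are my own choices, and the MediaPipe import is kept local so the coordinate helper stays dependency-free:

```python
import numpy as np

def landmark_pixels(landmarks, width, height):
    # MediaPipe returns landmarks normalised to [0, 1]; convert to pixel coords.
    return np.array([(int(lm.x * width), int(lm.y * height)) for lm in landmarks])

def detect_hand_landmarks(bgr_image):
    # Heavy imports kept local so landmark_pixels stays dependency-free.
    import cv2
    import mediapipe as mp
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=2,
                                  min_detection_confidence=0.7) as hands:
        result = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return []
    h, w = bgr_image.shape[:2]
    return [landmark_pixels(hand.landmark, w, h)
            for hand in result.multi_hand_landmarks]
```

The per-frame pixel coordinates returned here are what later comments in this thread build on (colour sampling, watershed seeds).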

PaulaScharf commented 3 years ago

I managed to train the DeepLabV3 model from PyTorch to do semantic hand segmentation using a tutorial. But the result is much slower than hoped for :/

Here is a gif of the detection at full image resolution: [gif: test]

Here the resolution was reduced by 30%:

[gif: test2]

Edit: This was done on the GPU, not the CPU, so that's not the cause of the performance issue.

Edit: I also tried cropping the image to the size of one hand, which increases performance a little (0.8 s per frame instead of 1.2), but it does not scale well if several hands are detected (n * 0.8 s per frame).

PaulaScharf commented 3 years ago

I am currently trying this tutorial, which has a well-documented repository on GitHub. It provides several different models as backbones, including very lightweight ones like MobileNet, so I am hoping for a good inference time.

PaulaScharf commented 3 years ago

Here is the result using FCN32 and MobileNet from the previously mentioned repository.

[gif: test3]

It is much faster but also very inaccurate. The inaccuracy might be due to the fact that I only used one third of the training data this time. I will try to train with more data and maybe switch out the models.

Update: Here is FCN8 and MobileNet with more training data (25 epochs):

[gif: result]

And here SegNet and MobileNet (5 epochs):

[gif: test5]

Update 2: I don't think any major improvement in accuracy while maintaining the speed can be expected now, at least not within the available time and with my knowledge of the topic. So in conclusion, I think semantic segmentation is not a viable option for this project.

PaulaScharf commented 3 years ago

Currently, depth and optical flow are not used for the segmentation. However, the depth values from the camera are very inaccurate, so there is not much that can be done with them. Using optical flow to segment hands, and maybe even recognise gestures, should be investigated.

PaulaScharf commented 3 years ago

Imo hand feature detection (e.g. with MediaPipe) is much more feasible in this project. It has the downside that the detection then has to be animated.

PaulaScharf commented 3 years ago

OpenCV provides a built-in class for background removal: link

We should try it out.

Edit: Here is a quick implementation: [gif: background] So this obviously doesn't work on its own, but I still think background removal should be investigated. If I try it out in PowerPoint, it looks like this: [screenshot: 2021-04-20 225727]

Maybe this can be done with GrabCut.

PaulaScharf commented 3 years ago

Here is an attempt at GrabCut: [gif: grabcut]

This is suboptimal in several ways: the speed, the occasional confusion between foreground and background, and the general inaccuracy.

PaulaScharf commented 3 years ago

I tried to use the previously mentioned MediaPipe in combination with the watershed algorithm to get a simple visualization of the hands. The results are not good, but maybe a start. [gif: mediapipe]

PaulaScharf commented 3 years ago

Attempt number 2 at MediaPipe. I am using it for color calibration now. I think it has potential. [gif: mediapipe_color]

Currently I set the detection confidence very high (only few, but accurate detections), use all the detected hand feature points from MediaPipe to get the hand color (I average the color at every hand feature point) and then segment the entire image for this hand color. An alternative approach would be to lower the detection confidence (-> many, but at times inaccurate detections) and only segment the detected hand areas with the hand color for that area.

Edit: This is how the alternative approach looks: [gif: mediapipe_color2] I actually think it looks really nice so far :) Currently the hand color range is between mean - (2*std) and mean + (2*std), but there is probably a better way to remove outliers from the detections than using the standard deviation. I will have to look into that.

Edit: Here it is after a bit of calibration: [gif: mediapipe_color4]
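The calibration described above (average the colour at the landmark pixels, then keep every pixel between mean - 2*std and mean + 2*std) can be sketched in plain NumPy. The function names and the synthetic demo are mine; the real code would sample MediaPipe's 21 landmarks per hand and could use cv2.inRange instead of the broadcasted comparison:

```python
import numpy as np

def hand_color_range(image, landmark_px, k=2.0):
    """Per-channel mean +/- k*std of the colours at the hand landmark pixels."""
    samples = np.array([image[y, x] for x, y in landmark_px], np.float64)
    mean, std = samples.mean(axis=0), samples.std(axis=0)
    return mean - k * std, mean + k * std

def segment_by_color(image, lower, upper):
    """Binary mask of pixels inside the calibrated colour range (all channels)."""
    inside = (image >= lower) & (image <= upper)
    return inside.all(axis=2).astype(np.uint8) * 255

# Synthetic demo: noisy skin-coloured square, landmark grid inside it.
rng = np.random.default_rng(0)
img = np.zeros((100, 100, 3), np.float64)
img[20:80, 20:80] = (70, 130, 190)
img += rng.normal(0, 3, img.shape)
landmarks = [(x, y) for x in range(25, 76, 10) for y in range(30, 71, 10)]
lower, upper = hand_color_range(img, landmarks)
mask = segment_by_color(img, lower, upper)
```

As noted above, mean +/- 2*std is sensitive to outlier landmarks (e.g. a point that lands on the background); a median with percentile bounds would be a more robust variant of the same idea.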

PaulaScharf commented 3 years ago

Good news: MediaPipe also works well enough for darker skin tones :) [image: dark7]