hukenovs / hagrid

HAnd Gesture Recognition Image Dataset
https://arxiv.org/abs/2206.08219

Implementing new classes #9

Closed kerem0comert closed 1 year ago

kerem0comert commented 2 years ago

Hello,

Thanks for the great and extensive dataset. You mention combining the 18+1 existing gesture classes to create dynamic gestures such as a swipe. What I am wondering is whether there is a feasible way to extend the 18 existing classes by using the trained models.

Assume that I introduce a 19th gesture and add training samples for it. Would it be possible to create a new classifier that takes the output of the last layers of your model as input? I believe I would have to discard the final layer in that case, since it maps to one of the 19 classes and so creates a bottleneck. But I still believe layers N-1 to N-x hold some intrinsic information about the positioning of the hand that my new classifier could take as input.

Looking forward to any ideas for this task.

anotherhelloworld commented 2 years ago

Hello, @kerem0comert

You can use our pre-trained backbone for your task, if you have samples for a new gesture that you want to recognize. Example code for creating a new MobileNetV3_large model using our weights:

import torch  # MobileNetV3 is the model class from this repository

model = MobileNetV3(num_classes=19)  # our model: 18 gestures + no gesture
model.load_state_dict(torch.load("/path/to/MobileNetV3_large.pth")["state_dict"])  # load our pre-trained weights
model_new_gesture = MobileNetV3(num_classes=20)  # your model: 18 gestures + no gesture + 1 new gesture
model_new_gesture.backbone = model.backbone  # reuse our pre-trained backbone with your new classifier

Then you can start the training process. The same code also works for the other classification models in our repository.

https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
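
As a rough sketch of what the training step after the backbone swap could look like, the example below freezes the reused backbone and trains only the new 20-class head, in the spirit of the fixed-feature-extractor setup from the tutorial linked above. `TinyGestureNet` is a hypothetical stand-in for the repository's MobileNetV3 (it is assumed only to expose `backbone` and `classifier` attributes), and the random batch stands in for a real data loader. Freezing is one option, not a requirement; fine-tuning the whole network is equally possible.

```python
import torch
from torch import nn


class TinyGestureNet(nn.Module):
    """Hypothetical stand-in with a `backbone` and a `classifier` attribute."""

    def __init__(self, num_classes: int) -> None:
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.backbone(x))


def build_extended_model(pretrained: nn.Module, num_classes: int) -> nn.Module:
    """Reuse the pre-trained backbone, freeze it, and attach a fresh head."""
    model = TinyGestureNet(num_classes)
    model.backbone = pretrained.backbone
    for param in model.backbone.parameters():
        param.requires_grad = False  # only the new classifier is trained
    return model


torch.manual_seed(0)
pretrained = TinyGestureNet(num_classes=19)  # stands in for the 19-class checkpoint
model = build_extended_model(pretrained, num_classes=20)

# Optimize only the parameters that still require gradients (the new head).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
criterion = nn.CrossEntropyLoss()

# One training step on a random batch; replace with a real DataLoader.
images = torch.randn(4, 3, 32, 32)
labels = torch.tensor([0, 5, 19, 19])  # index 19 is the new gesture
model.backbone.eval()  # keep the frozen backbone in inference mode
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Because the head is new, the 20-class model still needs samples of the original 19 classes during training, not just the new gesture.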

You can also train your model on our data so that it produces latent-space embeddings for different gesture images, then use cosine similarity to determine the gesture class. We have not tested this approach. It would likely be less accurate than a classic classification model, but it could significantly expand the set of recognizable gestures with only a small set of training data.
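
The embedding idea could be sketched roughly as follows. This is an untested illustration, not code from the repository: it assumes each image has already been turned into an embedding vector by the backbone, averages a few labelled embeddings per gesture into a prototype, and assigns a new embedding to the prototype with the highest cosine similarity. The function names and the toy 3-dimensional vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_prototypes(
    support: Dict[str, List[Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average the labelled embeddings of each gesture into one prototype."""
    prototypes = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        prototypes[label] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return prototypes


def classify(embedding: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is most similar to the embedding."""
    return max(
        prototypes, key=lambda label: cosine_similarity(embedding, prototypes[label])
    )


# Toy embeddings; in practice these would come from the trained backbone.
support = {
    "like": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "new_gesture": [[0.1, 0.1, 0.9], [0.0, 0.2, 0.8]],
}
prototypes = class_prototypes(support)
prediction = classify([0.1, 0.0, 1.0], prototypes)
```

Adding another gesture then only requires a few labelled embeddings for it in `support`, with no retraining of the backbone.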

nagadit commented 1 year ago

I am closing this issue, but feel free to reopen it if you feel like it has not been resolved.