Thanks for your great work! I noticed that both training and the demo's inference require a bbox marking the position of the human hand. How is this bbox obtained in practical applications? When the hand is occluded by the object, it is very difficult for conventional methods to predict the complete area of the hand. Thanks!