wei-tim / YOWO

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

Regarding the paper and custom dataset generation #54

Open ghazalehtrb opened 4 years ago

ghazalehtrb commented 4 years ago

Thank you for the great paper and for open-sourcing your work. I am trying to train the network on a custom dataset that contains multiple people performing different tasks in each frame. I am only interested in a few activities, but there are people doing other things in the background as well. I was wondering whether I need to annotate all actors, or whether I can annotate only the ones I'm interested in. I felt that not annotating the other actors has a negative effect on the actor detection part. I thought about annotating them all under a single "background" class, but that seems to confuse the network as well, since those actors are performing different activities. I noticed that one of the GIFs with people fencing contains a man sitting on a chair in the background who is not detected; is this the same situation as mine? I'm not sure whether the UCF dataset contains a "sitting" class. Sorry for the long question.