wei-tim / YOWO

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

Number of ground truth bounding boxes #66

Open lix4 opened 3 years ago

lix4 commented 3 years ago

Based on your code, do you expect at most 50 ground truth bounding boxes for each video clip? And that is consistent with the 7x7 grid. So, is each grid cell responsible for predicting one box?

okankop commented 3 years ago

Yes, you are completely right!
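The sketch below illustrates the two ideas in the question. First, it pads each clip's ground truth into a fixed number of slots (50). Second, it finds the cell of a 7x7 grid that contains each box center. It assumes YOLOv2-style labels of (class, cx, cy, w, h), normalized to [0, 1], with all-zero rows used as padding. `pad_labels` and `assign_cells` are hypothetical helpers written for illustration. They are not functions from the YOWO repository.

```python
import numpy as np

MAX_BOXES = 50  # assumed fixed number of ground-truth slots per clip
GRID_SIZE = 7   # assumed 7x7 output grid

def pad_labels(boxes, max_boxes=MAX_BOXES):
    """Hypothetical helper: pack up to max_boxes rows of (cls, cx, cy, w, h)
    into a fixed (max_boxes, 5) array. Unused rows stay zero; extra boxes are dropped."""
    labels = np.zeros((max_boxes, 5), dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 5)[:max_boxes]
    labels[: len(boxes)] = boxes
    return labels

def assign_cells(labels, grid_size=GRID_SIZE):
    """Hypothetical helper: return the (row, col) grid cell holding each box center."""
    cells = []
    for cls, cx, cy, w, h in labels:
        if w == 0 and h == 0:  # first all-zero row marks the start of padding
            break
        col = min(int(cx * grid_size), grid_size - 1)  # clamp cx == 1.0 into the last column
        row = min(int(cy * grid_size), grid_size - 1)
        cells.append((row, col))
    return cells

labels = pad_labels([[0, 0.5, 0.5, 0.2, 0.4], [3, 0.1, 0.9, 0.1, 0.1]])
print(labels.shape)          # (50, 5)
print(assign_cells(labels))  # [(3, 3), (6, 0)]
```

This sketch maps each box to exactly one cell by its center. It does not model anchor boxes or the loss computation.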