yohanshin / WHAM

MIT License
719 stars 78 forks

RICH bounding box determination #107

Open isarandi opened 4 months ago

isarandi commented 4 months ago

Thanks again for this useful repo. I'm trying to understand how evaluation is done on RICH. The current code seems to be incomplete here: https://github.com/yohanshin/WHAM/blob/2b54f7797391c94876848b905ed875b154c4a295/lib/data_utils/rich_eval_utils.py#L61

I'm specifically wondering how the target person is identified. Sometimes other people are visible in the frame (usually further in the background), so if the model predicts all of them, one must be picked for evaluation. Is the bounding box chosen from the detection results based on similarity to the GT-annotated person's bbox? Or is the GT bbox itself used as the input for prediction?
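
To make the first option concrete, here is a minimal sketch of what I mean by picking the detection based on similarity to the GT bbox. It is only an illustration, not WHAM's actual evaluation code: the function names `bbox_iou` and `select_target_detection`, the `(x1, y1, x2, y2)` box format, and the `min_iou` threshold of 0.3 are all my own assumptions.

```python
import numpy as np


def bbox_iou(box, boxes):
    """IoU between one box and N boxes, all given as (x1, y1, x2, y2)."""
    box = np.asarray(box, dtype=float)
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)

    # Corners of the intersection rectangle for each candidate box.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])

    # Clip to zero so non-overlapping boxes get an empty intersection.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area_box + area_boxes - inter

    # Guard against division by zero for degenerate boxes.
    return inter / np.maximum(union, 1e-9)


def select_target_detection(gt_bbox, det_bboxes, min_iou=0.3):
    """Return the index of the detection that best overlaps the GT box.

    Returns None if there are no detections or if the best overlap is
    below min_iou (a hypothetical threshold, not taken from WHAM).
    """
    if len(det_bboxes) == 0:
        return None
    ious = bbox_iou(gt_bbox, det_bboxes)
    best = int(np.argmax(ious))
    return best if ious[best] >= min_iou else None


# Hypothetical frame: one background person and the annotated subject.
gt = [100, 100, 200, 300]
dets = [[400, 50, 450, 150], [105, 110, 205, 290]]
target_idx = select_target_detection(gt, dets)
```

With this kind of matching, frames where no detection overlaps the GT box would have to be handled separately (skipped or counted as failures), which is part of why I'm asking which protocol was used.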