yrcong / STTran

Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021
MIT License

Data related problem #49

Closed: Fillip1233 closed this issue 1 year ago

Fillip1233 commented 1 year ago

Hello author, thank you very much for your excellent open-source work. After reproducing your results, I have the following questions, and I would greatly appreciate your answers.

1. I found that the batch size cannot be changed in this code, and I traced the issue to the following code. Why is this processing necessary? If this code is removed, can the model work properly with different batch sizes after some adjustment?

[image: screenshot of the code in question]

2. I noticed that you had a discussion with the author of ActionGenome and mentioned that the mAP results on the AG dataset processed with fastRCNN were not good. Have you ever used the AG dataset for multi-object detection? I ran the most advanced multi-object detection model on the AG dataset and obtained very poor results. Do you have any relevant experiments that explain this phenomenon?

yrcong commented 1 year ago

Hi,

  1. We set the batch size to 1 directly, which means only one video is used in each iteration. The reason is that different videos have different numbers of frames, so it is challenging to implement batch operations. One possible approach for a larger batch size is zero padding: you could pad the shorter videos in the batch with zero/black frames and re-assign the target labels. Of course, that is not easy. Another way is to use multiple GPUs, with each GPU loading one video (this is tricky, and the batch size should still be 1 per GPU).
  2. I would say the annotation quality is not very good. I have struggled with this as well. First, the resolution is not high, and small objects such as food/phone are difficult to identify, so we removed very small boxes. Moreover, there are some duplicate categories, e.g. notebook and paper. Sometimes one entity is labeled twice with different labels and boxes.
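
The zero-padding idea in point 1 could look roughly like the sketch below. It is not taken from the STTran code: `pad_video_batch`, `IGNORE_LABEL`, and the list-of-frames layout are hypothetical names and assumptions chosen for illustration, and plain Python lists stand in for the image tensors a real data loader would produce. Each shorter video is padded to the longest one in the batch, padded positions receive a sentinel label, and a mask records which frames are real so that a loss could skip the padding.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical sentinel for padded frames; a loss function would need to skip it.
IGNORE_LABEL = -1


def pad_video_batch(
    videos: Sequence[Sequence[Any]],
    labels: Sequence[Sequence[Any]],
    pad_frame: Any,
) -> Tuple[List[List[Any]], List[List[Any]], List[List[bool]]]:
    """Pad each video and its per-frame labels to the longest video in the batch.

    Returns (padded_frames, padded_labels, mask), where mask[i][t] is True
    for real frames and False for padded ones.
    """
    if not videos:
        raise ValueError("the batch must contain at least one video")
    if len(videos) != len(labels):
        raise ValueError("videos and labels must have the same batch size")

    max_len = max(len(video) for video in videos)
    padded_frames, padded_labels, mask = [], [], []

    for video, video_labels in zip(videos, labels):
        if len(video) != len(video_labels):
            raise ValueError("each video needs exactly one label entry per frame")
        n_pad = max_len - len(video)
        # The same pad_frame object is reused for every padded slot.
        padded_frames.append(list(video) + [pad_frame] * n_pad)
        padded_labels.append(list(video_labels) + [IGNORE_LABEL] * n_pad)
        mask.append([True] * len(video) + [False] * n_pad)

    return padded_frames, padded_labels, mask


if __name__ == "__main__":
    # Two toy "videos" with 3 frames and 1 frame; strings stand in for images.
    frames, targets, valid = pad_video_batch(
        videos=[["f0", "f1", "f2"], ["g0"]],
        labels=[[10, 11, 12], [20]],
        pad_frame="BLACK",
    )
    print(frames)
    print(targets)
    print(valid)
```

With tensors, the same padding could be applied inside a custom collate function. The mask matters because, as noted above, the target labels have to be re-assigned: without it, the padded black frames would be treated as real frames with real annotations.
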
Fillip1233 commented 1 year ago

Thank you for your timely answer. Your idea makes sense.