SAGNIKMJR opened 1 year ago
Hi, I think both are good enough if there is no difference in performance, but it is hard to evaluate because the labels are not so accurate. Feel free to raise a pull request if you would like to help fix it. Thanks a lot.
Also, some clip frame numbers that are present in the bbox annotations (stored in `data/ego4d/bbox`) are missing from `data/video_imgs`. For example, frame number `8991` appears at index 291 of the list stored in `data/ego4d/bbox/34bc808a-4b98-42c5-a1ea-6f565dd4aa20:track_0:29.json`, but there is no corresponding image frame, i.e., `data/video_imgs/34bc808a-4b98-42c5-a1ea-6f565dd4aa20/img_08991.jpg` doesn't exist. I looked at https://github.com/zcxu-eric/Ego4d_TalkNet_ASD/blob/main/scripts/extract_frame.sh and it looks like it dumps all clip frames into `data/video_imgs`. Any idea why this is the case? Thanks.
Hi, we didn't encounter this bug on our side. One quick solution is to sanity-check the data before training and discard these labels if necessary.
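The sanity check suggested above could look something like the sketch below. The JSON layout (a list carrying clip frame numbers) and the `img_%05d.jpg` naming are assumptions inferred from the paths quoted in this thread, not verified against the repo; adjust both to match your data.

```python
import json
import os


def drop_missing_frames(bbox_json_path, frames_dir):
    """Split annotated frame numbers into those whose extracted image
    exists on disk and those that are missing.

    Assumptions (hypothetical, based on this thread): the JSON stores a
    list of clip frame numbers, and frames are named img_%05d.jpg.
    """
    with open(bbox_json_path) as f:
        frame_numbers = json.load(f)
    kept, missing = [], []
    for n in frame_numbers:
        path = os.path.join(frames_dir, "img_%05d.jpg" % n)
        # keep only labels whose image frame was actually extracted
        (kept if os.path.exists(path) else missing).append(n)
    return kept, missing
```

Running this over every bbox JSON before training, and logging `missing`, would also quantify how many labels are affected.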
Hi,
There is a chance of misalignment between the AV frames and the labels in `dataLoader.py` due to the interpolation in https://github.com/zcxu-eric/Ego4d_TalkNet_ASD/blob/ab9f345efc49fd70ed163c6cca674c3aff88e2b6/dataLoader.py#L158. I think there could be two ways to handle this: (1) keep contiguous AV frames and interpolate the labels onto them, or (2) keep discontiguous AV frames and skip label interpolation. Do you expect either of these options to work better?

P.S. This is similar to https://github.com/zcxu-eric/Ego4d_TalkNet_ASD/issues/1#issue-1306619210.
Thanks, Sagnik
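For reference, option (1) above could be sketched as nearest-neighbour label interpolation onto a contiguous frame range. The function name and signature here are illustrative only, not taken from the repo's `dataLoader.py`:

```python
def nearest_labels(label_frames, labels, target_frames):
    """Assign each target (contiguous AV) frame the label of the nearest
    annotated frame, so AV frames stay contiguous and labels are
    interpolated onto them.

    label_frames: frame indices that carry annotations.
    labels: per-annotated-frame labels, same length as label_frames.
    target_frames: the contiguous frame indices actually loaded.
    """
    return [
        # pick the annotated frame closest to t and reuse its label
        labels[min(range(len(label_frames)),
                   key=lambda i: abs(label_frames[i] - t))]
        for t in target_frames
    ]
```

For example, with annotations at frames 0, 10, and 20, frame 4 would take frame 0's label and frame 19 would take frame 20's. Option (2) would instead restrict `target_frames` to exactly `label_frames`, at the cost of temporal gaps in the AV input.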