tzhhhh123 / HC-STVG

The HC-STVG Dataset

Temporal inconsistency in the annotations #9

crodriguezo closed this issue 3 years ago

crodriguezo commented 3 years ago

Hi,

I would like some clarification on the annotations in HCVG_train.json. I played with the annotations to double-check the bounding boxes and the temporal boundaries, and I found an inconsistency between the annotated times and frames. Let me explain with a few samples.

Annotation

'video': '256_o4xQ-BEa3Ss.mp4',
'st_time': 14.878489661372905,
'ed_time': 17.89848966137243,
'st_frame': 372,
'ed_frame': 437,
'img_num': 500,
'fps': 25.0,
'fps_ffmpeg': 25.0

If we compute the frames from the annotated times and the video's fps, we get the following values:

st_time * fps = 371.96224153432263 (annotated st_frame: 372)
ed_time * fps = 447.46224153431075 (annotated ed_frame: 437)

Then, if we compute the implied frame rate at each boundary of the moment (annotated frame divided by annotated time), we can see a considerable difference:

st_frame/st_time = 25.002 
ed_frame/ed_time = 24.415
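A minimal sketch of the check above, assuming each annotation is a dict with the field names shown in the first sample and that the fps is supplied separately (here the 25.0 reported for this video). `check_boundaries` is a hypothetical helper, not part of the HC-STVG tooling.

```python
def check_boundaries(ann: dict, fps: float) -> dict:
    """Compare annotated frame indices with the frames implied by the annotated times."""
    st_from_time = ann["st_time"] * fps
    ed_from_time = ann["ed_time"] * fps
    return {
        "st_frame_from_time": st_from_time,
        "ed_frame_from_time": ed_from_time,
        # Positive gap: annotated frame is later than time * fps.
        "st_gap": ann["st_frame"] - st_from_time,
        "ed_gap": ann["ed_frame"] - ed_from_time,
        # Frame rate that would make annotated frame and time agree.
        "st_implied_fps": ann["st_frame"] / ann["st_time"],
        "ed_implied_fps": ann["ed_frame"] / ann["ed_time"],
    }


# First sample from this issue.
ann = {
    "video": "256_o4xQ-BEa3Ss.mp4",
    "st_time": 14.878489661372905,
    "ed_time": 17.89848966137243,
    "st_frame": 372,
    "ed_frame": 437,
    "img_num": 500,
}
res = check_boundaries(ann, fps=25.0)
for key, value in res.items():
    print(f"{key}: {value:.3f}")
```

The start boundary agrees to within a fraction of a frame, while the end boundary is about 10 frames earlier than `ed_time * fps` would suggest.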

Another example

Annotation

'video': '13_c9pEMjPT16M.mp4',
'time_start': 15.888056338028187,
'time_end': 19.908056338028167,
'frame_start': 477,
'frame_end': 588,
'number_frames': 600,
'fps': 30.0,
'fps_ffmpeg': 29.97002997002997

Computation

time_start * fps = 476.64169014084564 (annotated frame_start: 477)
time_end * fps = 597.241690140845 (annotated frame_end: 588)

frame_start/time_start = 30.02255
frame_end/time_end = 29.53578

I found this inconsistency in every video. I wonder whether it is related to the spatial annotations. Do you have any recommendation on how to deal with it during evaluation?
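To check whether the gap shows up across a whole split, a sketch like the following could scan a list of annotation dicts. It assumes the annotations have already been loaded from HCVG_train.json into dicts using the first sample's st_/ed_ field names (the file's actual layout is not shown in this thread), and that the fps for each video comes from a separate lookup such as the rate ffmpeg reports. `find_mismatches` and the 1-frame `tolerance` are hypothetical choices for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def find_mismatches(
    annotations: List[dict],
    fps_by_video: Dict[str, float],
    tolerance: float = 1.0,
) -> Iterator[Tuple[str, str, int, float]]:
    """Yield (video, boundary, annotated_frame, frame_from_time) for every
    boundary where the annotated frame and time * fps differ by more than
    `tolerance` frames."""
    for ann in annotations:
        fps = fps_by_video[ann["video"]]
        for boundary in ("st", "ed"):
            frame = ann[f"{boundary}_frame"]
            frame_from_time = ann[f"{boundary}_time"] * fps
            if abs(frame - frame_from_time) > tolerance:
                yield ann["video"], boundary, frame, frame_from_time


# The two samples from this issue; the second one's fields are renamed to
# the st_/ed_ naming of the first sample (an assumption for illustration).
samples = [
    {"video": "256_o4xQ-BEa3Ss.mp4", "st_time": 14.878489661372905,
     "ed_time": 17.89848966137243, "st_frame": 372, "ed_frame": 437},
    {"video": "13_c9pEMjPT16M.mp4", "st_time": 15.888056338028187,
     "ed_time": 19.908056338028167, "st_frame": 477, "ed_frame": 588},
]
fps_by_video = {"256_o4xQ-BEa3Ss.mp4": 25.0, "13_c9pEMjPT16M.mp4": 30.0}

mismatches = list(find_mismatches(samples, fps_by_video))
for video, boundary, frame, est in mismatches:
    print(f"{video} {boundary}: annotated {frame}, from time {est:.2f}")
```

With these two samples, only the end boundaries exceed the 1-frame tolerance.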

tzhhhh123 commented 3 years ago

You only need to follow the corresponding start and end frames. The st_time and ed_time were marked manually, and some of those frames may not contain the target person, so the boundaries were slightly adjusted in st_frame and ed_frame during the bounding-box annotation.
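A minimal sketch of following this advice during evaluation: take (st_frame, ed_frame) as the ground-truth segment and compute temporal IoU over frame indices. Treating the end frame as inclusive is an assumption (this thread does not say), and `frame_tiou` is a hypothetical helper, not HC-STVG's official evaluation code. On the first sample it also shows how much a segment derived from st_time/ed_time would be penalized.

```python
from typing import Tuple

# (start_frame, end_frame), both treated as inclusive (an assumption).
Segment = Tuple[int, int]


def frame_tiou(pred: Segment, gt: Segment) -> float:
    """Temporal IoU between two inclusive frame-index segments."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]) + 1)
    union = (pred[1] - pred[0] + 1) + (gt[1] - gt[0] + 1) - inter
    return inter / union


# Ground truth taken from st_frame / ed_frame of the first sample.
gt = (372, 437)
# Segment obtained by rounding st_time * fps and ed_time * fps instead.
from_time = (round(14.878489661372905 * 25.0), round(17.89848966137243 * 25.0))

print(from_time)                  # (372, 447)
print(frame_tiou(from_time, gt))  # 66 / 76, about 0.868
```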

crodriguezo commented 3 years ago

Thank you