hbdat / eccv20_Multi_Task_Procedure_Learning

Self-Supervised Multi-Task Procedure Learning from Instructional Videos @ ECCV20
MIT License

key 'superframe_time' not in .mat files provided in the ProceL dataset #5

Open Sid2697 opened 3 years ago

Sid2697 commented 3 years ago

Hey!

The following part of the code requires the .mat file to have a key named superframe_time:
https://github.com/hbdat/eccv20_Multi_Task_Procedure_Learning/blob/7783e7b9b47498933d359adfd761579f09934c3c/core/FeatureVGGDataset.py#L145-L151

However, superframe_time is not among the keys in the annotations provided with ProceL. The available keys are: 'caption', 'caption_frame', 'caption_time', 'framerate', 'grammar', 'key_steps_frame', 'key_steps_segment', 'key_steps_time', 'segment_frame', 'segment_time'.
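For reference, this is how one can list the keys a .mat file actually contains. A minimal sketch using scipy.io; a tiny demo file is written first so the snippet is self-contained, whereas with the real ProceL files you would only do the loadmat step (file name and fields here are illustrative):

```python
# Sketch: inspect which keys an annotation .mat file contains.
import numpy as np
import scipy.io

# Create a small stand-in .mat file (illustrative fields only).
scipy.io.savemat("demo_annotation.mat", {"caption_time": np.zeros((1, 2)),
                                         "key_steps_frame": np.zeros((1, 2))})

mat = scipy.io.loadmat("demo_annotation.mat")
# loadmat adds bookkeeping keys ('__header__', '__version__', '__globals__');
# filter them out to see only the dataset's own fields.
keys = sorted(k for k in mat if not k.startswith("__"))
print(keys)
```

Running this against each ProceL .mat file makes it easy to confirm whether superframe_time is present before the dataloader is invoked.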

Can you please shed some light on the possible reason for this issue?

Regards, Siddhant Bansal

hbdat commented 3 years ago

Hi @Sid2697 ,

Thanks again for spotting the issue. The internal .mat files I used happen to contain this field. For ease of running, I have uploaded these annotation files under the following link: https://drive.google.com/file/d/1q1qVSg7pQcqFZvypcxk25TTaDA1BuAdJ/view?usp=sharing

Best, Dat

Sid2697 commented 3 years ago

Thanks a lot for sharing these annotations! I will try these out and update you.

Sid2697 commented 3 years ago

Hey @hbdat,

I was trying to run the code using the updated annotations provided. However, I am facing the following issue.

The lengths of seg_list and mask_list generated at https://github.com/hbdat/eccv20_Multi_Task_Procedure_Learning/blob/7783e7b9b47498933d359adfd761579f09934c3c/core/FeatureVGGDataset.py#L375 differ. Upon further digging, I found that this is due to a difference between the number of frames for which features are generated and the number of frames for which segmentation is provided in the annotations.

In short, this happens when the number of frames in the video differs from the number of frames covered by the annotations. For example, the video changing_tire_0003mp4.mp4 in category changing_tire has 3933 frames according to OpenCV (https://stackoverflow.com/questions/25359288/how-to-know-total-number-of-frame-in-a-file-with-cv2-in-python), so the generated features have shape (3932, 512, 7, 7), whereas the annotations only provide segmentation up to frame 2788 (this is the case for both the public and internal annotations). As a result, seg_list has length 2788 while mask_list has length 3932, hence the error.
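The coverage check described above can be sketched as a small standalone function, using the numbers reported for changing_tire_0003mp4.mp4 (the function name is mine, not from the repo):

```python
# Sketch: the length check that trips up FeatureVGGDataset.
# n_feature_frames is the first dimension of the saved VGG feature tensor;
# last_annotated_frame is the largest frame index covered by the annotations.
def annotations_cover_video(n_feature_frames, last_annotated_frame):
    """Return True when every feature frame has a segment label."""
    return last_annotated_frame >= n_feature_frames

# 3933 video frames -> features of shape (3932, 512, 7, 7),
# but the segmentation annotations stop at frame 2788.
ok = annotations_cover_video(3932, 2788)
print(ok)  # False: seg_list (2788) and mask_list (3932) lengths differ
```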

Can you please have a look into it and let me know if there is an obvious mistake from my end or, it is actually an issue?

Regards, Siddhant Bansal

hbdat commented 3 years ago

Hi @Sid2697 ,

I am not sure what happened, as the dataset I used follows the annotations in the .mat files. One quick fix is to match videos to annotation entries not by name but by frame count (find the entry in the .mat file with the same number of frames as the video). This is also a good sanity check. Please let me know if this works.
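The workaround above could be sketched as follows; all names are illustrative, and in practice the per-entry frame counts would be read from the last annotated frame index in the .mat file:

```python
# Sketch of the "match by frame count" workaround: instead of pairing a video
# with the .mat entry of the same name, pair it with the entry whose
# annotations span the same number of frames.
def match_video_by_frame_count(n_video_frames, entry_frame_counts):
    """Return indices of annotation entries matching the video's frame count.

    entry_frame_counts: per-entry total frame counts taken from the .mat file.
    An empty result means no entry matches, i.e. a genuine annotation mismatch.
    """
    return [i for i, n in enumerate(entry_frame_counts) if n == n_video_frames]

# e.g. a 3933-frame video checked against four annotation entries
matches = match_video_by_frame_count(3933, [2788, 3933, 6742, 1500])
print(matches)  # [1]
```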

Best, Dat

Sid2697 commented 3 years ago

Hey @hbdat,

I was working on the CrossTask dataset and was able to generate the features using the code provided. However, as I proceeded to train the models, I noticed that the core module is missing files named alignment and alignment_Alyrac, which are imported in various places. For example:
https://github.com/hbdat/eccv20_Multi_Task_Procedure_Learning/blob/a95f6661b5672916b57330e07577a31d3009ba00/experiments/all_cat/CrossTask/CrossTask_cat_batch_rank_key_all_cat_ss_att_summarization.py#L45 and
https://github.com/hbdat/eccv20_Multi_Task_Procedure_Learning/blob/a95f6661b5672916b57330e07577a31d3009ba00/core/helper.py#L9-L10

Can you please provide the missing files?

Regards, Siddhant Bansal

hbdat commented 3 years ago

Thanks for reminding me about this. I have committed a fix to remove these obsolete modules/functions, which are not needed for running the code.

Best, Dat

huguyuehuhu commented 3 years ago


Hi @Sid2697, in the ProceL dataset, have you ever matched the frame numbers of the updated annotations against the generated VGG features? I find that the output of is_match ( is_match = self.check_match_annotation(category, video, self.mat_data, feature) ) is always False. For example, I could not match the video "phone_battery 29" (VGG feature frame count 6742) with any video in the annotation .mat file, i.e. the simple code below always prints False.

for video_no in range(1, 100):
    n_segments = len(mat_data[category]['superframe_time'][video_no][0])
    # end frame of the last superframe vs. the VGG feature frame count
    print(mat_data[category]['superframe_frame'][video_no][0][n_segments - 1][1] == 6742)

Thanks.