salesforce / paprika

Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding"
Apache License 2.0

How is the evaluation on downstream tasks carried out? #7

Open ee2110 opened 7 months ago

ee2110 commented 7 months ago

Hi, thank you for the great work and interesting ideas!

  1. Are the validation/test sets from the COIN & CrossTask datasets used during evaluation?
  2. Are the downstream models (MLP / Transformer) trained with COIN & CrossTask data before evaluation?
  3. During evaluation of task recognition, are all annotated video segments from a video fed into the pre-trained model e(.), or is only one specific segment from a video used? I wonder how the accuracy was calculated.

I hope to get more information about these points. I enjoyed reading your work.

Below is a screenshot of a diagram taken from the paper:

[screenshot: Capture2]

Thank you.

hongluzhou commented 7 months ago

Thank you for your interest in our work and for your kind words!

  1. We used train/test sets from COIN (https://github.com/salesforce/paprika/blob/cbefd714f3368733b1dc4dc3f2ee1e2ba69f57ed/datasets/coin.py#L32) and created train/test sets for CrossTask on our own using random splits (https://github.com/salesforce/paprika/blob/cbefd714f3368733b1dc4dc3f2ee1e2ba69f57ed/datasets/cross_task.py#L12).
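A random split like the one described for CrossTask might look like the sketch below. The 80/20 ratio, the seed, and the `make_random_splits` helper are assumptions for illustration; the actual logic lives in `datasets/cross_task.py`.

```python
import random

def make_random_splits(video_ids, train_ratio=0.8, seed=0):
    """Randomly partition video IDs into train/test splits.

    Hypothetical helper: ratio and seed are illustrative, not the
    values used in the repo.
    """
    ids = sorted(video_ids)       # sort first so shuffling is reproducible
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]

# example usage with stand-in video IDs
train_ids, test_ids = make_random_splits([f"video_{i}" for i in range(10)])
```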

  2. Yes, downstream models were trained on the train set of the downstream datasets before evaluating them on the downstream test set.
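A minimal sketch of what training an MLP head on the downstream train set could look like, assuming pre-extracted segment features as input. The feature dimension, hidden size, class count, and random stand-in data are all assumptions, not values from the paper or repo.

```python
import torch
import torch.nn as nn

class DownstreamMLP(nn.Module):
    """Toy downstream classification head over pre-extracted features.

    Sizes (512-d features, 180 classes) are hypothetical placeholders.
    """
    def __init__(self, feat_dim=512, num_classes=180):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = DownstreamMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# one training step on random stand-in features/labels
feats = torch.randn(8, 512)
labels = torch.randint(0, 180, (8,))
loss = loss_fn(model(feats), labels)
opt.zero_grad()
loss.backward()
opt.step()
```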

  3. We used the pre-trained model to extract features of the video segments that contain steps. These features served as the input to the downstream models (https://github.com/salesforce/paprika/blob/cbefd714f3368733b1dc4dc3f2ee1e2ba69f57ed/datasets/cross_task.py#L201).
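For the accuracy question, one plausible reading is: extract a feature per annotated segment offline with the frozen pre-trained model, classify each with the downstream head, and average correctness over segments. The sketch below starts from already-extracted features; the linear head and all shapes are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def segment_accuracy(head, segment_feats, labels):
    """Fraction of annotated segments classified correctly.

    `segment_feats` stands in for features already extracted by the
    frozen pre-trained model e(.); shapes here are illustrative.
    """
    preds = head(segment_feats).argmax(dim=-1)
    return (preds == labels).float().mean().item()

# stand-in head and data for demonstration
head = nn.Linear(512, 180)
feats = torch.randn(16, 512)
labels = torch.randint(0, 180, (16,))
acc = segment_accuracy(head, feats, labels)
```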