ufal / MLASK

EACL 2023 paper "MLASK: Multimodal Summarization of Video-based News Articles"
https://aclanthology.org/2023.findings-eacl.67
Apache License 2.0

Frame encoder #7

Status: Open. allent4n opened this issue 7 months ago.

allent4n commented 7 months ago

Hi Ufal,

Thanks for sharing such an invaluable repo. I was wondering how I can obtain the encoded frame features used as model input, since you mention that you use EfficientNet (Tan and Le, 2019) and Vision Transformer (Dosovitskiy et al., 2021) as feature extractors.

Thank you

mateuk commented 7 months ago

You can get the raw frames/videos from here, and the notebooks in MLASK/feature_extraction give a practical example of how to extract features from jpg images and mp4 videos. We are not sharing the pre-computed features.

allent4n commented 7 months ago

Thank you, Mateuk, for your prompt response. Regarding the code: when I run it, it fails with KeyError: 'src_img_cosine'. After some digging, I found that the error is raised where the code reads batch["src_img_cosine"]. It may be that this key is never initialized in the batch. Could you please take a look at this issue? Thank you so much for your kind assistance.
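A KeyError here means the batch dictionary reaching the model has no "src_img_cosine" entry. One plausible reading, which the thread does not confirm, is that this key holds cosine similarities between each candidate image/frame feature and the reference cover-picture feature. Under that assumption, a workaround is to fill the key before the batch is used. The helper `ensure_img_cosine` and the keys "src_img_features" and "tgt_img_features" are hypothetical, not the repository's actual names or shapes.

```python
import torch
import torch.nn.functional as F


def ensure_img_cosine(batch: dict) -> dict:
    """Hypothetical fix: populate batch["src_img_cosine"] if the loader did not.

    Assumes (illustrative key names, not MLASK's):
      batch["src_img_features"]: (batch_size, num_images, dim)
      batch["tgt_img_features"]: (batch_size, dim), the reference cover picture
    """
    if "src_img_cosine" not in batch:
        src = batch["src_img_features"]
        tgt = batch["tgt_img_features"].unsqueeze(1)  # (batch_size, 1, dim), broadcasts over images
        batch["src_img_cosine"] = F.cosine_similarity(src, tgt, dim=-1)  # (batch_size, num_images)
    return batch


if __name__ == "__main__":
    batch = {
        "src_img_features": torch.randn(2, 5, 768),
        "tgt_img_features": torch.randn(2, 768),
    }
    batch = ensure_img_cosine(batch)
    print(batch["src_img_cosine"].shape)  # torch.Size([2, 5])
```

If the key is meant to come from the data preparation step instead, the proper fix belongs in the dataset code that builds the batch; this sketch only illustrates what a missing entry of that shape could look like.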