Code and data for the EACL 2023 (findings) paper: "MLASK: Multimodal Summarization of Video-based News Articles".
The MLASK corpus consists of 41,243 multi-modal documents – video-based news articles in the Czech language – collected from Novinky.cz and Seznam Zprávy.
Each document consists of the article text, its abstract and title, the accompanying video, and the cover picture.
[Update 07.11.2023] The dataset is available here.
We include the code used in our experiments. It is structured as follows:
├── feature_extraction
│   ├── extract_image_features.ipynb - Image feature extraction (Section 4.2)
│   └── extract_video_features.ipynb - Video feature extraction (Section 4.2)
└── src
    ├── model
    │   ├── mms_modeling_t5.py - Modified version of the mT5 model that adds the video and image encoders (Section 4)
    │   └── model_mms.py - Training loop, evaluation metrics, and logging
    ├── data
    │   ├── data_loader.py - Data loading and pre-processing
    │   └── utils.py - Utility functions
    └── runtime
        ├── test_mms_model.py - MMS model evaluation (Sections 5.2 and 5.3)
        └── train_mms_model.py - MMS model training (Sections 5.2 and 5.3)
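The runtime scripts consume the article text together with the visual features produced by the extraction notebooks. As a rough illustration of what the data-loading step has to do (the names, feature dimensions, and padding scheme below are assumptions for this sketch, not the repository's actual interface):

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed feature sizes -- the real dimensions are set by the
# extraction notebooks (Section 4.2); these are illustrative only.
VIDEO_DIM = 1024  # per-frame video feature size (assumed)

@dataclass
class Document:
    text: str                       # full article text
    summary: str                    # reference abstract
    image_feats: List[float]        # one pooled image-feature vector
    video_feats: List[List[float]]  # one feature vector per sampled frame

def collate(batch: List[Document], max_frames: int = 32) -> Dict[str, list]:
    """Gather a batch into parallel lists and pad the variable-length
    per-frame video features to a common length, with a 0/1 mask
    marking real frames -- the shape a multimodal model expects."""
    videos, masks = [], []
    for d in batch:
        frames = d.video_feats[:max_frames]
        pad = max_frames - len(frames)
        masks.append([1] * len(frames) + [0] * pad)
        videos.append(frames + [[0.0] * VIDEO_DIM] * pad)
    return {
        "text": [d.text for d in batch],
        "summary": [d.summary for d in batch],
        "image": [d.image_feats for d in batch],
        "video": videos,
        "video_mask": masks,
    }
```

In the actual code this role is played by `src/data/data_loader.py`, which additionally tokenizes the text for the mT5 encoder.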
RougeRaw.py, which is required by model_mms.py, can be downloaded from the SumeCzech repository.
The code was tested with Python 3.8, an NVIDIA RTX 3090 GPU, and the package versions listed in requirements.txt.
Our code is released under Apache License 2.0, unless stated otherwise.
If you find our code or data useful, please cite:
@inproceedings{krubinski-pecina-2023-mlask,
title = "{MLASK}: Multimodal Summarization of Video-based News Articles",
author = "Krubi{\'n}ski, Mateusz and Pecina, Pavel",
booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-eacl.67",
pages = "880--894",
}