lujiaying / MUG-Bench

Data and code for the Findings of EMNLP'23 paper "MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields"
https://aclanthology.org/2023.findings-emnlp.354/

Dataset Validation #4

Closed lujiaying closed 1 year ago

lujiaying commented 1 year ago

We've been making good progress so far. However, I've found that manually validating dataset quality is tedious and hard to manage. This issue tracks creating a uniform validation script for our datasets.

Criteria (more criteria are welcome!)

Script for validation

dutils/validate_dataset.py

Usage

python dutils/validate_dataset.py \
    --dataset_dir datasets/pokemon_0421 \
    --id_col name \
    --label_cols type_1 type_2
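
For readers who want a feel for what such a script might do, here is a minimal sketch. It is not the contents of dutils/validate_dataset.py. Only the three flags (`--dataset_dir`, `--id_col`, `--label_cols`) come from the usage above. The helper names `build_parser` and `validate_rows`, and the checks themselves (required columns present, IDs non-empty and unique), are assumptions made for illustration. The sketch works on rows already loaded as dictionaries, because the on-disk layout of a dataset directory is not described in this issue.

```python
import argparse
from collections import Counter


def build_parser():
    # Flags mirror the usage shown in the issue; everything else here is hypothetical.
    parser = argparse.ArgumentParser(description="Dataset validation (illustrative sketch)")
    parser.add_argument("--dataset_dir", required=True, help="directory holding the dataset")
    parser.add_argument("--id_col", required=True, help="column that identifies each row")
    parser.add_argument("--label_cols", nargs="+", required=True, help="one or more label columns")
    return parser


def validate_rows(rows, id_col, label_cols):
    """Return a list of problem descriptions for `rows` (a list of dicts).

    Assumed checks, not taken from the real script:
      1. every row contains the ID column and all label columns,
      2. no ID value is empty,
      3. no ID value is repeated.
    """
    problems = []
    required = [id_col, *label_cols]
    for i, row in enumerate(rows):
        for col in required:
            if col not in row:
                problems.append(f"row {i}: missing column '{col}'")
        if id_col in row and row[id_col] in (None, ""):
            problems.append(f"row {i}: empty value in ID column '{id_col}'")

    # Count only non-empty IDs so empty ones are not reported twice.
    ids = [row[id_col] for row in rows if row.get(id_col) not in (None, "")]
    for value, count in Counter(ids).items():
        if count > 1:
            problems.append(f"id '{value}' appears {count} times")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single run report everything that needs fixing, which fits the goal of replacing tedious manual checks.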