lujiaying / MUG-Bench

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
https://aclanthology.org/2023.findings-emnlp.354/

Using tabular data for text extraction and vice versa. #11

Open nashapir opened 6 months ago

nashapir commented 6 months ago
```
[INFO] Text raw input collected from columns: ['name', 'generation', 'status', 'species', 'abilities_number', 'ability_1', 'ability_2', 'ability_hidden', 'growth_rate', 'percentage_male', 'Image Path']
```

Hello,

I'm looking at the log from the Pokemon Primary type example and noticing that MuGNet is collecting text features from five tabular columns as well as the image path (generation, status, abilities_number, growth_rate, percentage_male, Image Path). Conversely, it's collecting tabular features from columns that are meant (per Appendix B.1) to be text features.

It looks like the determination, at least for text features, happens here: https://github.com/lujiaying/MUG-Bench/blob/master/baselines/MuGNet/exec.py#L143. It appears to accept any column that is either textual or categorical.
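For reference, a permissive selection rule like the one described above might look roughly like this (a minimal sketch using pandas dtypes; `collect_text_columns` and the toy DataFrame are hypothetical illustrations, not the actual exec.py code):

```python
import pandas as pd

def collect_text_columns(df: pd.DataFrame) -> list[str]:
    # Hypothetical sketch: accept any column that is string-typed OR
    # categorical, mirroring the permissive behavior described above.
    text_cols = []
    for col in df.columns:
        if df[col].dtype == object:  # free-text / string-typed
            text_cols.append(col)
        elif isinstance(df[col].dtype, pd.CategoricalDtype):  # categorical
            text_cols.append(col)
    return text_cols

df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander"],   # textual
    "generation": pd.Categorical([1, 1]),  # categorical (tabular)
    "height_m": [0.7, 0.6],                # numeric -> excluded
})
print(collect_text_columns(df))  # -> ['name', 'generation']
```

Under a rule like this, categorical tabular columns such as `generation` end up in the text input alongside genuinely textual columns, which would explain the log line above.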

Am I understanding this column reuse correctly, and if so is there a reason for it (versus only using columns that correspond to a certain modality)?

Thanks!

lujiaying commented 6 months ago

Hi @nashapir,

We intentionally let the multimodal classifier take a broad set of input features. For instance, a text feature extractor/classifier can naturally accommodate certain tabular features (categorical ones, for instance), and this approach delivers better performance.
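A minimal sketch of what "accommodating" a categorical field as text could look like (`serialize_row_as_text` and the example row are hypothetical, not MuGNet's actual implementation):

```python
def serialize_row_as_text(row: dict, cols: list[str]) -> str:
    # Hypothetical sketch: flatten selected tabular/categorical fields into a
    # single string so a text encoder can consume them alongside real text.
    return " ; ".join(f"{c}: {row[c]}" for c in cols)

row = {"name": "Pikachu", "generation": 1, "growth_rate": "Medium Fast"}
print(serialize_row_as_text(row, ["name", "generation", "growth_rate"]))
# -> name: Pikachu ; generation: 1 ; growth_rate: Medium Fast
```

The design choice is that a pretrained text encoder can pick up signal from categorical values rendered as tokens, so routing them through the text branch in addition to the tabular branch can help rather than hurt.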