lujiaying / MUG-Bench

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
https://aclanthology.org/2023.findings-emnlp.354/

Using tabular data for text extraction and vice versa. #11

Open nashapir opened 6 months ago

nashapir commented 6 months ago
```
[INFO] Text raw input collected from columns: ['name', 'generation', 'status', 'species', 'abilities_number', 'ability_1', 'ability_2', 'ability_hidden', 'growth_rate', 'percentage_male', 'Image Path']
```

Hello,

I'm looking at the log from the Pokemon Primary type example and noticing that MuGNet is collecting text features from five tabular columns as well as the image path (generation, status, abilities_number, growth_rate, percentage_male, Image Path). Conversely, it's collecting tabular features from columns that are meant (per Appendix B.1) to be text features.

It looks like the determination, at least for text features, happens here: https://github.com/lujiaying/MUG-Bench/blob/master/baselines/MuGNet/exec.py#L143. It appears to accept any column that is either textual or categorical.
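For reference, a permissive selection rule like the one described above might look roughly like this (a minimal sketch using pandas dtypes; `collect_text_columns` and the toy DataFrame are hypothetical illustrations, not the actual exec.py code):

```python
import pandas as pd

def collect_text_columns(df: pd.DataFrame) -> list[str]:
    # Hypothetical sketch: accept any column that is string-typed OR
    # categorical, mirroring the permissive behavior described above.
    text_cols = []
    for col in df.columns:
        if df[col].dtype == object:  # free-text / string-typed
            text_cols.append(col)
        elif isinstance(df[col].dtype, pd.CategoricalDtype):  # categorical
            text_cols.append(col)
    return text_cols

df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander"],   # textual
    "generation": pd.Categorical([1, 1]),  # categorical (tabular)
    "height_m": [0.7, 0.6],                # numeric -> excluded
})
print(collect_text_columns(df))  # -> ['name', 'generation']
```

Under a rule like this, categorical tabular columns such as `generation` end up in the text input alongside genuinely textual columns, which would explain the log line above.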

Am I understanding this column reuse correctly, and if so is there a reason for it (versus only using columns that correspond to a certain modality)?

Thanks!

lujiaying commented 6 months ago

Hi @nashapir,

We intentionally let the multimodal classifier take a broad set of input features. For instance, a text feature extractor/classifier can naturally accommodate certain tabular features (categorical ones, for instance), and this approach delivers better performance.
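A minimal sketch of what "accommodating" a categorical field as text could look like (`serialize_row_as_text` and the example row are hypothetical, not MuGNet's actual implementation):

```python
def serialize_row_as_text(row: dict, cols: list[str]) -> str:
    # Hypothetical sketch: flatten selected tabular/categorical fields into a
    # single string so a text encoder can consume them alongside real text.
    return " ; ".join(f"{c}: {row[c]}" for c in cols)

row = {"name": "Pikachu", "generation": 1, "growth_rate": "Medium Fast"}
print(serialize_row_as_text(row, ["name", "generation", "growth_rate"]))
# -> name: Pikachu ; generation: 1 ; growth_rate: Medium Fast
```

The design choice is that a pretrained text encoder can pick up signal from categorical values rendered as tokens, so routing them through the text branch in addition to the tabular branch can help rather than hurt.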