Closed lujiaying closed 1 year ago
Project Deliverables
Project Status
Aug - Sep Plan
I'd suggest the following things for next week
Quality Level
or Level Requirement
predictionimage
`Image`: we want a relative path like `dev_images/bul05.png`.
`Villagers_Gender_dev`: we want a uniform directory name, `dev_images`, so that the scripts scale easily.
`info.txt` is a great idea. We do need it to indicate which column is the label. I'd also recommend adding `label column: Gender`
for user.

| Feature | importance | stddev | p_value | n | p99_high | p99_low |
|---|---|---|---|---|---|---|
| Personality | 0.409677 | 0.008834 | 2.593102e-08 | 5 | 0.427867 | 0.391488 |
| Hobby | 0.035484 | 0.021030 | 9.777106e-03 | 5 | 0.078784 | -0.007817 |
| Species | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Subtype | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Birthday | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Catchphrase | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Favorite Song | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Favorite Saying | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Style 1 | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Style 2 | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Color 1 | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Color 2 | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Default Umbrella | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Wallpaper | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
| Flooring | 0.000000 | 0.000000 | 5.000000e-01 | 5 | 0.000000 | 0.000000 |
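
For readers wondering how the `p99_high` and `p99_low` columns relate to the others: the numbers above are consistent with a two-sided 99% Student-t interval around `importance`, using `stddev` and `n`. This is an inference from the values, not something stated in the thread, so treat the sketch below as an assumption. The helper name `p99_interval` and the hardcoded critical value are illustrative only.

```python
import math

# Two-sided 99% Student-t critical value for 4 degrees of freedom (n - 1 with n = 5).
# Hardcoded so the sketch needs no SciPy; it is only valid for n = 5.
T_CRIT_99_DF4 = 4.604095


def p99_interval(importance, stddev, n=5):
    """Return (p99_low, p99_high), assuming a Student-t interval around the mean."""
    if n != 5:
        raise ValueError("T_CRIT_99_DF4 is only valid for n = 5")
    half_width = T_CRIT_99_DF4 * stddev / math.sqrt(n)
    return importance - half_width, importance + half_width


# Values taken from the Personality and Hobby rows of the table.
personality_low, personality_high = p99_interval(0.409677, 0.008834)
hobby_low, hobby_high = p99_interval(0.035484, 0.021030)
print(f"Personality: [{personality_low:.6f}, {personality_high:.6f}]")
print(f"Hobby:       [{hobby_low:.6f}, {hobby_high:.6f}]")
```

Under this assumption the recomputed bounds agree with the table to about five decimal places. Note that the Hobby interval crosses zero, so only Personality is clearly important at this confidence level.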
dev_images/crd01.png.png
exec.py
exp_save_dir/exp_results.csv
,dutils/validate_dataset.py
`No_Mechanics` shows up in several rows? Are they filled in by us?
`Image Path`
[INFO] Input arguments: Namespace(dataset_dir='datasets/Hearthstone-All/rarity', id_col='name', label_cols=['rarity'])
Following rows from dev EXIST in train!!
*** Please remove the overlaps ***
cardClass health name set type attack ... mechanics_2 race durability element description Image Path
0 NEUTRAL 4.0 Sub Scrubber BATTLEGROUNDS MINION 4.0 ... NaN MECHANICAL NaN NaN NaN dev_images/BG22_HERO_200_Buddy.jpg
2 HUNTER NaN Dragonslayer's Greatbow YEAR_OF_THE_DRAGON WEAPON 6.0 ... ['IMMUNE'] NaN 2.0 NaN NaN dev_images/DRGA_BOSS_22t4.jpg
5 SHAMAN 2.0 Glugg's Tail THE_SUNKEN_CITY MINION 2.0 ... NaN BEAST NaN NaN NaN dev_images/TSC_639t3.jpg
7 PALADIN NaN Hand of Salvation BATTLEGROUNDS SPELL NaN ... NaN NaN NaN HOLY NaN dev_images/TB_Bacon_Secrets_11.jpg
11 WARLOCK 3.0 Felstalker LEGACY MINION 4.0 ... NaN DEMON NaN NaN NaN dev_images/EX1_306.jpg
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
515 NEUTRAL 8.0 Alexstrasza VANILLA MINION 8.0 ... NaN DRAGON NaN NaN NaN dev_images/VAN_EX1_561.jpg
520 NEUTRAL 69.0 Ragnaros LETTUCE MINION 9.0 ... NaN ELEMENTAL NaN NaN NaN dev_images/LETL_028H_01.jpg
521 NEUTRAL 5.0 Elise Starseeker LOE MINION 3.0 ... NaN None_Race NaN NaN NaN dev_images/LOE_079.jpg
528 NEUTRAL 5.0 Queen Azshara THE_SUNKEN_CITY MINION 5.0 ... NaN NAGA NaN NaN NaN dev_images/TSC_641.jpg
533 WARRIOR 5.0 Darius Crowley CORE MINION 4.0 ... NaN None_Race NaN NaN NaN dev_images/CORE_GIL_547.jpg
[210 rows x 16 columns]
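
The check reported in the log above (dev rows that also exist in train, matched on `id_col`) can be sketched in plain Python. This is not the code of `dutils/validate_dataset.py`, which is not shown in this thread; the function names `find_overlaps` and `drop_overlaps` and the toy rows are hypothetical.

```python
def find_overlaps(train_rows, dev_rows, id_col):
    """Return the dev rows whose id_col value also appears in the train split."""
    train_ids = {row[id_col] for row in train_rows}
    return [row for row in dev_rows if row[id_col] in train_ids]


def drop_overlaps(train_rows, dev_rows, id_col):
    """Return the dev rows with every train/dev overlap removed."""
    train_ids = {row[id_col] for row in train_rows}
    return [row for row in dev_rows if row[id_col] not in train_ids]


# Toy rows, made up for illustration only.
train = [
    {"name": "card_a", "rarity": "x"},
    {"name": "card_b", "rarity": "y"},
]
dev = [
    {"name": "card_b", "rarity": "y"},  # also present in train
    {"name": "card_c", "rarity": "x"},
]

overlaps = find_overlaps(train, dev, id_col="name")
clean_dev = drop_overlaps(train, dev, id_col="name")
print(f"{len(overlaps)} overlapping dev rows, {len(clean_dev)} dev rows kept")
```

Matching on a single `id_col` treats two rows with the same identifier as the same example even when their other columns differ, so the choice of `id_col` decides how strict the check is.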
- [x] Currently, we may not need to upload trained artifacts to the cloud folder. If we do upload them, it would be great to use `exec.py` to save exp_arguments and exp_results.
- [x] Discuss whether we set it as a multi-column prediction or just multiple tasks (no dependency among different tasks)
- [x] Create a Turing server account for Yongchen, because we want a fair comparison between different baselines (same CPU cores, same GPU core). @lujiaying needs to discuss this with Dr. Yang.
## own algorithm idea
auto-gnn: automatically construct a graph from the categorical features. The result would be a heterogeneous graph with different types of nodes and different types of edges.
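
One possible reading of the auto-gnn idea, as a minimal sketch: every row becomes a "record" node, every distinct value of a categorical column becomes a node of that column's type, and each column contributes its own edge type. The thread does not specify the construction, so the function name `build_hetero_graph`, the `has_<column>` edge naming, and the toy rows are all assumptions.

```python
from collections import defaultdict


def build_hetero_graph(rows, categorical_cols, id_col):
    """Link each record node to one value node per categorical column.

    Node types: "record" plus one type per categorical column.
    Edge types: one per categorical column, named "has_<column>".
    """
    nodes = defaultdict(set)   # node type -> set of node keys
    edges = defaultdict(list)  # edge type -> list of (record id, value)
    for row in rows:
        record = row[id_col]
        nodes["record"].add(record)
        for col in categorical_cols:
            value = row.get(col)
            if value is None:
                continue  # missing value: no node and no edge
            nodes[col].add(value)
            edges[f"has_{col}"].append((record, value))
    return nodes, edges


# Toy rows, made up for illustration only.
rows = [
    {"id": "v1", "Species": "Cat", "Personality": "Lazy"},
    {"id": "v2", "Species": "Dog", "Personality": "Lazy"},
    {"id": "v3", "Species": "Cat", "Personality": None},
]
nodes, edges = build_hetero_graph(rows, ["Species", "Personality"], id_col="id")
print({node_type: sorted(keys) for node_type, keys in nodes.items()})
print(dict(edges))
```

Records that share a categorical value end up two hops apart through the shared value node, which is what lets a graph model pass information between similar rows.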
TODOs:
Wrap up Pokemon and Hearthstone Datasets (every task is a unique dataset)
https://github.com/lujiaying/MUG-Bench/blob/master/run_dutils.sh
`id_col` (this one is new), `eval_metric` (new to add; for binary we use `auc`, for multiclass we use `log_loss`; refer to #5 for details), `label_col`, and the label distribution (I believe these two are already finished)
`id` column (just use their id from the original source); add an `artist` column
`Health`: how about we split the values into `0, 1, 2, 3, ..., 10, 11~20, >20`? A similar re-categorization can be done for `Attack` and `Cost` predictions.
New game data to be included
Set up Hopper Server Running scripts
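
Two of the TODO items above lend themselves to a short sketch: re-categorizing a numeric label such as `Health` into the classes 0, 1, ..., 10, 11~20, >20, and choosing `eval_metric` from the number of classes (`auc` for binary, `log_loss` for multiclass). The function names `bin_stat` and `pick_eval_metric` are hypothetical, and missing values (NaN) are deliberately not handled here.

```python
def bin_stat(value):
    """Map a numeric stat to one of the classes "0".."10", "11~20", ">20"."""
    v = int(value)  # stats may arrive as floats such as 4.0; NaN raises ValueError
    if v < 0:
        raise ValueError(f"negative stat: {value}")
    if v <= 10:
        return str(v)
    if v <= 20:
        return "11~20"
    return ">20"


def pick_eval_metric(labels):
    """Return "auc" for a binary task and "log_loss" for a multiclass task."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two distinct labels")
    return "auc" if n_classes == 2 else "log_loss"


# Health values as they appear in the validation log (4.0, 2.0, 3.0, 8.0, 69.0, 5.0).
health = [4.0, 2.0, 3.0, 8.0, 69.0, 5.0]
labels = [bin_stat(h) for h in health]
print(labels, pick_eval_metric(labels))
```

Grouping the long tail into `11~20` and `>20` keeps rare extreme values from becoming single-example classes, at the cost of losing their exact value.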
During my internship, I had the chance to work on the AutoML multimodal classification setting, where the modalities include tabular, text, and image data. Many text-image datasets already exist (images with captions, visual question answering), as do text-tabular datasets (Multimodal AutoML on Structured Tables with Text Fields). However, there is no comprehensive, publicly available dataset that combines tables, text, and images, which makes this setting both unique and challenging.
After some discussion with Wenjing, we found that games like Hearthstone, Magic, DOTA, and LoL are ideal resources for such a multimodal classification testbed.
For instance, below are several cards from Hearthstone. They contain tabular features (cost, attack, HP, card type, minion type, etc.), image features, and textual features (descriptions of a card's effect or its background story). It is relatively easy to convert the data into a classification task (e.g., we can predict the cost of a card from all of its other features).
I was wondering if you would be interested in a resource paper, which I think might be low-hanging fruit. If multimodal classification does not fit our lab's direction, there might be a chance to convert it to a graph problem or even a multimodal KG/KB (I would recommend doing so after releasing the resource paper and using some existing multimodal AutoML models to set up baselines).
The main risk I can name now is copyright. But I did find some Creative Commons licensed resources we can use. I would like to invite my wife as a co-author of the resource paper because she contributed a lot to shaping this idea. Moreover, the beginning stage looks relatively easy, so if possible I may want to hire some undergraduates to help.