lujiaying closed this issue 1 year ago
I have made three CSV files (train, dev, and test) for the Animal Crossing species task: Species_dev.csv Species_test.csv Species_train.csv
From the label distribution, I suspect species prediction will be very challenging. Would you like to run a pilot experiment with AutoGluon to see the performance? If AutoGluon ends up discarding some rare class labels (it would log a message if that happens), we may need to stick with the gender prediction task instead of species prediction.
We can start with CPU-only models. Something like the following:
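from autogluon.tabular import TabularDataset, TabularPredictor

# Example data from the AutoGluon quick start; swap in our own train/test CSVs
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
# 'class' is the label column of the example data; use our own label column name
predictor = TabularPredictor(label='class').fit(train_data=train_data)
# Predict a label for every row of the test set
predictions = predictor.predict(test_data)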
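Before the pilot run, it might help to list which species have very few training rows, since those are the classes most at risk of being dropped. A minimal sketch, assuming the labels are already loaded into a list; `rare_labels` is a hypothetical helper and the `min_count` threshold is an illustrative value, not AutoGluon's actual cutoff:

```python
from collections import Counter


def rare_labels(labels, min_count=10):
    """Return {label: count} for labels seen fewer than min_count times.

    min_count is an illustrative threshold (assumption), not a value
    taken from AutoGluon.
    """
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_count}


# Toy labels for illustration only, not the real Species_train.csv contents
example = ["cat"] * 12 + ["octopus"] * 2 + ["cow"] * 3
print(rare_labels(example))
```

If this comes back with many species, that would support falling back to gender prediction.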
Team_dev.csv Team_test.csv Team_train.csv I have only included the first 4 abilities of each hero, since some heroes have extra abilities. There are 63 heroes with the "good" team attribute and 54 heroes with the "bad" team attribute.
So far all the CSV files look great to me. Since we are going to create a benchmark that contains a number of datasets, I'd suggest we use Emory OneDrive to organize files: cloud folder link
The file structure can be:
./datasets
|-- AnimalCrossing_Gender
|   |-- train.csv
|   |-- dev.csv
|   |-- test.csv
|   |-- train_images.zip
|   |-- dev_images.zip
|   |-- test_images.zip
|-- Dota2_Team
    |-- train.csv
    |-- dev.csv
    |-- test.csv
    |-- train_images.zip
    |-- dev_images.zip
    |-- test_images.zip
Special notes for *_images.zip: 1. *_images.zip is a compressed archive of a folder; 2. please make sure train.csv has a column image that stores the correct relative path to the image file.
train_images.zip
|-- image_1
|-- image_2
|-- ....
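A quick way to check note 2 before uploading could look like the sketch below. It assumes the paths in the image column are written relative to the archive root with forward slashes (the thread does not pin this down), and `missing_images` is a hypothetical helper name:

```python
import csv
import zipfile


def missing_images(csv_path, zip_path, image_column="image"):
    """Return image paths listed in the CSV that are absent from the zip.

    Assumes paths in image_column are relative to the archive root and use
    forward slashes, which is how zip member names are stored.
    """
    with zipfile.ZipFile(zip_path) as archive:
        members = set(archive.namelist())
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = csv.DictReader(handle)
        return [row[image_column] for row in rows
                if row[image_column] not in members]
```

Calling `missing_images("train.csv", "train_images.zip")` and getting an empty list back would mean every row points at a file that exists in the archive.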
@qyccc3 since Dota2 only has 117 heroes, let's focus on generating a polished Animal Crossing dataset this week (Aug 30). Please try to fill in the table in the first comment of this thread (issue).
Heartstone_minion.csv Heartstone_spell.csv These are the Hearthstone minion and spell CSVs, with the empty cells left unfilled.
I have uploaded Hearthstone_Minion to OneDrive with its images, CSV files, info.txt and trained predictor
Missing columns of Minion
text: manually checked Fearsome Doomguard, Firecat Form, Snowflipper Penguin; they do indeed have blank descriptions.
race: we can fill null with None_Race
text and mechanics: what does AG think these columns are? Text or category?
Missing columns of Spell
rarity: according to this url, it is highly possible its rarity is free. Checked Nerubian Ambush!, Improved Ice Trap, Shadowy Gem, DIE, INSECT!...
Use blank (keep cell empty) instead of 0 for columns health and attack
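The fill rules above could be applied row by row along these lines. This is only a sketch: it assumes each CSV row is read as a dict of strings (for example via `csv.DictReader`), and `clean_card_row` is a hypothetical helper name:

```python
def clean_card_row(row):
    """Apply the fill conventions to one CSV row given as a dict of strings."""
    cleaned = dict(row)
    # A missing race becomes the explicit category None_Race
    if "race" in cleaned and not (cleaned["race"] or "").strip():
        cleaned["race"] = "None_Race"
    # Missing health/attack stay empty instead of being replaced by 0
    for column in ("health", "attack"):
        if column in cleaned and not (cleaned[column] or "").strip():
            cleaned[column] = ""
    return cleaned


# Toy row for illustration only
print(clean_card_row({"name": "Example Card", "race": "", "health": "", "attack": "2"}))
```

Keeping health and attack empty lets the loader treat them as missing values rather than as a real stat of 0.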
For Hearthstone, let's include the following types: Minion, Spell, Weapon, Location
For Pokemon, let's remove the following columns: egg_type_number, egg_type1, egg_type2, type_number, against_normal, against_fire, ..., against_*
These columns directly leak information about a Pokemon's type.
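One way to apply this consistently across the train/dev/test files is to filter the header by name. A minimal sketch, assuming every type-effectiveness column starts with `against` and no column we want to keep does; `drop_leaky_columns` and the toy header are hypothetical:

```python
# Columns named explicitly in the thread
LEAK_COLUMNS = {"egg_type_number", "egg_type1", "egg_type2", "type_number"}
# Assumption: all type-effectiveness columns share this prefix
LEAK_PREFIX = "against"


def drop_leaky_columns(columns):
    """Return the column names that remain after removing type-leaking ones."""
    return [c for c in columns
            if c not in LEAK_COLUMNS and not c.startswith(LEAK_PREFIX)]


# Toy header for illustration only, not the real Pokemon CSV header
header = ["name", "hp", "egg_type1", "against_fire", "type_number"]
print(drop_leaky_columns(header))
```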
Everything has been uploaded to Overleaf.
Let's use this issue to store polished datasets. We expect an 80%/5%/15% train/dev/test split. Cloud folder link
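For reference, the 80%/5%/15% split could be produced with something like the sketch below. It is a plain shuffled split, not a stratified one, so rare classes may land unevenly across the three sets; `split_train_dev_test` and the fixed seed are assumptions for illustration:

```python
import random


def split_train_dev_test(rows, seed=0):
    """Shuffle rows and split them into train/dev/test at roughly 80/5/15."""
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible across runs
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.80 * len(shuffled))
    n_dev = round(0.05 * len(shuffled))
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    # Whatever is left after train and dev goes to test
    test = shuffled[n_train + n_dev:]
    return train, dev, test


train, dev, test = split_train_dev_test(range(1000))
print(len(train), len(dev), len(test))
```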
Dataset Stat
AnimalCrossing_Species
AnimalCrossing_Gender
P.S. column definitions:
#sample (train/dev/test): num of rows/samples in train, dev, test set, e.g. 800/50/150
#class: num of classes to predict, e.g. 2 for binary classification
#feature: num of features per sample, e.g. 10
#cate_f: num of categorical features, e.g. 3
#num_f: num of numerical features, e.g. 4
#txt_f: num of textual features, e.g. 2
#img_f: num of image features, e.g. 1
class distribution
Can we also add some stats about the class distribution of the built dataset? e.g. train: {'male': 30, 'female': 40}
train:{male:171, female:160} dev:{male:11, female:9} test:{male:32, female:30}
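For the remaining datasets, the per-split class distribution could be generated rather than counted by hand. A minimal sketch, assuming the label column of each split has been read into a list; `class_distribution` is a hypothetical helper and the labels are toy data reusing the example counts from the request:

```python
from collections import Counter


def class_distribution(splits):
    """Map each split name to a {class: count} dict."""
    return {name: dict(Counter(labels)) for name, labels in splits.items()}


# Toy labels for illustration only
splits = {"train": ["male"] * 30 + ["female"] * 40}
for name, counts in class_distribution(splits).items():
    print(f"{name}: {counts}")
# train: {'male': 30, 'female': 40}
```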
Binary Task Exp Results
Multiclass Task Exp Results