Open exalate-issue-sync[bot] opened 1 year ago
Megan Kurka commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] [~accountid:5b153fb1b0d76456f36daced]
JIRA Issue Migration Info
Jira Issue: PUBDEV-7488 Assignee: Sebastien Poirier Reporter: Megan Kurka State: Open Fix Version: N/A Attachments: N/A Development PRs: N/A
Notes from our discussion:
Because AutoML is time-constrained (our goal is to get the best model within a fixed time budget, rather than with unlimited time), the current best way to use Target Encoding with AutoML is:
1. Split the data into four parts using {{split_frame}} (70/10/10/10): {{train}}, {{valid}}, {{blend}}, {{test}}.
2. Train a TE model on {{train}} only.
3. Apply the TE model to {{train}}, {{valid}}, {{blend}} and {{test}} to get the extended frames {{train_te}}, {{valid_te}}, {{blend_te}}, {{test_te}}.
4. Run vanilla AutoML with {{training_frame = train}}, {{validation_frame = valid}}, {{blending_frame = blend}} and {{leaderboard_frame = test}}. Make sure to set {{nfolds = 0}} to turn off CV. Look at the leaderboard metrics.
5. Run TE AutoML with {{training_frame = train_te}}, {{validation_frame = valid_te}}, {{blending_frame = blend_te}} and {{leaderboard_frame = test_te}}. Again, set {{nfolds = 0}} to turn off CV. Look at the leaderboard metrics.
6. Compare the leaderboard metrics of vanilla AutoML with those of TE AutoML, and hopefully we see better metrics for the latter!
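The key point of steps 1 to 3 is that the encoding is learned from {{train}} alone and then reused, unchanged, on the other three parts. That keeps the {{valid}}, {{blend}} and {{test}} targets out of the encoding. The sketch below illustrates this in plain Python rather than with H2O. It makes several assumptions:

* The helpers {{split_rows}}, {{fit_target_encoding}} and {{apply_target_encoding}} are hypothetical.
* The smoothing formula is an illustrative choice.
* This is not H2O's target encoder.

Applying a plain mean encoding back to the very rows it was fit on (the {{train_te}} case) can leak the target. H2O's target encoder has leakage-handling and noise options for that situation, which this sketch omits.

```python
import random
from collections import defaultdict


# Hypothetical helper: shuffle, then cut into consecutive parts (e.g. 70/10/10/10).
def split_rows(rows, ratios=(0.7, 0.1, 0.1, 0.1), seed=1234):
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    parts, start = [], 0
    for i, ratio in enumerate(ratios):
        # The last part takes the remainder so no row is lost to rounding.
        end = len(rows) if i == len(ratios) - 1 else start + round(ratio * len(rows))
        parts.append(rows[start:end])
        start = end
    return parts


# Hypothetical helper: smoothed per-category target mean, learned from train rows only.
# Assumed smoothing: (sum_y + m * prior) / (count + m), which shrinks rare levels toward the prior.
def fit_target_encoding(train, column, target, smoothing=10.0):
    prior = sum(row[target] for row in train) / len(train)
    sums, counts = defaultdict(float), defaultdict(int)
    for row in train:
        sums[row[column]] += row[target]
        counts[row[column]] += 1
    encoding = {
        level: (sums[level] + smoothing * prior) / (counts[level] + smoothing)
        for level in counts
    }
    return encoding, prior


# Hypothetical helper: copy each row and add '<column>_te'; levels unseen in train get the prior.
def apply_target_encoding(rows, column, encoding, prior):
    return [dict(row, **{column + "_te": encoding.get(row[column], prior)}) for row in rows]


# Usage on toy data: one categorical column 'city' and a binary target 'y'.
rng = random.Random(0)
data = [{"city": rng.choice("ABC"), "y": rng.randint(0, 1)} for _ in range(1000)]
train, valid, blend, test = split_rows(data)
encoding, prior = fit_target_encoding(train, "city", "y")  # fit on train only
train_te, valid_te, blend_te, test_te = (
    apply_target_encoding(part, "city", encoding, prior)
    for part in (train, valid, blend, test)
)
print({level: round(value, 3) for level, value in sorted(encoding.items())})
```

In H2O itself, the four extended frames would then be passed to AutoML as in steps 4 and 5, with {{nfolds = 0}}.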
If time and compute resources are not an issue, the safest way to do Target Encoding is within nested CV (not currently supported in AutoML):