Open isabella opened 2 years ago
training output on current release:
✅ Inferring train table columns. 0ms
✅ Loading train table. 0ms
✅ Shuffling. 0ms
warning: The train dataset is very small. It has only 7 row(s).
warning: The comparison dataset is very small. It has only 1 row(s).
warning: The test dataset is very small. It has only 2 row(s).
✅ Computing train stats. 0ms
✅ Computing test stats. 0ms
✅ Finalizing stats. 0ms
✅ Computing baseline metrics. 0ms
info: Press ctrl-c to stop early and save the best model trained so far.
✅ Computing features. 0ms
✅ Training model 1 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 1 AUC ROC: NaN
✅ Computing features. 0ms
✅ Training model 2 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 2 AUC ROC: NaN
✅ Computing features. 0ms
✅ Training model 3 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 3 AUC ROC: NaN
✅ Computing features. 0ms
✅ Training model 4 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0s 7ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 4 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 5 of 8: Tree. 0ms
✅ Training model 5 of 8. 0s 88ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 5 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 6 of 8: Tree. 0ms
✅ Training model 6 of 8. 0s 95ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 6 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 7 of 8: Tree. 0ms
✅ Training model 7 of 8. 0s 107ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 7 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 8 of 8: Tree. 0ms
✅ Training model 8 of 8. 0s 127ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 8 AUC ROC: NaN
error: panicked at 'called `Option::unwrap()` on a `None` value', crates/core/train.rs:1881:22
0: backtrace::backtrace::trace
1: backtrace::capture::Backtrace::new
2: tangram::train::train::{{closure}}
3: std::panicking::rust_panic_with_hook
4: std::panicking::begin_panic_handler::{{closure}}
5: std::sys_common::backtrace::__rust_end_short_backtrace
6: _rust_begin_unwind
7: core::panicking::panic_fmt
8: core::panicking::panic
9: tangram::train::train::{{closure}}
10: tangram::main
11: std::sys_common::backtrace::__rust_begin_short_backtrace
12: _main
csv:
is_fraud,account.state,account.credit_score,account.account_age_days,account.has_2fa_installed,transaction_stats.transaction_count_7d,transaction_stats.transaction_count_30d
Positive,Arizona,685,1547,0,9,41
Negative,Hawaii,625,861,1,11,36
Negative,Arkansas,730,958,0,0,16
Positive,Louisiana,610,1570,0,12,26
Negative,South Dakota,635,1953,0,7,30
Negative,Louisiana,710,32,0,8,22
Positive,New Mexico,645,37,1,5,40
Negative,Nevada,735,1627,0,12,51
Negative,Kentucky,650,88,1,11,23
Negative,Delaware,680,1687,0,2,39
When training a binary classification model on the CSV above, the CLI crashes with the output shown above. The warning indicates that there is only a single row for comparison, which is not enough: a valid AUC exists only if there is at least one example whose true value is positive and one whose true value is negative. Otherwise the denominator of either the TPR or the FPR used to compute the AUC is zero, and the result is NaN. We need to enforce a reasonable minimum training dataset size.
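To illustrate why a single-class comparison set yields NaN, here is a minimal sketch. It is not the code in `crates/core`; the names `tpr_fpr` and `has_both_classes` are hypothetical. It computes the TPR and FPR at one threshold and shows the kind of check that could run before any AUC is computed.

```rust
/// Computes (TPR, FPR) at a single threshold.
/// `labels[i]` is `true` when example `i` is truly positive.
/// Hypothetical helper for illustration only.
fn tpr_fpr(labels: &[bool], scores: &[f32], threshold: f32) -> (f32, f32) {
    let (mut tp, mut fp, mut tn, mut fn_) = (0u32, 0u32, 0u32, 0u32);
    for (&label, &score) in labels.iter().zip(scores) {
        let predicted_positive = score >= threshold;
        match (label, predicted_positive) {
            (true, true) => tp += 1,
            (true, false) => fn_ += 1,
            (false, true) => fp += 1,
            (false, false) => tn += 1,
        }
    }
    // With no positive examples, tp + fn_ == 0 and the TPR is 0.0 / 0.0 = NaN.
    // With no negative examples, fp + tn == 0 and the FPR is NaN instead.
    let tpr = tp as f32 / (tp + fn_) as f32;
    let fpr = fp as f32 / (fp + tn) as f32;
    (tpr, fpr)
}

/// Returns `true` only if the labels contain at least one positive
/// and at least one negative example, the precondition for a valid AUC.
fn has_both_classes(labels: &[bool]) -> bool {
    labels.iter().any(|&l| l) && labels.iter().any(|&l| !l)
}
```

A one-row comparison set necessarily contains a single class, so `has_both_classes` returns `false` for it and one of the two rates is NaN whatever the model predicts.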
Relevant code: https://github.com/tangramdotdev/tangram/blob/2e51ef1ae3c7ec1e65b9232945d5cfb6d99d52ef/crates/core/train.rs#L1882 We handled this in the regression case here https://github.com/tangramdotdev/tangram/blob/2e51ef1ae3c7ec1e65b9232945d5cfb6d99d52ef/crates/core/train.rs#L1857 by removing the `unwrap` and outputting a friendly error message. We need to do the same for binary classification and multiclass classification.
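As a rough sketch of what "removing the `unwrap`" could look like, the following selects the best model by comparison metric and returns an error instead of panicking when no model has a usable metric. This is an assumption about the shape of the fix, not the actual code in `train.rs`: `TrainError`, `NoValidMetric`, and `choose_best_model` are hypothetical names, and the error message is only a suggestion.

```rust
use std::fmt;

/// Hypothetical error type for illustration.
#[derive(Debug, PartialEq)]
enum TrainError {
    /// Every model's comparison metric was NaN or infinite.
    NoValidMetric { n_models: usize },
}

impl fmt::Display for TrainError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TrainError::NoValidMetric { n_models } => write!(
                f,
                "None of the {} models produced a valid comparison metric. \
                 The comparison dataset is probably too small or contains only one class.",
                n_models
            ),
        }
    }
}

/// Returns the index of the model with the highest finite metric,
/// or an error when no metric is finite (for example, all NaN).
fn choose_best_model(metrics: &[f32]) -> Result<usize, TrainError> {
    metrics
        .iter()
        .copied()
        .enumerate()
        // Skip NaN and infinite values so they can never be selected.
        .filter(|(_, metric)| metric.is_finite())
        // `total_cmp` gives a total order on f32, so no `unwrap` is needed.
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(index, _)| index)
        // Turn "nothing to choose from" into an error instead of a panic.
        .ok_or(TrainError::NoValidMetric {
            n_models: metrics.len(),
        })
}
```

With the eight NaN metrics from the log above, `choose_best_model` would return `Err(TrainError::NoValidMetric { n_models: 8 })`, which the CLI could print as a normal error message rather than a backtrace.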