modelfoxdotdev / modelfox

ModelFox makes it easy to train, deploy, and monitor machine learning models.

Crash when the provided train and test data do not have the same number of columns #64

Closed. comunidadio closed this issue 2 years ago.

comunidadio commented 2 years ago

Using the flight_delays dataset from https://www.kaggle.com/c/flight-delays-spring-2018.

On tangram 0.7:

 $ tangram train --file-train flight_delays_train.csv --file-test flight_delays_test.csv --target dep_delayed_15min
✅ Inferring train table columns. 0s 43ms
✅ Loading train table. 0s 40ms
✅ Loading test table. 0s 45ms
✅ Shuffling. 0s 9ms
✅ Computing train stats. 0s 121ms
✅ Computing test stats. 0s 112ms
✅ Finalizing stats. 0s 1ms
error: panicked at 'removal index (is 8) should be < len (is 8)', library/alloc/src/vec/mod.rs:1385:13
   0: backtrace::capture::Backtrace::new
   1: tangram::train::train::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: _rust_begin_unwind
   6: core::panicking::panic_fmt
   7: alloc::vec::Vec<T,A>::remove::assert_failed
   8: tangram_core::train::Trainer::prepare
   9: tangram::main
  10: std::sys_common::backtrace::__rust_begin_short_backtrace
  11: _main
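
The panic message comes from the bounds check in `Vec::remove`, which panics whenever `index >= len`. One plausible reading, not confirmed against `Trainer::prepare`, is that the target's index is computed on the train table (9 columns, target last, so index 8) and then reused to drop the target from the test table, which has only 8 columns because it lacks the label. The sketch below reproduces that mechanism under those assumptions; `column_index`, `feature_columns`, and the `feature_N` names are hypothetical stand-ins, not tangram code.

```rust
use std::panic;

/// Index of `target` among `columns`, looked up by name.
fn column_index(columns: &[String], target: &str) -> Option<usize> {
    columns.iter().position(|c| c == target)
}

/// Hypothetical layout: eight feature columns (names are placeholders).
fn feature_columns() -> Vec<String> {
    (0..8).map(|i| format!("feature_{i}")).collect()
}

fn main() {
    let target = "dep_delayed_15min";

    // Train table: eight features plus the target as the ninth column.
    let mut train_columns = feature_columns();
    train_columns.push(target.to_string());

    // Test table: same features, but no label column.
    let mut test_columns = feature_columns();

    // The index is computed once, on the train table: 8 (last of 9 columns).
    let index = column_index(&train_columns, target).expect("target is in the train table");
    train_columns.remove(index); // fine: 8 < 9

    // Reusing that index on the 8-column test table trips the same bounds
    // assertion in `Vec::remove` that appears in the backtrace.
    let outcome = panic::catch_unwind(move || {
        test_columns.remove(index);
    });
    assert!(outcome.is_err());
}
```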

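With the latest tangram (main branch), the stack trace is slightly different: the panic is now reported at crates/core/train.rs:176:58 instead of library/alloc/src/vec/mod.rs:1385:13.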

 $ tangram-tip train --file-train flight_delays_train.csv --file-test flight_delays_test.csv --target dep_delayed_15min
✅ Inferring train table columns. 0s 48ms
✅ Loading train table. 0s 41ms
✅ Loading test table. 0s 44ms
✅ Shuffling. 0s 9ms
✅ Computing train stats. 0s 115ms
✅ Computing test stats. 0s 119ms
✅ Finalizing stats. 0s 1ms
error: panicked at 'removal index (is 8) should be < len (is 8)', crates/core/train.rs:176:58
   0: backtrace::capture::Backtrace::new
   1: tangram::train::train::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: _rust_begin_unwind
   6: core::panicking::panic_fmt
   7: alloc::vec::Vec<T,A>::remove::assert_failed
   8: tangram_core::train::Trainer::prepare
   9: tangram::main
  10: std::sys_common::backtrace::__rust_begin_short_backtrace
  11: _main

comunidadio commented 2 years ago

Oops, I didn't realize that Kaggle's "test data" has no label column, since it is meant for blind testing. I guess the real issue is that tangram should show an error message, rather than crash with a stack trace, when the columns of the train and test sets differ?
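
A check of the kind suggested here could run before training and report exactly which columns are missing from the test table. The sketch below is only an illustration under assumptions: `check_test_columns` is a hypothetical helper (not part of tangram), it treats every train column as required in the test table, and it adds a hint when the missing column is the target.

```rust
/// Hypothetical pre-training check: every train column, including the target,
/// must also appear in the test table. Returns a readable error otherwise.
fn check_test_columns(train: &[&str], test: &[&str], target: &str) -> Result<(), String> {
    // Collect train columns that the test table does not contain.
    let missing: Vec<&str> = train.iter().copied().filter(|c| !test.contains(c)).collect();
    if missing.is_empty() {
        return Ok(());
    }
    let mut message = format!(
        "the test table is missing {} column(s) present in the train table: {}",
        missing.len(),
        missing.join(", ")
    );
    // A missing target is the common case with blind-test files such as Kaggle's.
    if missing.contains(&target) {
        message.push_str(&format!(
            ". The test table must include the target column \"{target}\" so the model can be evaluated"
        ));
    }
    Err(message)
}

fn main() {
    let train = ["Month", "Distance", "dep_delayed_15min"];
    let test = ["Month", "Distance"];
    if let Err(message) = check_test_columns(&train, &test, "dep_delayed_15min") {
        eprintln!("error: {message}");
    }
}
```

Checking by column name, rather than only comparing column counts, also catches test files whose column count matches but whose columns differ.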

nitsky commented 2 years ago

@comunidadio thanks for opening this issue. We should show a good error message in this case. We will have a fix pushed shortly.

deciduously commented 2 years ago

Good catch, @comunidadio. With this commit, tangram now exits gracefully in this case.
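
The commit itself is not shown here, so the following is only a sketch of what "exiting gracefully" can look like in Rust, not the actual fix. It assumes a hypothetical `drop_target` step that looks the target up by name in the test table itself, so `Vec::remove` only ever receives an in-bounds index, and it turns the failure into an error message plus a non-zero exit status via `std::process::ExitCode` instead of a panic and backtrace.

```rust
use std::process::ExitCode;

/// Hypothetical preparation step: removes the target column from the test
/// table's columns, locating it by name in that same table.
fn drop_target(test_columns: &mut Vec<String>, target: &str) -> Result<(), String> {
    let index = test_columns
        .iter()
        .position(|c| c == target)
        .ok_or_else(|| format!("the target column \"{target}\" was not found in the test table"))?;
    // Cannot panic: `position` only returns indices that are in bounds.
    test_columns.remove(index);
    Ok(())
}

fn main() -> ExitCode {
    // Placeholder test table without a label column, as in Kaggle's test file.
    let mut test_columns: Vec<String> = vec!["feature_0".into(), "feature_1".into()];
    match drop_target(&mut test_columns, "dep_delayed_15min") {
        Ok(()) => ExitCode::SUCCESS,
        Err(message) => {
            // Print a readable error and exit with a failure status, no backtrace.
            eprintln!("error: {message}");
            ExitCode::FAILURE
        }
    }
}
```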