Closed lnicola closed 2 years ago
At first glance, I don't see how the sender can be `None` on `drop`, but I'm probably missing something. Anyway, this happens on one dataset I have, if I specify both `--file-train` and `--file-test` and a config file.
And it is not happening when you just pass `--file-train` and `--file-test` without a config file?
Yeah, but then it trains the wrong thing (as per my previous question from today). I suppose there's something wrong with my config file. I have the same classes in both the training and validation file, at least.
It's just bizarre that it's a SIGILL. What version of tangram are you using?
Both the latest release and a git build fail in the same way. SIGILL is an abort, because a thread panicked while panicking (I updated my original comment).
EDIT: I removed my comment below. The test command line was `tangram train --file-train training_ss.csv --file-test validation_ss.csv -t CTnumL4A -o t.tangram --config config.json`.
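For readers following along, here is a minimal sketch of why a panic inside `drop` is dangerous: if `drop` runs while the thread is already unwinding from another panic, a second panic makes the Rust runtime abort the process, which on some platforms shows up as SIGILL. The `ProgressThread` type, its fields, and its methods below are hypothetical and are not taken from the tangram source; the sketch only assumes a worker thread that is stopped through an `Option<Sender<()>>`, and shows a `drop` that never unwraps.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical wrapper around a background progress thread.
struct ProgressThread {
    sender: Option<Sender<()>>,
    handle: Option<JoinHandle<()>>,
}

impl ProgressThread {
    fn spawn() -> ProgressThread {
        let (sender, receiver) = channel::<()>();
        let handle = thread::spawn(move || {
            // Block until a stop signal arrives or the sender is dropped.
            let _ = receiver.recv();
        });
        ProgressThread {
            sender: Some(sender),
            handle: Some(handle),
        }
    }

    // Idempotent: calling this twice is a no-op the second time.
    fn stop(&mut self) {
        if let Some(sender) = self.sender.take() {
            // Ignore the error if the worker has already exited.
            let _ = sender.send(());
        }
        if let Some(handle) = self.handle.take() {
            // Ignore a panic from the worker instead of re-raising it here.
            let _ = handle.join();
        }
    }
}

impl Drop for ProgressThread {
    fn drop(&mut self) {
        // No `unwrap()` here: `drop` may run during unwinding, and a second
        // panic at that point would abort the whole process.
        self.stop();
    }
}
```

With this shape, dropping the value after an explicit `stop()`, or dropping it while another panic is unwinding, does not trigger a second panic.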
I meant the format of your config file shouldn't cause it. We can try to debug this over a video call. Can you join our Discord? https://discord.gg/fqyvVMsJ
No problem, I am able to reproduce this on my end. I'll start digging in and let you know what I find.
There is definitely an issue with the progress bar. I'm looking into this more. In the meantime, you can train a model by passing the flag `--no-progress`.
Hi @lnicola, I fixed the issue. The problem was that we were using the `train_row_count` as the total for the progress bar when the total was in fact the `test_row_count`. That caused a value that we assumed to be positive to be negative. The ETA was negative and the following line caused the panic: https://github.com/tangramdotdev/tangram/blob/47340b8de905399912dfb4d181e9a45025c403c8/crates/cli/train.rs#L605
The issue is fixed on the main branch. The same bug should have been hit with regression, but because progress draws on a timer and the regression code path was faster, the progress bar didn't get a chance to draw and so that code path was not hit.
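To illustrate the class of bug described above, here is a minimal sketch of an ETA calculation that tolerates a wrong total. It is not the code at the linked line; the function name `eta` and its signature are hypothetical. The assumption is that the progress counter can run past the total when the wrong row count is supplied, so the remaining count is computed with `checked_sub` instead of a plain subtraction.

```rust
use std::time::Duration;

/// Estimate the time remaining, or return `None` when no sensible
/// estimate exists (hypothetical helper, not the tangram implementation).
fn eta(current: u64, total: u64, elapsed: Duration) -> Option<Duration> {
    // If `total` is too small (for example, the wrong row count was used),
    // `current` can exceed it. `checked_sub` yields `None` in that case
    // rather than underflowing or producing a negative value.
    let remaining = total.checked_sub(current)?;
    if current == 0 {
        // No progress yet, so there is no rate to extrapolate from.
        return None;
    }
    let secs_per_item = elapsed.as_secs_f64() / current as f64;
    // `try_from_secs_f64` returns an error for negative, non-finite, or
    // overflowing inputs, where `from_secs_f64` would panic.
    Duration::try_from_secs_f64(secs_per_item * remaining as f64).ok()
}
```

A caller can then hide the ETA when the result is `None` instead of panicking in the drawing code.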
Thanks, it's working now. I just had to make a small change for it to build:
diff --git i/crates/cli/train.rs w/crates/cli/train.rs
index 874d77c..cc7836b 100644
--- i/crates/cli/train.rs
+++ w/crates/cli/train.rs
@@ -97,8 +97,7 @@ pub fn train(args: TrainArgs) -> Result<()> {
}
};
let kill_chip = unsafe { ctrl_c::register_ctrl_c_handler()? };
- let train_grid_item_outputs =
- trainer.train_grid(Some(kill_chip), &mut handle_progress_event)?;
+ let train_grid_item_outputs = trainer.train_grid(kill_chip, &mut handle_progress_event)?;
unsafe { ctrl_c::unregister_ctrl_c_handler()? };
if kill_chip.is_activated() {
if let Some(progress_thread) = progress_thread.as_mut() {
yes, my bad! Thank you :)
I commented out the stuff in `drop` and got: