Open Michael-F-Bryan opened 2 years ago
Thanks, @Michael-F-Bryan, for finding it out. I will correct it.
If it helps, here is what we log when the panic occurs:
{
"level": "ERROR",
"message": "panicked at 'Start index: 0 is greater than or equal to end index: 0', /Users/mohit/.cargo/git/checkouts/proc-blocks-49e4d24512f5d324/1e53d6f/text_extractor/src/lib.rs:44:13",
"target": "hotg_runicos_base_wasm",
"module_path": "hotg_runicos_base_wasm",
"file": "/Users/mohit/.cargo/registry/src/github.com-1ecc6299db9ec823/hotg-runicos-base-wasm-0.10.0/src/lib.rs",
"line": 52
}
I believe @Mohit0928 created #10 to resolve this, but we're running into the panics in the text_extractor
proc block, while that PR's changes were added to the tokenizer
proc block.
I think we need to rewrite these lines in a way that lets us handle zero, one, or multiple sentences as an output. Possibly using zip()
to iterate over pairs of start/end logits, constructing sentences until we get an invalid pair (e.g. start_index = end_index = 0
because there is no answer).
@Michael-F-Bryan, When I ran the Bert rune with sentence-1 empty. I'm getting below as error message.
mohit@MacBook-Air bert % rune run bert.rune --raw input1.txt input2.txt
[2021-11-09T15:00:50.288Z WARN hotg_rune_cli::run::command] TensorFlow Lite has a bug where its MacOS CPU backend will occasionally segfault during inference. See https://github.com/tensorflow/tensorflow/issues/52300 for more.
[2021-11-09T15:00:50.288Z ERROR hotg_runicos_base_wasm] panicked at 'Sentence 1 is empty', /Users/mohit/.cargo/git/checkouts/proc-blocks-49e4d24512f5d324/7c78b51/tokenizers/src/lib.rs:59:9
Error: Call failed
Caused by:
0: Unable to call the _call function
1: RuntimeError: The Rune encountered a fatal error
at <unnamed> (<module>[220]:0x230f7)
at <unnamed> (<module>[297]:0x24ab1)
at <unnamed> (<module>[209]:0x22ab3)
at <unnamed> (<module>[325]:0x25ac5)
at <unnamed> (<module>[322]:0x259ed)
at <unnamed> (<module>[122]:0xcecb)
at <unnamed> (<module>[23]:0x11d8)
at <unnamed> (<module>[26]:0x2835)
2: The Rune encountered a fatal error
3: panicked at 'Sentence 1 is empty', /Users/mohit/.cargo/git/checkouts/proc-blocks-49e4d24512f5d324/7c78b51/tokenizers/src/lib.rs:59:9
Aren't we supposed to get this? Or are we expecting something different?
No, that's not what we want to do.
The underlying problem is that the model was unable to generate an answer for the provided inputs so the text extractor crashed the Rune (the logits tensor passed to the text extractor was all zeroes). Your "sentence 1 is empty" assertion is in the tokenizer so it won't help prevent this situation.
Another thing to keep in mind is the text passed to a tokenizer is controlled by the user. in general, you should try to handle bad user input graciously (e.g. by setting the tokenizer output to all zeroes) so unexpected input won't crash the runtime. There's a pretty common saying for this:
Be liberal in what you accept, and conservative in what you send - the Robustness Principle
Also, assertions are mainly be used for unrecoverable errors that were most probably caused by a programming bug (e.g. non-finite floats due to a divide by zero). For errors you can expect to see frequently like empty input, we should try to return a sensible default value like an empty string.
I was playing around with the web runtime when I noticed passing an empty string to the Bert rune will trigger this assertion.
In this case, it looks like
end_index
andstart_index
are both zero.I'm guessing we'll also see this happen when the model is unable to figure out the answer to your question.