rschmitt / heatseeker

A high-performance Selecta clone, written in Rust.
MIT License
214 stars 10 forks

Assumptions about input encodings #28

Closed riddochc closed 5 years ago

riddochc commented 6 years ago

As an example of where this is a problem, I can run a command like:

locate '.' | hs

which causes:

thread 'main' panicked at 'called Result::unwrap() on an Err value: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }', libcore/result.rs:945:5

This happens because some directory or file name somewhere on my system is not encoded in UTF-8. The error could come from a number of places; the one that stands out first is src/main.rs:283, which is:

stdin.read_line(&mut s).unwrap();

But there are probably many other places where input to heatseeker is assumed to be UTF-8. If this were a shell script, I'd use something like the following:

iconv -f 'UTF8' -t "ASCII//TRANSLIT" -c < input_file > output_file

I think that in heatseeker's case, transliterating would be a better way to handle user input than panicking. When I have the chance, I'll look through Rust's APIs and see if I can find the Rust equivalent for this.
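For illustration, here is a minimal sketch of one standard-library option, not heatseeker's actual code. It is not true transliteration like iconv's //TRANSLIT: `String::from_utf8_lossy` only substitutes U+FFFD (the replacement character) for each invalid byte sequence. The helper name `read_lines_lossy` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of heatseeker): reads all lines from `reader`,
/// replacing invalid UTF-8 sequences with U+FFFD instead of returning an error.
fn read_lines_lossy<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // read_until operates on raw bytes, so invalid UTF-8 is not an error here.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break; // end of input
        }
        // Drop the trailing newline, if there is one.
        if buf.last() == Some(&b'\n') {
            buf.pop();
        }
        // Lossy conversion: invalid sequences become U+FFFD.
        lines.push(String::from_utf8_lossy(&buf).into_owned());
    }
    Ok(lines)
}
```

It could be called as `read_lines_lossy(io::stdin().lock())`. The trade-off is that the displayed text no longer matches the original bytes exactly, so a selected path containing U+FFFD would not necessarily refer to a real file.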

rschmitt commented 6 years ago

At a minimum, the program should not crash when given non-UTF-8 input, so I think that discarding such lines would be reasonable behavior.
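As a rough sketch of what discarding such lines could look like, assuming input is read as raw bytes and validated one line at a time. The helper name `read_valid_utf8_lines` is hypothetical and this is not the change that was actually made in heatseeker.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of heatseeker): returns only the lines that
/// are valid UTF-8 and silently drops the rest.
fn read_valid_utf8_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    // split yields each line as raw bytes (Vec<u8>) with the delimiter removed.
    for chunk in reader.split(b'\n') {
        let bytes = chunk?;
        // String::from_utf8 returns Err for invalid UTF-8; such lines are skipped.
        if let Ok(line) = String::from_utf8(bytes) {
            lines.push(line);
        }
    }
    Ok(lines)
}
```

Called as `read_valid_utf8_lines(io::stdin().lock())`, this still propagates genuine I/O errors but never fails on encoding alone. The cost is that entries with non-UTF-8 names disappear from the candidate list without any notice.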