rust-num / num-traits

Numeric traits for generic mathematics in Rust
Apache License 2.0

`from_str_radix` panics when the first char is encoded with multiple bytes #125

Closed HeroicKatora closed 5 years ago

HeroicKatora commented 5 years ago

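Reproduction (playground):

use num_traits::Num; // `from_str_radix` for floats comes from the `Num` trait

fn main() {
    let _ = f32::from_str_radix("™0.2", 10);
}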

Expected behaviour

Nothing happens. The method returns an error with kind `Invalid`, which is ignored.

Observed behaviour

The code panics.

thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside '™' (bytes 0..3) of `™0.2`', src/libcore/str/mod.rs:2034:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

Cause analysis

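The underlying cause is that `slice_shift_char` splits the string at a hardcoded byte offset of 1 instead of at the byte length of the first character. Slicing panics whenever byte offset 1 is not a UTF-8 character boundary, which is the case for any first character encoded with more than one byte.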

https://github.com/rust-num/num-traits/blob/58f02a8677c13ad5d46d0eb0698d9052ca70ef73/src/lib.rs#L218-L220
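To illustrate the cause without depending on the crate, here is a minimal sketch. Both helper names, `split_at_byte_one` and `split_at_first_char`, are hypothetical and are not taken from num-traits. The first one mimics the fixed offset but uses `str::get`, so the failure shows up as `None` rather than a panic.

```rust
/// Hypothetical stand-in for the buggy helper: splits at a fixed byte offset of 1.
/// `str::get` returns `None` where direct slicing (`&src[1..]`) would panic.
fn split_at_byte_one(src: &str) -> Option<(&str, &str)> {
    Some((src.get(..1)?, src.get(1..)?))
}

/// Hypothetical alternative: splits after the first character,
/// using that character's UTF-8 encoded length as the offset.
fn split_at_first_char(src: &str) -> Option<(char, &str)> {
    let ch = src.chars().next()?;
    Some((ch, &src[ch.len_utf8()..]))
}

fn main() {
    let s = "™0.2";

    // '™' (U+2122) occupies three bytes in UTF-8, so byte index 1 is mid-character.
    assert_eq!('™'.len_utf8(), 3);
    assert!(!s.is_char_boundary(1));
    assert!(s.is_char_boundary(3));

    // The fixed offset fails on this input; the length-aware split succeeds.
    assert_eq!(split_at_byte_one(s), None);
    assert_eq!(split_at_first_char(s), Some(('™', "0.2")));

    // For an ASCII first character, offset 1 happens to be correct.
    assert_eq!(split_at_byte_one("+1"), Some(("+", "1")));
}
```

The fixed offset only works by coincidence for ASCII input such as a leading sign, which is probably why the problem went unnoticed.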

HeroicKatora commented 5 years ago

Possible fixed implementation:

fn slice_shift_char(src: &str) -> Option<(char, &str)> {
    let mut chars = src.chars();
    if let Some(ch) = chars.next() {
        Some((ch, chars.as_str())) // available since Rust 1.4.0
    } else {
        None
    }
}

cuviper commented 5 years ago

Thanks for the report, and your suggested fix looks great! Can you submit this as a pull request along with corresponding tests? It looks like slice_shift_char is used both at the beginning of the string and in the exponent position (for input like 1e100).
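As a rough sketch of what such tests might check, the snippet below exercises a standalone copy of the proposed helper at both positions. It is an assumption about the shape of the tests, not the ones that ended up in the crate. An actual regression test would more likely call `f32::from_str_radix` and assert that it returns an error. The way the exponent is located here (`find('e')`) is purely illustrative.

```rust
/// Standalone copy of the helper proposed in this thread, so the sketch compiles on its own.
fn slice_shift_char(src: &str) -> Option<(char, &str)> {
    let mut chars = src.chars();
    let ch = chars.next()?;
    Some((ch, chars.as_str()))
}

fn main() {
    // Beginning of the string: multi-byte first character, the case from the report.
    assert_eq!(slice_shift_char("™0.2"), Some(('™', "0.2")));

    // Beginning of the string: ASCII sign, which must keep working.
    assert_eq!(slice_shift_char("-1.5"), Some(('-', "1.5")));

    // Empty input yields no character.
    assert_eq!(slice_shift_char(""), None);

    // Exponent position: take what follows the 'e' (illustrative lookup only).
    let input = "1e™";
    let exp_start = input.find('e').map(|i| i + 1).unwrap();
    assert_eq!(slice_shift_char(&input[exp_start..]), Some(('™', "")));

    // Exponent position with an ordinary sign, as in input like 1e+100.
    assert_eq!(slice_shift_char("+100"), Some(('+', "100")));
}
```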