rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

str Debug write_str panic #124253

Closed: baoyachi closed this issue 6 months ago

baoyachi commented 6 months ago

I tried this code:

fn main() {
    let output = vec![ 73, 83, 126, 49, 126, 50, 48, 50, 51, 45, 48, 53, 45, 49, 56, 32, 48, 49, 58, 49, 48, 58, 52, 53, 126, 49, 48, 46, 49, 49, 49, 46, 53, 46, 49, 57, 57, 58, 51, 54, 52, 52, 56, 126, 49, 48, 46, 49, 49, 49, 46, 54, 46, 53, 50, 58, 49, 53, 50, 51, 126, 185, 230, 212, 242, 184, 230, 190, 175, 126, 196];
    let value = unsafe { std::str::from_utf8_unchecked(&output) };
    println!("{}",value);
    format!("{:#?}", value);
}

I expected to see this happen: `cargo run` succeeds.

Instead, this happened: `cargo run` panicked.

➜  cargo r
    Finished dev [unoptimized + debuginfo] target(s) in 0.26s
     Running `target/debug/demo`
IS~1~2023-05-18 01:10:45~10.111.5.199:36448~10.111.6.52:1523~����澯~�
thread 'main' panicked at library/core/src/fmt/mod.rs:2350:34:
byte index 67 is not a char boundary; it is inside '澯' (bytes 66..69) of `IS~1~2023-05-18 01:10:45~10.111.5.199:36448~10.111.6.52:1523~����澯~�`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Meta

rustc --version --verbose:

➜  rustc --version --verbose
rustc 1.77.2 (25ef9e3d8 2024-04-09)
binary: rustc
commit-hash: 25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04
commit-date: 2024-04-09
host: aarch64-apple-darwin
release: 1.77.2
LLVM version: 17.0.6

The panic originates in this source code: https://doc.rust-lang.org/src/core/fmt/mod.rs.html#2349-2350

#[stable(feature = "rust1", since = "1.0.0")]
impl Debug for str {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result {
        f.write_char('"')?;
        let mut from = 0;
        for (i, c) in self.char_indices() {
            let esc = c.escape_debug_ext(EscapeDebugExtArgs {
                escape_grapheme_extended: true,
                escape_single_quote: false,
                escape_double_quote: true,
            });
            // If char needs escaping, flush backlog so far and write, else skip
            if esc.len() != 1 {
                f.write_str(&self[from..i])?; // panics here: `i` is not a char boundary
                for c in esc {
                    f.write_char(c)?;
                }
                from = i + c.len_utf8();
            }
        }
        f.write_str(&self[from..])?;
        f.write_char('"')
    }
}
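To make the "flush the backlog" strategy above easier to follow, here is a hypothetical re-implementation using only stable public APIs (`char_indices`, `char::escape_debug`, `fmt::Write`). It is a sketch, not the real `core` code: the helper name `debug_str` is invented for illustration, and because `char::escape_debug` also escapes `'` (which `Debug for str` does not), single quotes are special-cased. It assumes `s` is valid UTF-8, which is exactly the invariant the loop relies on; edge cases may differ from the real implementation.

```rust
use std::fmt::{self, Write};

/// Hypothetical sketch of the loop in `impl Debug for str`, built from public APIs.
/// Assumes `s` is valid UTF-8 (true for any `&str` created without breaking an
/// `unsafe` contract).
fn debug_str(s: &str, out: &mut impl Write) -> fmt::Result {
    out.write_char('"')?;
    // Byte offset of the first char that has not been written yet.
    let mut from = 0;
    for (i, c) in s.char_indices() {
        // `Debug for str` leaves single quotes unescaped, but
        // `char::escape_debug` escapes them, so keep `'` in the backlog.
        if c == '\'' {
            continue;
        }
        let esc = c.escape_debug();
        // An escape sequence longer than one char means `c` needs escaping.
        if esc.len() != 1 {
            // `i` comes from `char_indices`, so for valid UTF-8 it is always
            // a char boundary and this slice cannot panic.
            out.write_str(&s[from..i])?;
            for e in esc {
                out.write_char(e)?;
            }
            from = i + c.len_utf8();
        }
    }
    // Flush the plain text left after the last escaped char.
    out.write_str(&s[from..])?;
    out.write_char('"')
}
```

For valid strings this should produce the same text as `format!("{:?}", s)`; the only place it can panic is the slice, and only if `s` breaks the UTF-8 invariant.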

The input `str` is not valid UTF-8. When `char_indices` decodes the invalid bytes, it can yield an index `i` that is not on a char boundary (here, 67), so slicing `&self[from..i]` panics.
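The boundary check itself is ordinary `str` behavior. The sketch below, with hypothetical helper names, shows it on a valid string: every index from `char_indices` is a char boundary, `str::get` returns `None` for a range ending inside a multi-byte char, and plain indexing panics there. The test string `"~澯~"` is an illustrative assumption; `澯` occupies bytes 1..4.

```rust
use std::panic;

/// For a valid `str`, every index produced by `char_indices` is a char boundary.
fn indices_are_boundaries(s: &str) -> bool {
    s.char_indices().all(|(i, _)| s.is_char_boundary(i))
}

/// Non-panicking prefix: `str::get` returns `None` when `end` falls inside a
/// multi-byte char, where `&s[..end]` would panic instead.
fn prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

/// Returns `true` if indexing `&s[..end]` panics (the default panic hook
/// still prints the message to stderr).
fn slice_panics(s: &str, end: usize) -> bool {
    panic::catch_unwind(|| s[..end].len()).is_err()
}
```

With invalid bytes behind a `&str`, the first property no longer holds, which is how the reproducer reaches the panicking slice.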

The resulting panic message is: byte index 67 is not a char boundary; it is inside '澯' (bytes 66..69) of `IS~1~2023-05-18 01:10:45~10.111.5.199:36448~10.111.6.52:1523~����澯~�`

y21 commented 6 months ago

> The input str is not a valid utf8-encoded string

Valid UTF-8 is an invariant of the `str` type, and line 3 of that reproducer violates the safety requirement of `std::str::from_utf8_unchecked`.

Code is allowed to assume that a `str` always contains valid UTF-8, so the `Debug` implementation is not at fault, IMHO. This is a bug in your code. See also: https://doc.rust-lang.org/stable/std/primitive.str.html#invariant
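A safe alternative to `from_utf8_unchecked` is to validate the bytes with `std::str::from_utf8` and fall back to `String::from_utf8_lossy`, which replaces each invalid sequence with U+FFFD. This is a sketch; the helper name `to_text` is hypothetical, and the choice to fall back to lossy decoding (rather than returning an error) is an assumption about what the caller wants. The tail of the reporter's bytes is presumably text in some non-UTF-8 legacy encoding; decoding it properly would need that encoding to be known.

```rust
use std::borrow::Cow;

/// Hypothetical helper: turn bytes of uncertain encoding into text that is safe
/// to print or `Debug`-format, without any `unsafe`.
fn to_text(bytes: &[u8]) -> Cow<'_, str> {
    match std::str::from_utf8(bytes) {
        // Valid UTF-8: borrow it unchanged.
        Ok(s) => Cow::Borrowed(s),
        Err(e) => {
            // `valid_up_to` is the length of the longest valid UTF-8 prefix.
            eprintln!("invalid UTF-8 starting at byte {}", e.valid_up_to());
            // Replace each invalid sequence with U+FFFD instead of creating a bad `&str`.
            String::from_utf8_lossy(bytes)
        }
    }
}
```

With the reporter's `output`, `from_utf8` reports that the first invalid byte is at offset 61, and `println!("{:#?}", to_text(&output))` prints the ASCII prefix followed by replacement characters rather than relying on undefined behavior.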

baoyachi commented 6 months ago

@y21 Thx.