rust-lang / miri

An interpreter for Rust's mid-level intermediate representation
Apache License 2.0
4.37k stars 337 forks source link

The http crate's test test_invalid_name_lengths exposes some worst-case runtime behavior in Tree Borrows #2863

Closed saethlin closed 1 year ago

saethlin commented 1 year ago

This test: https://github.com/hyperium/http/blob/3a2f3e01c01e60b894d1f5b30554013d990010c6/src/header/name.rs#L1685-L1705

    #[test]
    fn test_invalid_name_lengths() {
        assert!(
            HeaderName::from_bytes(&[]).is_err(),
            "zero-length header name is an error",
        );

        let long = &ONE_TOO_LONG[0..super::super::MAX_HEADER_NAME_LEN];

        let long_str = std::str::from_utf8(long).unwrap();
        assert_eq!(HeaderName::from_static(long_str), long_str); // shouldn't panic!

        assert!(
            HeaderName::from_bytes(long).is_ok(),
            "max header name length is ok",
        );
        assert!(
            HeaderName::from_bytes(ONE_TOO_LONG).is_err(),
            "longer than max header name length is an error",
        );
    }

On my machine runs in ~40 seconds with Stacked Borrows, and with Tree Borrows runs for > 10 minutes. The fraction of runtime in keep_only_needed seems to grow over time without bound.

The offending loop is here: https://github.com/rust-lang/miri/blob/e602174a78c9b32253527ce87be1542d9c7bc1d9/src/borrow_tracker/tree_borrows/tree.rs#L533-L535 perf top identifies:

       │     <core::slice::iter::IterMut<T> as core::iter::traits::iterator::Iterator>::next:
       │       data16    data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
 13.02 │2f0:   add       $0x28,%rdx
 12.51 │       cmp       %rdx,%rcx
  0.00 │     ↓ je        30a
       │     miri::borrow_tracker::tree_borrows::unimap::UniValMap<V>::remove:
 35.48 │2f9:   cmp       %r12,0x10(%rax,%rdx,1)
  0.06 │     ↑ jbe       2f0
       │     self.data[idx.idx as usize] = None;
 14.27 │       mov       (%rax,%rdx,1),%rdi
 11.81 │       movb      $0x2,(%rdi,%rsi,1)
 12.68 │     ↑ jmp       2f0
RalfJung commented 1 year ago

Cc @Vanille-N

Vanille-N commented 1 year ago

That's quite an obscure bug. It took two+ hours, but I finally have a lead. It's not confirmed yet, but many things point towards this being the problem.

TL;DR: I believe this is an unfortunate interaction between RangeMap's iter_mut_all and UniMap's PartialEq::eq.

Problem 1: I already knew that PartialEq testing on UniMap was not optimal, because maps that are equivalent are not always PartialEq when using the #[derive()]d impl. Indeed [Some(0), None, None, None] != [Some(0)] although they are equivalent.

Problem 2: RangeMap's iter_mut_all does not merge identical adjacent cells, only iter_mut does.

Clues:

This seems to result in adjacent ranges being not equal pre-GC and thus not merged, and even if they become equal post-GC the merge still doesn't happen because iter_mut_all doesn't merge.

Note: it is very likely that serde1 and serde2 from the benches suffer from the same issue, and that in general most code that does a single pass of a large amount of bytes with a lot of per-byte work also exhibits this bad behavior.

Vanille-N commented 1 year ago

Yeah that seems to be it.

Replacing the implementation of RangeMap::iter_mut_all from

self.v.iter().map(|elem| (elem.range.clone(), &elem.data))

to

let start = self.v.first().map(|e| e.range.start).unwrap_or(0);
let end = self.v.last().map(|e| e.range.end).unwrap_or(0);
let len = end - start;
self.iter_mut(Size::from_bytes(start), Size::from_bytes(len))

makes the TB version terminate in 50s. I'll find a way to avoid doing too much useless work: iter_mut_all doesn't need to do a dichotomy to find the first element and it never has to split ranges.

Vanille-N commented 1 year ago

It's also not ideal that 10% of the execution time is spent in <[A] as core::slice::cmp::SlicePartialEq<B>>::equal...

saethlin commented 1 year ago

That is typical of pathological cases for Stacked Borrows. I remember considering doing tricks to implement the comparison as comparisons of &[u64] because currently LLVM either doesn't figure out that it can do that because there is no padding in the T you are comparing, or there is padding actually. But once I got the GC working, it shrank the size of the stacks so much that I decided making the comparisons faster wasn't worth pursuing.

I think there should be some clever way to avoid trying to merge adjacent elements when the comparison is known to be unlikely to succeed. Or maybe we could be sloppy and just only attempt merging once every few visits.

RalfJung commented 1 year ago

We already stop comparing after N successful merges (I forgot which value I used for N, I remember tweaking it a bit). Maybe we should also stop after N failed merges, or do something probabilistic to make sure 'late' chunks also get the chance to merge again.

RalfJung commented 1 year ago

This should be fixed by https://github.com/rust-lang/miri/pull/2865?

saethlin commented 1 year ago

Yup, looks fixed to me.