rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.91k stars 12.52k forks source link

rustc runs out of memory when compiling trait heavy code #43613

Closed Marwes closed 7 years ago

Marwes commented 7 years ago

Been working on an update to combine and ran into https://github.com/rust-lang/rust/issues/43132. Since the fix has been merged I updated using rustup but now rustc runs out of memory when compiling instead.

This branch reproduces the issue https://github.com/Marwes/combine/tree/ice_allocator_exhausted. I haven't bothered to reduce to a minimal production given that it is likely related to the slow down mentioned in #43132 and removing code will just make it compile and make the problem less pronounced (possibly the minimal example here reproduces the same problem as well https://github.com/rust-lang/rust/pull/32062/files#diff-6a2bc45424fb5e8e0f7eb99f2e05f9cf).

$ rustc --version --verbose
rustc 1.21.0-nightly (dd53dd5f9 2017-08-01)
binary: rustc
commit-hash: dd53dd5f9e21dce1fbc06b7f9f451d1009bdcfd8
commit-date: 2017-08-01
host: x86_64-pc-windows-msvc
release: 1.21.0-nightly
LLVM version: 4.0
Marwes commented 7 years ago

EDIT I may have been mistaken about the example from #32062. While compilation takes far longer than one would expect, that has likely always been the case as well (still a potential issue but perhaps not a reproduction for the one I encountered).

The example added for #32062 reproduces as long as few more calls to chain gets added

// Copyright 2016 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

// pretty-expanded FIXME #23616

fn main() {
    let _ = test(Some(0).into_iter());
}

trait Parser {
    type Input: Iterator;
    type Output;
    fn parse(self, input: Self::Input) -> Result<(Self::Output, Self::Input), ()>;
    fn chain<P>(self, p: P) -> Chain<Self, P> where Self: Sized {
        Chain(self, p)
    }
}

struct Token<T>(T::Item) where T: Iterator;

impl<T> Parser for Token<T> where T: Iterator {
    type Input = T;
    type Output = T::Item;
    fn parse(self, _input: Self::Input) -> Result<(Self::Output, Self::Input), ()> {
        Err(())
    }
}

struct Chain<L, R>(L, R);

impl<L, R> Parser for Chain<L, R> where L: Parser, R: Parser<Input = L::Input> {
    type Input = L::Input;
    type Output = (L::Output, R::Output);
    fn parse(self, _input: Self::Input) -> Result<(Self::Output, Self::Input), ()> {
        Err(())
    }
}

fn test<I>(i: I) -> Result<((), I), ()> where I: Iterator<Item = i32> {
    Chain(Token(0), Token(1))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .chain(Chain(Token(0), Token(1)))
        .parse(i)
        .map(|(_, i)| ((), i))
}
alexcrichton commented 7 years ago

@Marwes did this compile previously? Just wondering if we can tag it as a regression!

aturon commented 7 years ago

cc @arielb1 for potential regression tracking

Marwes commented 7 years ago

@alexcrichton I am afraid the the example from #32062 would fail to compile before as well so it is likely unrelated. https://github.com/Marwes/combine/tree/ice_allocator_exhausted did not compile before due to #43132 so this is probably not a regression.

jonhoo commented 7 years ago

A tangentially related comment: on nightly, no error is printed when the compiler fails this way beyond Could not compile "foo".. On beta the situation is a little better, as it at least prints fatal runtime error: allocator memory exhausted.

alexcrichton commented 7 years ago

Ah unfortunately this is a beta regression. Compiling this source I get:

$ rustc -V
rustc 1.19.0 (0ade33941 2017-07-17)
$ time rustc foo.rs
rustc foo.rs  3.00s user 0.02s system 99% cpu 3.040 total
$ rustc +beta -V
rustc 1.20.0-beta.2 (54279dfac 2017-08-13)
$ time rustc +beta foo.rs
rustc +beta foo.rs  6.17s user 1.26s system 99% cpu 7.437 total
alexcrichton commented 7 years ago

cc @rust-lang/compiler for a new beta regression

nikomatsakis commented 7 years ago

triage: P-high

arielb1 commented 7 years ago

@acrichto

We backed out quite a few PRs for beta .3. Could you test again after that?

alexcrichton commented 7 years ago

Looks like it helped a little but not a huge amount, the beta timing decreased from ~7 to ~5s

It looks like, however, that it may be constant overhead. Commenting out a few more chain calls and stable/beta don't diverge in the difference between them (1-2s).

Should this be closed?

nikomatsakis commented 7 years ago

I am reclassifying as a regression against nightly. Still relevant.

alexcrichton commented 7 years ago

Ok, switching milestone in that case as well.

Thanks!

alexcrichton commented 7 years ago

1.21 is now in beta

nikomatsakis commented 7 years ago

We should retest -- @arielb1, have your various improvements landed? I'm assigning to us both to try and investigate.

nikomatsakis commented 7 years ago

OK, so, using nightly, I am able to build the original combine branch, but there is definitely still a loss on beta/nightly relative to stable, both in memory use and time. I'll do a bit more profiling and see what turns up.

lunch-box. for x in stable beta nightly; do echo === $x ===; rustup update $x; /usr/bin/time rustc +$x foo.rs; done
=== stable ===
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.20.0 (f3d6973f4 2017-08-27)

5.97user 0.04system 0:06.02elapsed 100%CPU (0avgtext+0avgdata 121132maxresident)k
0inputs+7680outputs (0major+12912minor)pagefaults 0swaps
=== beta ===
info: syncing channel updates for 'beta-x86_64-unknown-linux-gnu'

  beta-x86_64-unknown-linux-gnu unchanged - rustc 1.21.0-beta.1 (198109911 2017-08-29)

9.99user 1.50system 0:11.49elapsed 100%CPU (0avgtext+0avgdata 387528maxresident)k
0inputs+8056outputs (0major+1408205minor)pagefaults 0swaps
=== nightly ===
info: syncing channel updates for 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.21.0-nightly (97b01abf3 2017-08-31)

10.16user 1.58system 0:11.74elapsed 100%CPU (0avgtext+0avgdata 368428maxresident)k
0inputs+8008outputs (0major+1709226minor)pagefaults 0swaps
nikomatsakis commented 7 years ago

OK, I did some digging. There is a massive spike (when compiling foo.rs) directly attributable to the projection cache. I have to assume this is due to us now including the obligations in the cache. Will investigate some more, but I suspect one of the improvements that @arielb1 and I have talked about (like skipping the obligations if there are no type variables in the result) might be relevant here.

nikomatsakis commented 7 years ago

Relevant backtrace from memcache report:

99.50% (286,155,244B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->57.95% (166,652,149B) 0x5352FDF: __rdl_alloc (lib.rs:131)
| ->51.10% (146,960,384B) 0x92C8978: _$LT$alloc..raw_vec..RawVec$LT$T$C$$u20$A$GT$$GT$::allocate_in::h79ee2a1009556b6d (heap.rs:84)
| | ->51.10% (146,960,384B) 0x8DCB78E: _$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..clone..Clone$GT$::clone::h9c3d151fd71233a3 (raw_vec.rs:141)
| | | ->40.89% (117,600,896B) 0x92607AC: rustc::traits::project::ProjectionCache::insert_ty::h0cf47cfd76f17eb9 (project.rs:340)
| | | | ->40.89% (117,600,896B) 0x925B678: rustc::traits::project::opt_normalize_projection_type::h1383880954193dab (project.rs:530)
| | | |   ->40.89% (117,600,896B) 0x925A277: rustc::traits::project::normalize_projection_type::hf37038bf35e92756 (project.rs:365)
| | | |   | ->40.89% (117,600,896B) 0x9259FF6: _$LT$rustc..traits..project..AssociatedTypeNormalizer$LT$$u27$a$C$$u20$$u27$b$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$u20$as$u20$rustc..ty..fold..TypeFolder$LT$$
| | | |   |   ->30.69% (88,251,264B) 0x923A220: rustc::ty::structural_impls::_$LT$impl$u20$rustc..ty..fold..TypeFoldable$LT$$u27$tcx$GT$$u20$for$u20$$RF$$u27$tcx$u20$rustc..ty..TyS$LT$$u27$tcx$GT$$GT$::fo
| | | |   |   | ->30.69% (88,251,136B) 0x8F41703: core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$$u27$a$u20$mut$u20$F$GT$::call_once::h6574094d91ee375f (str
| | | |   |   | | ->30.69% (88,251,136B) 0x8E9412E: _$LT$rustc_data_structures..array_vec..ArrayVec$LT$A$GT$$u20$as$u20$core..iter..traits..Extend$LT$$LT$A$u20$as$u20$rustc_data_structures..array_vec..Arr
| | | |   |   | |   ->30.69% (88,251,136B) 0x9128645: _$LT$rustc_data_structures..accumulate_vec..AccumulateVec$LT$A$GT$$u20$as$u20$core..iter..traits..FromIterator$LT$$LT$A$u20$as$u20$rustc_data_structur
| | | |   |   | |     ->30.69% (88,251,136B) 0x922A1AD: rustc::ty::fold::TypeFoldable::fold_with::he61f397c4c6fc301 (iterator.rs:1302)
| | | |   |   | |       ->20.48% (58,891,648B) 0x9239AB6: rustc::ty::structural_impls::_$LT$impl$u20$rustc..ty..fold..TypeFoldable$LT$$u27$tcx$GT$$u20$for$u20$$RF$$u27$tcx$u20$rustc..ty..TyS$LT$$u27$tcx$G
| | | |   |   | |       | ->20.48% (58,891,648B) 0x9259E9B: _$LT$rustc..traits..project..AssociatedTypeNormalizer$LT$$u27$a$C$$u20$$u27$b$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$u20$as$u20$rustc..ty..fold..Ty
| | | |   |   | |       |   ->20.48% (58,891,648B) 0x923A220: rustc::ty::structural_impls::_$LT$impl$u20$rustc..ty..fold..TypeFoldable$LT$$u27$tcx$GT$$u20$for$u20$$RF$$u27$tcx$u20$rustc..ty..TyS$LT$$u27$t
| | | |   |   | |       |     ->10.26% (29,508,352B) 0x925B47B: rustc::traits::project::opt_normalize_projection_type::h1383880954193dab (project.rs:266)
| | | |   |   | |       |     | ->10.26% (29,508,352B) 0x925A277: rustc::traits::project::normalize_projection_type::hf37038bf35e92756 (project.rs:365)
| | | |   |   | |       |     | | ->10.26% (29,508,352B) 0x9259FF6: _$LT$rustc..traits..project..AssociatedTypeNormalizer$LT$$u27$a$C$$u20$$u27$b$C$$u20$$u27$gcx$C$$u20$$u27$tcx$GT$$u20$as$u20$rustc..ty..
| | | |   |   | |       |     | |   ->10.26% (29,508,352B) 0x923A220: rustc::ty::structural_impls::_$LT$impl$u20$rustc..ty..fold..TypeFoldable$LT$$u27$tcx$GT$$u20$for$u20$$RF$$u27$tcx$u20$rustc..ty..TyS$L
jonhoo commented 7 years ago

@nikomatsakis is it reasonable to assume that this and #43789 are really the same issue?

nikomatsakis commented 7 years ago

I did more investigation. In particular, I tried an experimental idea to reduce the size of the cache as soon as we can. This did reduce, the peak but not by much. Here is the "ideal" memory profile, based on the old caching policy around obligations:

    MB
22.60^                                             :
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |            :@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     |           @:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:::::::@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
     | ::@:: :: :@:@:::::::@::::::::@::::::::::@:#::::@:::::@::::@:::::@:::::@
   0 +----------------------------------------------------------------------->Gi
     0                                                                   69.01

This is the profile of the current master:

    MB
274.3^                                           #
     |                @                          #
     |                @                          #
     |                @                          #          :               :
     |                @                          #          :               :
     |                @                          #          :               :
     |    @           @                          #          :               :
     |    @           @                          #          :               :
     |    @:::        @                          #          :               :
     |    @::         @                          #          :               :
     |    @::         @                          #          :               :
     |    @::         @                          #::        ::              ::
     |  @@@::         @::                        #:         ::              ::
     |  @ @::         @:                         #:         ::              ::
     |  @ @::         @:                         #:         ::              ::
     |  @ @::         @:                         #:         ::              ::
     | @@ @::         @:          :              #:         ::    :         ::
     | @@ @::         @:          :    :         #:         ::    :         ::
     | @@ @::         @:  :       :    :         #:         ::    :         ::
     |:@@ @::       : @: ::::::::@:::::::::::::::#: ::::::@:::@@:::@:::::::@::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   95.04

Notice the several peaks. Each of those are an "episode" the cache in use.

This is the profile with my patch, which clears the cache obligations after a cache hit in which we observe that there are no unbound type variables in the projection result:

    MB
230.5^      ##
     |      #
     |      #
     |     @#
     |     @#
     |     @#
     |     @#
     |     @#  :
     |     @# ::
     |     @# ::
     |   @ @# ::
     |   @ @# ::
     |   @:@# ::
     |   @:@# ::
     |   @:@# ::
     |  @@:@# ::
     |  @@:@# ::
     |  @@:@# ::
     |  @@:@# ::
     | @@@:@# ::         ::::::::::::::::::@@:::@:::@@:::@::::@:::::@::::::@::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   76.31

You can see it's improved, and the peak is lower in an absolute sense, but there is still a clear peak. I'm done for the day but will poke at this later on.

nikomatsakis commented 7 years ago

@jonhoo

is it reasonable to assume that this and #43789 are really the same issue?

I think that is very likely.

EDIT: Or some of the EndRegion business.

nikomatsakis commented 7 years ago

OK, I tried a more aggressive pruning pass, and that seems to have worked out pretty well:

    MB
51.89^ ##
     | #
     | #
     | #
     | #
     | #
     | #
     | #
     | #
     | #
     | #
     | #
     | #           ::::::::::::::::::::::::::::::::::::::::::::::::@::::::@:::
     | #           ::: :: :: :::::: ::::::::::::::::::::: : :::::::@::::::@:::
     | #           ::: :: :: :::::: ::::::::::::::::::::: : :::::::@::::::@:::
     | #           ::: :: :: :::::: ::::::::::::::::::::: : :::::::@::::::@:::
     | # ::::::::::::: :: :: :::::: ::::::::::::::::::::: : :::::::@::::::@:::
     | # ::: ::::::::: :: :: :::::: ::::::::::::::::::::: : :::::::@::::::@:::
     | # ::: ::::::::: :: :: :::::: ::::::::::::::::::::: : :::::::@::::::@:::
     | # ::: ::::::::: :: :: :::::: ::::::::::::::::::::: : :::::::@::::::@:::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   69.74

We're still using more memory, but it's a much more mild spike -- and execution time is about the same. Now I just have to make sure that I didn't break anything else along the way. =)

nikomatsakis commented 7 years ago

PR opened (#44269)

nikomatsakis commented 7 years ago

The version in the PR prunes mildly less aggressively -- I think my initial patches were incorrect -- but does remember to call shrink_to_fit(), and results in completely eliminating the memory spike:

    MB
22.39^                                :
     |           @::@:::::::::::@::@#::::::@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     |           @::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
     | @::::@::::@::@:::::::::::@::@#:::: :@::::::@:::::@:::::@::::@:::::@::::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   66.92