Closed hsivonen closed 5 years ago
cc @mw @nnethercote
Looks like NLL needs a lot of memory here:
Compiling packed_simd v0.3.1
time: 0.054; rss: 57MB parsing
time: 0.000; rss: 58MB attributes injection
time: 0.000; rss: 58MB recursion limit
time: 0.000; rss: 58MB crate injection
time: 0.000; rss: 58MB plugin loading
time: 0.000; rss: 58MB plugin registration
time: 0.005; rss: 58MB pre ast expansion lint checks
time: 2.550; rss: 369MB expand crate
time: 0.000; rss: 369MB check unused macros
time: 2.550; rss: 369MB expansion
time: 0.000; rss: 369MB maybe building test harness
time: 0.012; rss: 369MB maybe creating a macro crate
time: 0.048; rss: 370MB creating allocators
time: 0.036; rss: 370MB AST validation
time: 0.497; rss: 412MB name resolution
time: 0.075; rss: 412MB complete gated feature checking
time: 0.321; rss: 481MB lowering ast -> hir
time: 0.081; rss: 482MB early lint checks
time: 0.052; rss: 504MB validate hir map
time: 0.353; rss: 504MB indexing hir
time: 0.000; rss: 504MB load query result cache
time: 0.000; rss: 504MB looking for entry point
time: 0.000; rss: 504MB dep graph tcx init
time: 0.001; rss: 504MB looking for plugin registrar
time: 0.001; rss: 504MB looking for derive registrar
time: 0.019; rss: 504MB loop checking
time: 0.024; rss: 504MB attribute checking
time: 0.000; rss: 515MB solve_nll_region_constraints(DefId(0/1:2171 ~ packed_simd[a932]::v64[0]::f32x2[0]::{{constant}}[0]))
*snip*
time: 0.000; rss: 527MB solve_nll_region_constraints(DefId(0/1:4611 ~ packed_simd[a932]::vSize[0]::{{impl}}[587]::from[0]::U[0]::array[0]::{{constant}}[0]))
time: 0.636; rss: 527MB stability checking
time: 0.124; rss: 527MB type collecting
time: 0.003; rss: 527MB outlives testing
time: 0.019; rss: 527MB impl wf inference
time: 0.000; rss: 1113MB solve_nll_region_constraints(DefId(0/1:224 ~ packed_simd[a932]::codegen[0]::shuffle[0]::{{impl}}[0]::{{constant}}[0]))
*snip*
time: 0.000; rss: 1246MB solve_nll_region_constraints(DefId(0/1:4867 ~ packed_simd[a932]::vPtr[0]::{{impl}}[104]::{{constant}}[0]))
time: 9.972; rss: 1408MB coherence checking
time: 0.002; rss: 1408MB variance testing
time: 0.000; rss: 1605MB solve_nll_region_constraints(DefId(0/1:366 ~ packed_simd[a932]::codegen[0]::v16[0]::{{impl}}[0]::NT[0]::{{constant}}[0]))
*snip*
time: 0.000; rss: 2013MB solve_nll_region_constraints(DefId(0/0:4027 ~ packed_simd[a932]::codegen[0]::reductions[0]::mask[0]::{{impl}}[7]::any[0]))
time: 0.000; rss: 2013MB solve_nll_region_constraints(DefId(0/0:4053 ~ packed_simd[a932]::codegen[0]::reductions[0]::mask[0]::{{impl}}[17]::any[0]))
time: 5.040; rss: 2013MB MIR borrow checking
time: 0.000; rss: 2013MB dumping chalk-like clauses
time: 0.005; rss: 2013MB MIR effect checking
time: 0.072; rss: 2018MB death checking
time: 0.021; rss: 2018MB unused lib feature checking
time: 0.176; rss: 2019MB lint checking
time: 0.000; rss: 2019MB resolving dependency formats
time: 0.890; rss: 2055MB write metadata
time: 0.010; rss: 2055MB collecting roots
time: 0.186; rss: 2056MB collecting mono items
time: 0.196; rss: 2056MB monomorphization collection
time: 0.001; rss: 2056MB codegen unit partitioning
time: 0.122; rss: 2060MB codegen to LLVM IR
time: 0.000; rss: 2060MB assert dep graph
time: 0.000; rss: 2060MB serialize dep graph
time: 1.215; rss: 2060MB codegen
time: 0.056; rss: 2063MB llvm function passes [packed_simd.smey8184-cgu.0]
time: 0.777; rss: 2071MB llvm module passes [packed_simd.smey8184-cgu.0]
time: 0.798; rss: 2079MB codegen passes [packed_simd.smey8184-cgu.0]
time: 1.703; rss: 1539MB LLVM passes
time: 0.000; rss: 1540MB serialize work products
time: 0.017; rss: 1540MB linking
Coherence checking also takes a good chunk of memory:
time: 0.000; rss: 1246MB solve_nll_region_constraints(DefId(0/1:4867 ~ packed_simd[a932]::vPtr[0]::{{impl}}[104]::{{constant}}[0]))
time: 9.972; rss: 1408MB coherence checking
although NLL is the first suspect here. I wonder why NLL uses this much memory: packed_simd is full of methods, but the great majority of them are essentially one-liners.
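One way to read a log like the one above is to compute how much the reported RSS grows from one printed line to the next. The following is a minimal Rust sketch of that idea, assuming only the `time: …; rss: …MB <pass name>` line shape seen in this thread. `PassSample`, `parse_line`, and `largest_rss_jumps` are hypothetical names made up for this illustration, not part of rustc or any tool. Because nested passes print before their parent pass, a jump lands on whichever line happens to be printed next, so the result is a hint about where to look rather than a precise attribution.

```rust
/// One parsed line of the pass-timing log (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct PassSample {
    pub seconds: f64,
    pub rss_mb: u64,
    pub name: String,
}

/// Parses a line shaped like `time: 9.972; rss: 1408MB coherence checking`.
/// Returns `None` for anything else (for example `*snip*` or the `Compiling` line).
pub fn parse_line(line: &str) -> Option<PassSample> {
    let rest = line.trim().strip_prefix("time: ")?;
    let (secs, rest) = rest.split_once("; rss: ")?;
    let (rss, name) = rest.split_once("MB")?;
    Some(PassSample {
        seconds: secs.trim().parse().ok()?,
        rss_mb: rss.trim().parse().ok()?,
        name: name.trim().to_string(),
    })
}

/// RSS change in MB between consecutive parsed lines, largest growth first.
/// Each entry is labelled with the name on the later of the two lines.
pub fn largest_rss_jumps(log: &str) -> Vec<(i64, String)> {
    let samples: Vec<PassSample> = log.lines().filter_map(parse_line).collect();
    let mut jumps: Vec<(i64, String)> = samples
        .windows(2)
        .map(|w| (w[1].rss_mb as i64 - w[0].rss_mb as i64, w[1].name.clone()))
        .collect();
    jumps.sort_by(|a, b| b.0.cmp(&a.0));
    jumps
}
```

Fed the lines around "impl wf inference" and "coherence checking" from the log above, the largest jump is the step from 527MB to 1113MB, which is reported on a `solve_nll_region_constraints` line even though that line itself took 0.000 s.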
Reported the following spike of memory usage in #57432, which occurred after #56723
This one could be closed as duplicate of https://github.com/rust-lang/rust/issues/57432 I guess.
EDIT: @mati865 you are right, these are duplicates. I thought that was a different issue, which apparently never got filed, so forget this.
original comment:
@mati865 while they are related, they are two different issues:
- this issue is about compiling packed_simd itself, which started using much more memory recently, resulting in some builds failing for consumers (encoding-rs)
- packed_simd is part of libcore (e.g. via core::simd)
I did a DHAT run. The "At t-gmax" measurement is the relevant one; it's short for "time of global max". It shows that the interning of constants within TypeFolder accounts for over 54% of the global peak:
AP 1.1.1.1.1/2 (2 children) {
Total: 912,261,120 bytes (12.02%, 7,312.63/Minstr) in 6 blocks (0%, 0/Minstr), avg size 152,043,520 bytes, avg lifetime 103,155,024,513.33 instrs (82.69% of program duration)
At t-gmax: 912,261,120 bytes (54.74%) in 6 blocks (0%), avg size 152,043,520 bytes
At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
Reads: 1,827,458,569 bytes (4.97%, 14,648.81/Minstr), 2/byte
Writes: 844,260,160 bytes (9.59%, 6,767.54/Minstr), 0.93/byte
Allocated at {
#1: 0xB66BCCB: alloc (alloc.rs:72)
#2: 0xB66BCCB: alloc (alloc.rs:148)
#3: 0xB66BCCB: allocate_in<u8,alloc::alloc::Global> (raw_vec.rs:96)
#4: 0xB66BCCB: with_capacity<u8> (raw_vec.rs:140)
#5: 0xB66BCCB: new<u8> (lib.rs:66)
#6: 0xB66BCCB: arena::DroplessArena::grow (lib.rs:346)
#7: 0x8C1BB25: alloc_raw (lib.rs:362)
#8: 0x8C1BB25: alloc<rustc::ty::sty::LazyConst> (lib.rs:378)
#9: 0x8C1BB25: alloc<rustc::ty::sty::LazyConst> (lib.rs:465)
#10: 0x8C1BB25: intern_lazy_const (context.rs:1123)
#11: 0x8C1BB25: <rustc::traits::project::AssociatedTypeNormalizer<'a, 'b, 'gcx, 'tcx> as rustc::ty::fold::TypeFolder<'gcx, 'tcx>>::fold_const (project.rs:423)
#12: 0x8C1B235: fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:1049)
#13: 0x8C1B235: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:719)
#14: 0x8C1B235: <rustc::traits::project::AssociatedTypeNormalizer<'a, 'b, 'gcx, 'tcx> as rustc::ty::fold::TypeFolder<'gcx, 'tcx>>::fold_ty (project.rs:337)
#15: 0x890C0D0: fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:769)
#16: 0x890C0D0: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:135)
#17: 0x890C0D0: fold_with<rustc::ty::subst::Kind,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
#18: 0x890C0D0: {{closure}}<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:328)
#19: 0x890C0D0: call_once<(&rustc::ty::subst::Kind),closure> (function.rs:279)
#20: 0x890C0D0: map<&rustc::ty::subst::Kind,rustc::ty::subst::Kind,&mut closure> (option.rs:414)
#21: 0x890C0D0: next<rustc::ty::subst::Kind,core::slice::Iter<rustc::ty::subst::Kind>,closure> (mod.rs:567)
#22: 0x890C0D0: <smallvec::SmallVec<A> as core::iter::traits::collect::Extend<<A as smallvec::Array>::Item>>::extend (lib.rs:1349)
#23: 0x8EF9787: from_iter<[rustc::ty::subst::Kind; 8],core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>> (lib.rs:1333)
#24: 0x8EF9787: collect<core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>,smallvec::SmallVec<[rustc::ty::subst::Kind; 8]>> (iterator.rs:1466)
#25: 0x8EF9787: rustc::ty::subst::<impl rustc::ty::fold::TypeFoldable<'tcx> for &'tcx rustc::ty::List<rustc::ty::subst::Kind<'tcx>>>::super_fold_with (subst.rs:328)
#26: 0x8C1B183: fold_with<&rustc::ty::List<rustc::ty::subst::Kind>,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
#27: 0x8C1B183: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:721)
#28: 0x8C1B183: <rustc::traits::project::AssociatedTypeNormalizer<'a, 'b, 'gcx, 'tcx> as rustc::ty::fold::TypeFolder<'gcx, 'tcx>>::fold_ty (project.rs:337)
#29: 0x890C0D0: fold_with<rustc::traits::project::AssociatedTypeNormalizer> (structural_impls.rs:769)
#30: 0x890C0D0: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:135)
#31: 0x890C0D0: fold_with<rustc::ty::subst::Kind,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
#32: 0x890C0D0: {{closure}}<rustc::traits::project::AssociatedTypeNormalizer> (subst.rs:328)
#33: 0x890C0D0: call_once<(&rustc::ty::subst::Kind),closure> (function.rs:279)
#34: 0x890C0D0: map<&rustc::ty::subst::Kind,rustc::ty::subst::Kind,&mut closure> (option.rs:414)
#35: 0x890C0D0: next<rustc::ty::subst::Kind,core::slice::Iter<rustc::ty::subst::Kind>,closure> (mod.rs:567)
#36: 0x890C0D0: <smallvec::SmallVec<A> as core::iter::traits::collect::Extend<<A as smallvec::Array>::Item>>::extend (lib.rs:1349)
#37: 0x8EF9787: from_iter<[rustc::ty::subst::Kind; 8],core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>> (lib.rs:1333)
#38: 0x8EF9787: collect<core::iter::adapters::Map<core::slice::Iter<rustc::ty::subst::Kind>, closure>,smallvec::SmallVec<[rustc::ty::subst::Kind; 8]>> (iterator.rs:1466)
#39: 0x8EF9787: rustc::ty::subst::<impl rustc::ty::fold::TypeFoldable<'tcx> for &'tcx rustc::ty::List<rustc::ty::subst::Kind<'tcx>>>::super_fold_with (subst.rs:328)
#40: 0x8BFE173: fold_with<&rustc::ty::List<rustc::ty::subst::Kind>,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
#41: 0x8BFE173: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (macros.rs:344)
#42: 0x8BFE173: fold_with<rustc::ty::sty::TraitRef,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
#43: 0x8BFE173: super_fold_with<rustc::ty::sty::TraitRef,rustc::traits::project::AssociatedTypeNormalizer> (macros.rs:397)
#44: 0x8BFE173: fold_with<core::option::Option<rustc::ty::sty::TraitRef>,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
#45: 0x8BFE173: super_fold_with<rustc::traits::project::AssociatedTypeNormalizer> (macros.rs:344)
#46: 0x8BFE173: fold_with<rustc::ty::ImplHeader,rustc::traits::project::AssociatedTypeNormalizer> (fold.rs:47)
#47: 0x8BFE173: fold<rustc::ty::ImplHeader> (project.rs:315)
#48: 0x8BFE173: normalize_with_depth<rustc::ty::ImplHeader> (project.rs:274)
#49: 0x8BFE173: normalize<rustc::ty::ImplHeader> (project.rs:258)
#50: 0x8BFE173: rustc::traits::coherence::with_fresh_ty_vars (coherence.rs:107)
@eddby @oli-obk @RalfJung Any thoughts on how to improve intern_lazy_const?
Cc @eddyb
There is an obvious problem: intern_lazy_const doesn't intern the value! And the values passed are exceedingly repetitive. Here's a histogram of the top 10, which account for 97.6% of the calls:
17886042 counts:
( 1) 5253160 (29.4%, 29.4%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 2 }) })
( 2) 5192895 (29.0%, 58.4%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 4 }) })
( 3) 3928986 (22.0%, 80.4%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 8 }) })
( 4) 1600916 ( 9.0%, 89.3%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 16 }) })
( 5) 719785 ( 4.0%, 93.3%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 32 }) })
( 6) 299507 ( 1.7%, 95.0%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 1 }) })
( 7) 271847 ( 1.5%, 96.5%): Evaluated(Const { ty: usize, val: Scalar(Bits { size: 8, bits: 64 }) })
( 8) 61636 ( 0.3%, 96.9%): Unevaluated(DefId(0/1:4735 ~ packed_simd[3c0f]::vPtr[0]::mptrx4[0]::{{constant}}[0]), [])
( 9) 61636 ( 0.3%, 97.2%): Unevaluated(DefId(0/1:4823 ~ packed_simd[3c0f]::vPtr[0]::mptrx8[0]::{{constant}}[0]), [])
( 10) 61636 ( 0.3%, 97.6%): Unevaluated(DefId(0/1:4653 ~ packed_simd[3c0f]::vPtr[0]::mptrx2[0]::{{constant}}[0]), [])
Fixing this should drastically reduce the memory usage.
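To illustrate why interning helps with values this repetitive, here is a minimal Rust sketch. It is not rustc's code: `ToyConst`, `AppendOnlyArena`, and `Interner` are hypothetical names invented for this example, and handing out indices sidesteps the arena lifetimes that a real fix inside the compiler has to deal with. The arena stores every value it is handed, while the interner stores each distinct value once and returns the existing handle for repeats.

```rust
use std::collections::HashMap;

/// Toy stand-in for a compiler constant (hypothetical; not rustc's LazyConst).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum ToyConst {
    Evaluated { bits: u128, size: u8 },
    Unevaluated(u32),
}

/// Stores every value it is given, even if an equal value is already present.
pub struct AppendOnlyArena {
    items: Vec<ToyConst>,
}

impl AppendOnlyArena {
    pub fn new() -> Self {
        AppendOnlyArena { items: Vec::new() }
    }

    /// Always allocates a new slot and returns its index.
    pub fn alloc(&mut self, c: ToyConst) -> usize {
        self.items.push(c);
        self.items.len() - 1
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Stores each distinct value once and hands back the same index for repeats.
pub struct Interner {
    items: Vec<ToyConst>,
    index: HashMap<ToyConst, usize>,
}

impl Interner {
    pub fn new() -> Self {
        Interner { items: Vec::new(), index: HashMap::new() }
    }

    /// Returns the index of an equal, previously interned value if there is one;
    /// otherwise stores the value and returns its new index.
    pub fn intern(&mut self, c: ToyConst) -> usize {
        if let Some(&id) = self.index.get(&c) {
            return id;
        }
        let id = self.items.len();
        self.items.push(c.clone());
        self.index.insert(c, id);
        id
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}
```

With a workload like the histogram above, where a handful of usize constants make up nearly all requests, the arena grows with the number of calls, whereas the interner grows only with the number of distinct values.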
I tried doing the obvious thing by introducing GlobalCtxt::lazy_const_interner, heavily inspired by GlobalCtxt::layout_interner, but I couldn't get the lifetimes to work. I will try again tomorrow if nobody else beats me to it.
FWIW, without the in-flight fix here, a relatively small tweak to packed_simd made packed_simd uncompilable on an ARMv7 system whose /proc/meminfo says there's 3624684 kB of RAM plus some swap. (And a Chrome OS kernel; I don't know what kind of swap use policy Chrome OS applies.)
I'll test again once the fix for this issue is in nightly.
This just brought down my whole system -- 16GB of RAM used to be enough to compile two rustc in parallel (with 8 jobs each), but with the current RAM consumption that does not seem to be the case any more.
Can you try again with today's nightly?
Much better memory usage now. Thank you!
It seems it would be worthwhile to nominate this for uplift to beta, but I'm not permitted to add the tag myself.
Steps to reproduce
Add packed_simd = '0.3.1' to Cargo.toml of the new crate.
Actual results
While compiling packed_simd, rustc takes more than 2 GB of RAM.
Expected results
Lower RAM usage.
Additional info
Maybe it's just the nature of packed_simd that it takes a lot of RAM to compile, and there's no bug. However, if RAM usage reached 3 GB in the future, the crate would become unbuildable on 32-bit systems. It might be worthwhile to investigate whether building packed_simd has to take this much RAM or whether there is an opportunity to use less RAM without adversely affecting compilation speed on systems that have plenty of RAM.