llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Alwaysinliner time explosion with new pass manager #59126

Open TNorthover opened 1 year ago

TNorthover commented 1 year ago

The inliner has taken a long time under the new pass manager before. This was discussed in https://reviews.llvm.org/D98481, https://reviews.llvm.org/D120584, and the fix that eventually landed, https://reviews.llvm.org/D121084.

Unfortunately, the fix that landed is a cost-model tweak to suppress exponential inlining, so it doesn't apply to the alwaysinline case, where inlining is not gated by the cost model. I've had some success internally going the D98481 route, but that has the significant(!) disadvantage of actually disabling inlining of some alwaysinline functions (though no one seems to have noticed).
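To make the distinction concrete, here is a toy model (not LLVM code) of why a cost-model fix cannot help here. Consider a hypothetical chain in which each function calls the next one twice, and flattening drives every call site into the outermost caller. With nothing allowed to refuse a call site, this takes 2^(depth+1) - 2 inlining steps. A cost model with any fixed cap stops early. The cumulative `budget` parameter and the `inlineChain` helper are assumptions made for illustration. They do not reflect how LLVM's inline cost analysis actually computes thresholds.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Outcome of flattening f_0 in the toy call chain.
struct ChainResult {
  std::uint64_t steps;       // call sites actually inlined
  std::uint64_t callerSize;  // size of f_0 afterwards, in "instructions"
};

// Toy chain (hypothetical): f_i (for i < depth) has `selfSize` instructions
// and two call sites to f_{i+1}; f_depth is a leaf. Every call site is
// inlined into f_0, including call sites exposed by earlier inlining.
// `budget` is a hypothetical cumulative size cap standing in for a cost
// model; std::nullopt models alwaysinline, which never consults one.
ChainResult inlineChain(int depth, std::uint64_t selfSize,
                        std::optional<std::uint64_t> budget) {
  assert(depth >= 0);
  ChainResult r{0, selfSize};
  std::vector<int> worklist;  // callee index of each pending call site in f_0
  if (depth >= 1) worklist = {1, 1};
  while (!worklist.empty()) {
    int callee = worklist.back();
    worklist.pop_back();
    // With a budget, leave the call in place once inlining would exceed it.
    if (budget && r.callerSize + selfSize > *budget) continue;
    r.callerSize += selfSize;
    ++r.steps;
    // Inlining f_callee copies its two call sites into f_0.
    if (callee < depth) {
      worklist.push_back(callee + 1);
      worklist.push_back(callee + 1);
    }
  }
  return r;
}
```

For example, `inlineChain(10, 1, std::nullopt)` performs 2046 steps, while `inlineChain(10, 1, 100)` stops after 99. Each extra level doubles the unbudgeted work, but once the chain is deep enough, the budgeted run does a fixed amount regardless of depth.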

Just before Clang removed support for the legacy pass manager entirely, the reduced test case (attached shortly) took ~9 s with the new pass manager and 0.04 s with the legacy one. Top of tree (ToT) also takes ~9 s, as expected.

TNorthover commented 1 year ago

tmp.txt is really tmp.cpp

dcci commented 1 year ago

cc: @aeubanks

dcci commented 1 year ago

Testcase inlined for convenience:

# 3 "" 3
template < class a > struct b {
  a c;
};
template < class > struct o;
template < class d > struct o< d * > {
  typedef d e;
};
template < class f > class g {
public:
  typename o< f >::e operator*();
  void operator++();
};
template < class h > bool operator!=(g< h >, g< h >);
template < class i > struct j {
  using k = typename i::l;
};
template < class i > struct m {
  using n = i;
  using l = typename j< n >::k;
  using aa = int;
};
template < class > class ab;
template < class d, class = ab< d > > class p;
template < class d > class ab {
public:
  typedef d *l;
};
template < class d, class af > class p {
public:
  typedef m< af > w;
  typedef g< typename w::l > q;
  q begin() const;
  q end() const;
  d operator[](typename w::aa) const;
};
class r;
typedef long s;
typedef int ah;
typedef p< r > aj;
typedef p< b< r > > ak;
enum al { am };
int at_bb;
class r {
  s an;

public:
  r &ao(const r &);
  template < typename ap > ap aq() const;
  al ar() const;
  template < typename as > auto at(as au) -> decltype(au(an));
  template < typename as > auto av(as au) const -> decltype(au(an));
  template < typename as > auto at(as au, const r &) -> decltype(au(an, s()));
  operator ah() const;
};
struct aw;
struct ax {
  aw &v;
  p< r > u;
  template < typename ap > auto operator()(ap t) -> decltype(v) { v(t, u); return v; }
};
struct ay {
  void operator()(double);
  ah operator()(const aj &) const;
  ah operator()(const ak &) const;
};
struct aw {
  template < typename ap, typename az > void operator()(ap, az);
  __attribute__((always_inline)) void operator()(ah, p< r > u) { ay()(u); }
};
r &r::ao(const r &ba) { at(aw(), ba); r rr; return rr; }
template < typename as >
__attribute__((always_inline, flatten)) auto r::at(as au) -> decltype(au(an)) {
  au(at_bb);
  return au(an);
}
template < typename as >
__attribute__((always_inline, flatten)) auto r::av(as au) const
    -> decltype(au(an)) {
  switch (ar())
  case am: {
    p< r > bc;
    au(bc);
    p< b< r > > bd;
    au(bd);
  }
  return au(an);
}
template < typename as > auto r::at(as au, const r &) -> decltype(au(an, s())) {
  at(ax{au});
  return au(an, s());
}
template <> ah r::aq() const { av(ay()); return ah{}; }
__attribute__((always_inline, flatten)) r::operator ah() const { aq< ah >(); return ah{};}
ah ay::operator()(const aj &be) const { return (ah) be[0];  }
ah ay::operator()(const ak &be) const {
  for (auto bf : be)
    ah(bf.c);
  return ah{};
}
dcci commented 1 year ago

This blows up a lot of real projects; this is a reduced example.

aeubanks commented 1 year ago

There is some discussion in https://reviews.llvm.org/D138602.

aemerson commented 11 months ago

This should be addressed by the relanding of the inliner change in 1a2e77cf9e.

skc7 commented 9 months ago

@aemerson

In our AMDGPU use case, the module has ~40,000 functions, all marked with the "always_inline" attribute. With the AlwaysInliner legacy pass enabled in the pipeline by cae033d, the build hangs at this pass due to heavy memory usage (all 64 GB of RAM were used up on the machine I tested). While debugging, I found that the splice API (LINK) on the basic blocks is what consumes the memory, and when the pass runs over thousands of functions with the always_inline attribute, the build hangs. Any help or information on how to fix this?
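One way to separate an inherent size blow-up from a problem in how the pass moves instructions around is to estimate how large the module would become once every always_inline call is inlined. The sketch below is a back-of-envelope model, not LLVM code. Each function is reduced to a hypothetical `Fn` record: an instruction count plus its call sites. The estimate assumes an acyclic call graph and that no function is deleted as dead after inlining. `flattenedSize` and `moduleSizeAfterInlining` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical summary of one function: its own instruction count and one
// entry per call site (indices into the module vector).
struct Fn {
  std::uint64_t selfSize;
  std::vector<int> callees;
};

constexpr std::uint64_t kSaturated = std::numeric_limits<std::uint64_t>::max();

// Saturating add: on pathological inputs the true size exceeds 64 bits.
std::uint64_t satAdd(std::uint64_t a, std::uint64_t b) {
  return a > kSaturated - b ? kSaturated : a + b;
}

// Size of function `f` after every call site is inlined transitively.
// Memoized, so computing the estimate is linear in the call graph even when
// the size it reports is exponential. Assumes an acyclic call graph.
std::uint64_t flattenedSize(const std::vector<Fn>& module, int f,
                            std::vector<std::optional<std::uint64_t>>& memo) {
  assert(f >= 0 && static_cast<std::size_t>(f) < module.size());
  if (memo[f]) return *memo[f];
  std::uint64_t size = module[f].selfSize;
  for (int callee : module[f].callees)
    size = satAdd(size, flattenedSize(module, callee, memo));
  memo[f] = size;
  return size;
}

// Instructions in the whole module if every function ends up fully inlined
// and no function is deleted as dead afterwards.
std::uint64_t moduleSizeAfterInlining(const std::vector<Fn>& module) {
  std::vector<std::optional<std::uint64_t>> memo(module.size());
  std::uint64_t total = 0;
  for (std::size_t f = 0; f < module.size(); ++f)
    total = satAdd(total, flattenedSize(module, static_cast<int>(f), memo));
  return total;
}
```

For a three-function chain where each function calls the next one twice, the estimate is 7 + 3 + 1 = 11 instructions. A chain 70 levels deep already saturates 64 bits. If a real module's estimate is enormous, that suggests the memory use is largely a property of the input rather than of the splice calls themselves.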

skc7 commented 8 months ago

@aemerson, any input on the issue mentioned above?

aemerson commented 8 months ago

Have you tried to fix the issue?

skc7 commented 8 months ago

I couldn't find any alternative to the splice API in LLVM that avoids this memory usage. We have disabled the pass for AMDGPU in our downstream repo, but we are looking for a way to fix this upstream.

skc7 commented 8 months ago

@aemerson any feedback about reducing memory usage?

nikic commented 8 months ago

@skc7 Please provide a reproducer.

skc7 commented 8 months ago

Drive link: https://drive.google.com/file/d/13O2Z9gCiliJ0nOUmMGZb9HOXZtR8Cxbz/view?usp=drive_link

The file is about 137 MB, so please download it (before-ai.ll) from the link above.

Use the command below. On a 64 GB machine, all memory is used up while the pass runs:

opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -passes=always-inline before-ai.ll

dfukalov commented 8 months ago

Hi @skc7, I've checked your test case, and it seems that even with cae033d reverted, opt still uses all the memory and gets killed. So perhaps a separate issue should be created? Anyway, I'll try to find a fix for the test case.

dfukalov commented 3 months ago

Hi @skc7, please check whether PR 96958 fixes the issue.