Open TNorthover opened 1 year ago
tmp.txt is really tmp.cpp
cc: @aeubanks
Testcase inlined for convenience:
# 3 "" 3
template < class a > struct b {
a c;
};
template < class > struct o;
template < class d > struct o< d * > {
typedef d e;
};
template < class f > class g {
public:
typename o< f >::e operator*();
void operator++();
};
template < class h > bool operator!=(g< h >, g< h >);
template < class i > struct j {
using k = typename i::l;
};
template < class i > struct m {
using n = i;
using l = typename j< n >::k;
using aa = int;
};
template < class > class ab;
template < class d, class = ab< d > > class p;
template < class d > class ab {
public:
typedef d *l;
};
template < class d, class af > class p {
public:
typedef m< af > w;
typedef g< typename w::l > q;
q begin() const;
q end() const;
d operator[](typename w::aa) const;
};
class r;
typedef long s;
typedef int ah;
typedef p< r > aj;
typedef p< b< r > > ak;
enum al { am };
int at_bb;
class r {
s an;
public:
r &ao(const r &);
template < typename ap > ap aq() const;
al ar() const;
template < typename as > auto at(as au) -> decltype(au(an));
template < typename as > auto av(as au) const -> decltype(au(an));
template < typename as > auto at(as au, const r &) -> decltype(au(an, s()));
operator ah() const;
};
struct aw;
struct ax {
aw &v;
p< r > u;
template < typename ap > auto operator()(ap t) -> decltype(v) { v(t, u); return v; }
};
struct ay {
void operator()(double);
ah operator()(const aj &) const;
ah operator()(const ak &) const;
};
struct aw {
template < typename ap, typename az > void operator()(ap, az);
__attribute__((always_inline)) void operator()(ah, p< r > u) { ay()(u); }
};
r &r::ao(const r &ba) { at(aw(), ba); r rr; return rr; }
template < typename as >
__attribute__((always_inline, flatten)) auto r::at(as au) -> decltype(au(an)) {
au(at_bb);
return au(an);
}
template < typename as >
__attribute__((always_inline, flatten)) auto r::av(as au) const
-> decltype(au(an)) {
switch (ar())
case am: {
p< r > bc;
au(bc);
p< b< r > > bd;
au(bd);
}
return au(an);
}
template < typename as > auto r::at(as au, const r &) -> decltype(au(an, s())) {
at(ax{au});
return au(an, s());
}
template <> ah r::aq() const { av(ay()); return ah{}; }
__attribute__((always_inline, flatten)) r::operator ah() const { aq< ah >(); return ah{};}
ah ay::operator()(const aj &be) const { return (ah) be[0]; }
ah ay::operator()(const ak &be) const {
for (auto bf : be)
ah(bf.c);
return ah{};
}
This blows up a lot of real projects; the test case above is a reduced example.
There is some discussion in https://reviews.llvm.org/D138602
Should be addressed by the relanding of the inliner change at 1a2e77cf9e
@aemerson
In our AMDGPU use case, we have ~40,000 functions in the module, all of them marked with the "always_inline" attribute. Since the alwaysinliner legacy pass was enabled in the pipeline in cae033d, the build hangs at this pass because of heavy memory usage (all 64 GB of RAM were used up on the machine I tested). While debugging, I found that the splice API (LINK) on the basic blocks drives the memory usage, and when the pass runs over thousands of functions with the always_inline attribute, the build hangs. Any help or info on how to fix this?
@aemerson any input on the above-mentioned issue?
Have you tried to fix the issue?
I couldn't find any alternative to the splice API in LLVM that doesn't cause this memory usage. We have disabled the pass for AMDGPU in our downstream repo, but we are looking for a way to fix this upstream.
@aemerson any feedback about reducing memory usage?
@skc7 Please provide a reproducer.
Drive link: https://drive.google.com/file/d/13O2Z9gCiliJ0nOUmMGZb9HOXZtR8Cxbz/view?usp=drive_link
The file is about 137 MB, so please download it (before-ai.ll) from the link above.
Use the command below. On a 64 GB machine, memory gets used up while running the pass. Cmd: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -passes=always-inline before-ai.ll
There have been problems with the inliner taking a long time in the new pass manager before, discussed in https://reviews.llvm.org/D98481, https://reviews.llvm.org/D120584, and the finally landed https://reviews.llvm.org/D121084.
Unfortunately the fix that landed is a cost-model tweak to suppress exponential inlining so doesn't apply to the alwaysinline case. I've had some success with going the D98481 route internally, but that has the significant(!) disadvantage of actually disabling inlining of some alwaysinline functions (no-one seems to have noticed though).
Just before Clang removed support for the legacy pass-manager entirely, the soon-to-be-attached reduced case took ~9s with the new one, and 0.04s with the old one. ToT is (as expected) also 9s.
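For readers unfamiliar with the "exponential inlining" mentioned above, here is a minimal sketch of one way it can arise. It is a hypothetical illustration, not derived from the reduced test case or from before-ai.ll: the names AI, level0..level4 and inlined_leaf_copies are made up. Each level calls the previous one twice, so if every call is inlined, the leaf body ends up duplicated 2^depth times, and a cost-model threshold cannot stop this when the functions are always_inline.

```cpp
#include <cstdint>

// Hypothetical shorthand for the attribute used in the test case.
#define AI __attribute__((always_inline)) inline

// Each level calls the level below it twice. Once every call is inlined,
// levelN contains 2^N copies of the body of level0.
AI std::uint64_t level0(std::uint64_t x) { return x + 1; }
AI std::uint64_t level1(std::uint64_t x) { return level0(level0(x)); }
AI std::uint64_t level2(std::uint64_t x) { return level1(level1(x)); }
AI std::uint64_t level3(std::uint64_t x) { return level2(level2(x)); }
AI std::uint64_t level4(std::uint64_t x) { return level3(level3(x)); }

// Number of level0 bodies present after fully inlining a chain of the
// given depth (assumes a fan-out of two calls per level, as above).
constexpr std::uint64_t inlined_leaf_copies(unsigned depth) {
  return std::uint64_t{1} << depth;
}
```

At depth 4 this is only 16 copies, but the same shape at depth 30 would already mean about a billion copies, which is why even a small fan-out can dominate compile time or memory.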