google / ml-compiler-opt

Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.

[Question] Is the model sensitive to target ISA? #317

Closed pshung closed 8 months ago

pshung commented 9 months ago

I assume you trained the inlining model for the x86 ISA. When I use the pre-trained model for other ISAs, I get worse results on every test case than with the heuristic algorithm.

So I should re-train the model using the target ISA's code size as the reward. Is that correct? Thank you.
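
To make the reward concrete: what I have in mind is relative size savings versus the heuristic baseline, measured on object files built for the target ISA. A minimal sketch of that idea (llvm-size is just one way to measure; the repo's actual reward shaping may differ):

```python
import subprocess

def native_size(object_file: str) -> int:
    """Total section size of an object file, via llvm-size.

    Assumes llvm-size is on PATH; any size measure for the target
    ISA's objects (e.g. .text only) works the same way.
    """
    out = subprocess.run(
        ['llvm-size', '--format=berkeley', object_file],
        capture_output=True, text=True, check=True).stdout
    # Berkeley format: a header line, then "text data bss dec hex filename".
    text, data, bss = out.splitlines()[1].split()[:3]
    return int(text) + int(data) + int(bss)

def size_reward(policy_obj: str, baseline_obj: str) -> float:
    """Relative size savings of the learned policy vs. the heuristic.

    Positive when the policy's code is smaller than the baseline's.
    (Hypothetical formulation; the repo's exact reward may differ.)
    """
    baseline = native_size(baseline_obj)
    policy = native_size(policy_obj)
    return (baseline - policy) / max(baseline, 1)
```

The point is only that both the policy build and the baseline build have to target the same ISA for the comparison to mean anything.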

Colibrow commented 9 months ago

Yes, the ISA matters: to get the best results on your dataset, you shouldn't overlook it when training.

mtrofin commented 9 months ago

To add to what @Colibrow said - @pshung, that's our experience as well: for instance, for Chrome on Android we trained an arm model. But - and @petrhosek could confirm - we did see better results on Fuchsia, on both arm and x86 when training a model on a combined x86 and arm corpus.
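
In case it helps: mechanically, a combined corpus can be as simple as merging two corpora extracted with compiler_opt/tools/extract_ir.py. A rough sketch, assuming the usual layout where corpus_description.json carries a "modules" list of relative paths and each module has a .bc and a .cmd file (check the keys your version of the tooling emits):

```python
import json
import os
import shutil

def merge_corpora(corpus_a: str, corpus_b: str, out_dir: str) -> None:
    """Merge two extract_ir.py corpora (e.g. x86 and arm) into one.

    Assumes each corpus root holds a corpus_description.json whose
    "modules" entry lists relative module paths; other keys are taken
    from corpus_a and must be compatible across the two corpora.
    """
    os.makedirs(out_dir, exist_ok=True)
    merged = None
    for tag, corpus in (('x86', corpus_a), ('arm', corpus_b)):
        with open(os.path.join(corpus, 'corpus_description.json')) as f:
            desc = json.load(f)
        if merged is None:
            merged = dict(desc, modules=[])
        # Prefix module paths per ISA so identically named modules
        # from the two builds can't collide.
        for module in desc['modules']:
            for ext in ('.bc', '.cmd'):
                src = os.path.join(corpus, module + ext)
                dst = os.path.join(out_dir, tag, module + ext)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy(src, dst)
            merged['modules'].append(os.path.join(tag, module))
    with open(os.path.join(out_dir, 'corpus_description.json'), 'w') as f:
        json.dump(merged, f, indent=2)
```

Training then proceeds as usual over the merged corpus.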

pshung commented 9 months ago

@mtrofin

we did see better results on Fuchsia, on both arm and x86 when training a model on a combined x86 and arm corpus.

Is that outcome explainable? Are you suggesting that those 11 target-independent features also capture characteristics of the target ISA, so that it's possible to train one monolithic model that fits all target ISAs? I can't figure it out.

mtrofin commented 9 months ago

(To make sure we're talking about the same thing: you're referring to the features in llvm/Analysis/InlineModelFeatureMaps.h, correct? There are more than 11 there, hence I wanted to check.)
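
For concreteness, the original 11 scalar features (as I recall them; the exact set depends on your LLVM version) would look roughly like this on the Python side, in the style of compiler_opt/rl/inlining/config.py:

```python
import tensorflow as tf

# The original 11 scalar callsite features from
# llvm/Analysis/InlineModelFeatureMaps.h. Newer LLVM versions add more
# (e.g. cost-related features), which is why the header lists more than 11.
FEATURE_NAMES = (
    'caller_basic_block_count',
    'caller_conditionally_executed_blocks',
    'caller_users',
    'callee_basic_block_count',
    'callee_conditionally_executed_blocks',
    'callee_users',
    'callsite_height',
    'cost_estimate',
    'edge_count',
    'node_count',
    'nr_ctant_params',
)

# Observation spec sketch: one int64 scalar tensor per feature.
observation_spec = {
    name: tf.TensorSpec(shape=(), dtype=tf.int64, name=name)
    for name in FEATURE_NAMES
}
```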

When targeting x86 or arm, the IR wouldn't necessarily be the same: if the source has target-specific conditional compilation, the feature values and their distributions could differ. We didn't investigate this deeply. It could be that the combined corpus (module count-wise, double the size of a single-ISA one) ended up offering both more training data and better training data, by exposing commonalities and differences between the targets. In any case, I offered it as an anecdote; I don't think we can say this is a reusable rule. It'd be interesting to learn whether your experience is similar (and even more interesting if someone dug into analyzing why :))