oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org
Other
20.43k stars 1.64k forks source link

Improve graal match rule solution: choose the optimal rule for a pattern. #2210

Open XiaohongGong opened 4 years ago

XiaohongGong commented 4 years ago

Feature request

Please include the following information:

Is your feature request related to a problem? Please describe. As we all know, a original code pattern can be optimized to multiply patterns. For example:

and x1, x1, 0xffffffff
lsl x1, x1, #5
add x1, x0, x1

can be optimized to:

and x1, x1, 0xffffffff
add x1, x0, x1, lsl #5

It also has a better optimization pattern:

add x1, x0, x1, uxtw #5

Obviously the second pattern is better than the first one.

However, currently graal can only match a single rule for the original codes. If a rule is matched, it breaks and continue the next step. The optimized result is greatly decided by the definition order of the match rules.

Assume that the compiler has the following two rules:

1) @MatchRule("(Add=op x (LeftShift y Constant))")
2) @MatchRule("(Add=op x (LeftShift (And y Constant) Constant))")

If the first rule is matched before the second one, the final optimized instructions will be:

and x1, x1, 0xffffffff
add x1, x0, x1, lsl #5

And if the second rule is matched firstly, the final instruction is:

add x1, x0, x1, uxtw #5

This is not convenient. And sometimes we need to adjust the match rules order to make sure the expected rule is matched.

Describe the solution you'd like. A better solution is to add the comparing rules (instruction costs or something else) for all the match rules that can be matched, and choose the optimal one finally.

Describe who do you think will benefit the most. All the users and contributors can benefit from it.

Describe alternatives you've considered. Maybe graal can get the match rule solution in C2 compiler as a reference.

Additional context. Add any other context about the feature request here. For example, link to the relevant projects, documentation, standards.

Express whether you'd like to help contributing this feature If you'd like to contribute, please read the contribution guide.

tkrodriguez commented 4 years ago

Yes this is a known limitation. It was never really intended to be fully equivalent to C2's adlc as that seems like engineering overkill given our limited goals for matching. The efficiency of the matching becomes more critical the more patterns that might be matched as well. Would a more definitive match order based on the number of nodes matched be sufficient? Or it could force you to reorder them if an earlier match might hide a later match.

XiaohongGong commented 4 years ago

Would a more definitive match order based on the number of nodes matched be sufficient?

Thanks for your additional explanation for it! And I think it's sufficient to add the nodes number dependency for the case I mentioned above.

Thanks, Xiaohong

tkrodriguez commented 4 years ago

I'll see what I can do. The real power of the adlc approach comes from the fact that it produces an optimal cover for a DAG that requires multiple machine nodes. So if the DAG can be matched using 2 nodes or 3 nodes, it can efficiently pick the optimal answer based on the total cost of those nodes. But that assumes your costs actually make sense.

XiaohongGong commented 4 years ago

I'll see what I can do.

Sorry for the late replying! It's great that it's in your plan. I'm looking forward to it. Thank you so much!