Open uncleasm opened 2 years ago
@llvm/issue-subscribers-backend-aarch64
I want to try fixing this.
This happens because clang folds vdup_16(x) to vdup_8(x). For umull2, tryCombineLongOpWithDup then expands vdup_8 back to vdup_16 so that the umull2 pattern can be used, but the existing vdup_8 is left intact after the expansion, so both constants end up being materialized.
This could be solved by expanding the vdup_8 in umull_low to get_low(vdup_16(x)) when tryCombineLongOpWithDup is applied. However, I'm not sure how to fit this into the combine pass, since I would need to somehow access the "sibling node."
Here's my patch:
When the same constant is needed for both low and high multiplications (or possibly other low/high arithmetic operations), it is suboptimally allocated as both an 8-byte and a 16-byte variant, even though the 16-byte variant alone would be enough.
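To make the claim concrete, here is a minimal portable sketch in plain C that models the registers as byte arrays. It is only a scalar model, not NEON code and not the LLVM combine: the types and helpers (`vec8x8`, `vec8x16`, `dup8`, `dup16`, `get_low`, `get_high`, `umull`, `umull2`) are hypothetical names chosen to mirror the shorthand used in this thread. It shows that the low half of a 16-byte splat is identical to an 8-byte splat, so a single 16-byte constant can feed both the low and the high widening multiply.

```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-ins for 64-bit and 128-bit vector registers (hypothetical names). */
typedef struct { uint8_t lane[8]; } vec8x8;
typedef struct { uint8_t lane[16]; } vec8x16;
typedef struct { uint16_t lane[8]; } vec16x8;

/* 8-byte and 16-byte splats of the same scalar. */
static vec8x8 dup8(uint8_t x) { vec8x8 v; memset(v.lane, x, 8); return v; }
static vec8x16 dup16(uint8_t x) { vec8x16 v; memset(v.lane, x, 16); return v; }

/* Extract the low or high 8 bytes of a 16-byte vector. */
static vec8x8 get_low(vec8x16 v) { vec8x8 r; memcpy(r.lane, v.lane, 8); return r; }
static vec8x8 get_high(vec8x16 v) { vec8x8 r; memcpy(r.lane, v.lane + 8, 8); return r; }

/* Widening multiply of two 8-byte vectors (models the low-half multiply). */
static vec16x8 umull(vec8x8 a, vec8x8 b) {
    vec16x8 r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = (uint16_t)(a.lane[i] * b.lane[i]);
    return r;
}

/* Widening multiply of the high halves of two 16-byte vectors. */
static vec16x8 umull2(vec8x16 a, vec8x16 b) {
    return umull(get_high(a), get_high(b));
}

/* Returns 1 if the low product is the same whether the constant comes from
 * an 8-byte splat or from the low half of a 16-byte splat. */
int low_product_matches(const uint8_t data[16], uint8_t x) {
    vec8x16 a;
    memcpy(a.lane, data, 16);
    vec16x8 with8 = umull(get_low(a), dup8(x));
    vec16x8 with16 = umull(get_low(a), get_low(dup16(x)));
    return memcmp(with8.lane, with16.lane, sizeof with8.lane) == 0;
}

/* Computes both products from a single 16-byte splat, which is the
 * allocation the issue asks for. */
void mul_shared_constant(const uint8_t data[16], uint8_t x,
                         uint16_t lo[8], uint16_t hi[8]) {
    vec8x16 a;
    memcpy(a.lane, data, 16);
    vec8x16 c = dup16(x);                       /* the only constant needed */
    vec16x8 l = umull(get_low(a), get_low(c));  /* low half reuses c */
    vec16x8 h = umull2(a, c);                   /* high half uses c directly */
    memcpy(lo, l.lane, sizeof l.lane);
    memcpy(hi, h.lane, sizeof h.lane);
}
```

Calling `low_product_matches` with any 16 input bytes and any `x` should return 1, which is the equivalence the proposed rewrite relies on. How the real combine would locate and update the sibling node is a separate question that this model does not address.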