Open Validark opened 4 weeks ago
Looks like the X86DomainReassignment pass failed on run_lengths2
for some reason, but the debug output doesn't say anything.
@llvm/issue-subscribers-backend-x86
Author: Niles Salter (Validark)
I have a modified version of the code which takes a number of `bitstr`s, calculates `ends` and `starts` for each of them, and tracks the bitwise OR of all `ends` and all `starts`. LLVM auto-vectorizes just one of those. I.e., it pays the (heavy and most likely not worthwhile) cost of moving the `bitstr`s over to a vector and does `bitstr & ~(bitstr >> 1)` but not `bitstr & ~(bitstr << 1)` in the vector (or vice versa). It does the other one-by-one in general-purpose registers, even though it already paid the hefty cost to move the data to a vector. Is that probably the same issue as this one or should I open a new issue for that?
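For readers following along, here is a minimal C sketch of the kind of loop described above. It is not the code from the issue: the names `run_bounds`, `run_starts`, `run_ends`, and `accumulate_bounds` are hypothetical, and the mapping of `starts` to the `<< 1` expression and `ends` to the `>> 1` expression is an assumption (it treats bit 0 as the first position).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two accumulated masks. */
typedef struct {
    uint64_t starts; /* OR of every per-word "starts" mask */
    uint64_t ends;   /* OR of every per-word "ends" mask */
} run_bounds;

/* Set bits whose next-lower bit is clear (assumed to be "starts"). */
static uint64_t run_starts(uint64_t bitstr) {
    return bitstr & ~(bitstr << 1);
}

/* Set bits whose next-higher bit is clear (assumed to be "ends"). */
static uint64_t run_ends(uint64_t bitstr) {
    return bitstr & ~(bitstr >> 1);
}

/* Computes both masks for each input word and ORs them together. */
run_bounds accumulate_bounds(const uint64_t *bitstrs, size_t n) {
    run_bounds acc = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        acc.starts |= run_starts(bitstrs[i]);
        acc.ends |= run_ends(bitstrs[i]);
    }
    return acc;
}
```

Both expressions have the same shape (shift, NOT, AND), so one would expect a compiler that vectorizes one of them to treat the other the same way, which is the inconsistency being asked about.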
AFAICT X86DomainReassignment doesn't account for multiple uses of the mask intermediates, so the cost calculation determines that moving 2 GPRs to kmask registers isn't worth it. It makes that decision for each of the two kmask transfers separately, without realizing they share a lot of the costs.
Opened an issue for this here, although this particular example might be a little tricky: https://github.com/llvm/llvm-project/issues/105763
I had this function:
Translates to this for Zen 4:
So far so good. Looks like LLVM is moving `bitstr` over to a `k` register before executing the first two lines of my `run_lengths1` procedure. However, when I compute `bitstr`:
It decides to do the computation in regular registers, then move the two outputs into `k` registers separately:
Godbolt link
This issue is about consistency. Why is LLVM not consistent in this decision?
Edit: removed incorrect information about Zen 5 latencies