Closed rotateright closed 9 years ago
Resolving as invalid. FMF hasn't been added to intrinsics yet, but it was added to FCMP instructions, so I think it's just a matter of time. FMF in the DAG is being enabled...slowly.
FWIW, I think this should not be too hard to fix. If you grep UnsafeFPMath in lib/CodeGen, there are only about 20 places the flag is checked (mostly in DAGCombine). There are another ~30 places in the various in-tree targets.
Ok, that's a relief. I had nightmares...
But now back to the IR FMF design itself (and its utter lack of documentation!). Is this existing behavior correct?
$ cat fastmul.ll ; (XY) X => (XX) Y define float @fmul6(float %x, float %y) { %mul = fmul float %x, %y <---- this is a strict fmul %mul1 = fmul fast float %mul, %x ret float %mul1
$ ./opt -instcombine fastmul.ll -S
define float @fmul6(float %x, float %y) { %1 = fmul fast float %x, %x %mul1 = fmul fast float %1, %y ret float %mul1 }
Did we optimize away an FP op that could have raised an SNaN or some other exception?
Well, we're been doing this wrong for a long time, and still do in the backend (which only has function-level flags and does not have fast-math flags for anything). That also needs to be fixed before the whole system really will work correctly. Just do it with a FIXME.
Wow...yes, I didn't even consider the backend schism. Do you have some ideas about how the backend should process IR-level FMF?
I think that we need to do the same thing that was done (recently) with the no-wrap flags: propagate them onto the associated SDNode objects.
Without backend support, is there any way that we can honestly intermingle fast and strict code? It seems to me that we will have a completely broken FMF design until then. Ie, no matter what the IR FMF says, the backend is going to use the function-level attribute, and therefore, supposedly strict instructions are going to be treated as fast. Similarly, when fast instructions get inlined into a strict function, they won't be considered for any fast optimizations.
That's correct. This cannot truly work correctly in all configurations without backend FMF support.
I went back and read the FMF proposal (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/054999.html), and it seems that this was not discussed.
Interesting. I imagine this was somewhat of an oversight.
FWIW, I think this should not be too hard to fix. If you grep UnsafeFPMath in lib/CodeGen, there are only about 20 places the flag is checked (mostly in DAGCombine). There are another ~30 places in the various in-tree targets.
Well, we're been doing this wrong for a long time, and still do in the backend (which only has function-level flags and does not have fast-math flags for anything). That also needs to be fixed before the whole system really will work correctly. Just do it with a FIXME.
Wow...yes, I didn't even consider the backend schism. Do you have some ideas about how the backend should process IR-level FMF?
Without backend support, is there any way that we can honestly intermingle fast and strict code? It seems to me that we will have a completely broken FMF design until then. Ie, no matter what the IR FMF says, the backend is going to use the function-level attribute, and therefore, supposedly strict instructions are going to be treated as fast. Similarly, when fast instructions get inlined into a strict function, they won't be considered for any fast optimizations.
I went back and read the FMF proposal (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/054999.html), and it seems that this was not discussed.
However, maybe this is itself incorrect. As you point out, a strictly-compiled sqrt likely should not "go away" because it is used by fast-math code. Maybe 'fast' on an FP op really needs to mean, "you can use the result in a non-strict way". I'm thinking that we may need to do it this way instead. Thoughts?
At first glance, I think that would be better...but I better sit on it a bit because I got it all wrong up until now. :)
Okay :)
But if we do interpret 'fast' in this way, we can't do the D5584 optimization until we resolve bug 21290?
Well, we're been doing this wrong for a long time, and still do in the backend (which only has function-level flags and does not have fast-math flags for anything). That also needs to be fixed before the whole system really will work correctly. Just do it with a FIXME.
However, maybe this is itself incorrect. As you point out, a strictly-compiled sqrt likely should not "go away" because it is used by fast-math code. Maybe 'fast' on an FP op really needs to mean, "you can use the result in a non-strict way". I'm thinking that we may need to do it this way instead. Thoughts?
At first glance, I think that would be better...but I better sit on it a bit because I got it all wrong up until now. :)
But if we do interpret 'fast' in this way, we can't do the D5584 optimization until we resolve bug 21290?
Let me see if I can understand this better with an example:
%y = fsub double %x, %x %z = fadd fast double %y, %y ret double %z
IIUC, we can't optimize this to:
ret double 0.0
because the fsub isn't fast.
So here's a testcase from http://reviews.llvm.org/D5584. In that patch, we said that this:
%sqrt = call double @llvm.sqrt.f64(double %f) %mul = fmul fast double %sqrt, %sqrt ret double %mul
...could be optimized based solely on the 'fast' in the fmul. Why are we allowed to nuke the sqrt even though it's a strict op?
As you know, this is not really specified. Going with the current interpretation, I'd say because our ability to do the simplification depends only on the sqrt operation, not on its operands. In the first example, just knowing that the fadd operands are some subtract does not itself allow for simplification. Just knowing that the operands to the fmul is a sqrt enables the simplification.
However, maybe this is itself incorrect. As you point out, a strictly-compiled sqrt likely should not "go away" because it is used by fast-math code. Maybe 'fast' on an FP op really needs to mean, "you can use the result in a non-strict way". I'm thinking that we may need to do it this way instead. Thoughts?
Let me see if I can understand this better with an example:
%y = fsub double %x, %x %z = fadd fast double %y, %y ret double %z
IIUC, we can't optimize this to:
ret double 0.0
because the fsub isn't fast.
So here's a testcase from http://reviews.llvm.org/D5584. In that patch, we said that this:
%sqrt = call double @llvm.sqrt.f64(double %f) %mul = fmul fast double %sqrt, %sqrt ret double %mul
...could be optimized based solely on the 'fast' in the fmul. Why are we allowed to nuke the sqrt even though it's a strict op?
Sanjay, My comments in D5222 may have been misleading. In the context of testing I was thinking it would simplify test case generation if we checked the function attribute. However, in practice we cannot do so for the reason pointed out by Hal (i.e., inlining + LTO).
From your comment in D5222:
I think the case where detecting a 'fast' flag on an instruction is inadequate > is when we're looking to optimize an FP intrinsic.
If the intrinsics don't have the 'fast' flag/attribute, then this is a different issue.
No, this is incorrect. Non-fast instructions can be inlined into functions compiled with fast-math enabled, and must still get the strict treatment.
Ok, I'm confused. How does this reconcile with the "inner" and "outer" logic example from http://reviews.llvm.org/D5787?
In that review we were discussing calls to sqrt(). As you know, we don't have fast-math flags on intrinsics yet, so we can't really do the right thing here universally. As a result, for now, you can fall back to looking at the function attribute to figure out what is allowed.
However, where we have fast-math flags, we can treat this properly. And for unsafe-algebra, this applies only to the operands of the operation, not how the result is used. This is because fast-math optimized functions can be inlined into strict-compiled source, and the fast-math settings should not "leak" into operations for which we have strict semantics. Likewise, strictly-compiled source might be inlined into a fast-math compiled source, and the fast-math optimizations can apply to results from the strict math, but should not leak into the operations compiled strictly. Now this last case, for intrinsics like sqrt(), we cannot handle properly (as I describe above).
No, this is incorrect. Non-fast instructions can be inlined into functions compiled with fast-math enabled, and must still get the strict treatment.
Ok, I'm confused. How does this reconcile with the "inner" and "outer" logic example from http://reviews.llvm.org/D5787?
No, this is incorrect. Non-fast instructions can be inlined into functions compiled with fast-math enabled, and must still get the strict treatment.
Hal is correct! I believe the pass honors this requirement, but it's rather difficult to test.
Filing this bug as requested in http://reviews.llvm.org/D5222 and discussed in http://reviews.llvm.org/D5787.
Currently, the reassociate pass checks for IR-level fast-math-flags (FMF) when performing optimizations. It should also check function-level attributes to see if reassociation is allowed.
LTO (and possibly just regular inlining?) creates a scenario where non-fast instructions could end up in a function that has some level of fast-ness. In that case, the parent function's attributes apply to whatever was inlined.
No, this is incorrect. Non-fast instructions can be inlined into functions compiled with fast-math enabled, and must still get the strict treatment.
The other case where IR level FMF is not currently available is with FP intrinsics. This is tracked by bug 21290.
Extended Description
Filing this bug as requested in http://reviews.llvm.org/D5222 and discussed in http://reviews.llvm.org/D5787.
Currently, the reassociate pass checks for IR-level fast-math-flags (FMF) when performing optimizations. It should also check function-level attributes to see if reassociation is allowed.
LTO (and possibly just regular inlining?) creates a scenario where non-fast instructions could end up in a function that has some level of fast-ness. In that case, the parent function's attributes apply to whatever was inlined.
The other case where IR level FMF is not currently available is with FP intrinsics. This is tracked by bug 21290.