Open zbyrn opened 2 weeks ago
Do you know what the SML source code is that produces this issue? It looks like it is just trying to force an overflow trap.
I believe that the above fix has the right idea, but that we should make it part of the move64'
function, since there are other places where it might be a problem. We also need a special case for when the destination is a register.
fun move64' (src, dst as I.Direct _, an) = mark' (move64(src, dst), an)
| move64' (src, dst, an) = let
(* the `MOVABSQ` instruction can only move values into registers, so
* we need to use a temporary register here.
*)
val tmpR = I.Direct (64, newReg ())
in
move64' (src, tmpR, an);
emitInstr (I.move {mvOp=I.MOVQ, src=tmpR, dst=dst})
end
We can then simplify the move'
function's cases for Immed64
and ImmedLabel
operands.
| move' (ty, src as I.Immed64 n, dst, an) = move64' (src, dst, an)
| move' (ty, src as I.ImmedLabel _ , dst, an) = move64' (src, dst, an)
I've pushed these changes into the CFA repository for testing.
The source code that produces this issue is the standard basis file num-format64.sml
at base/system/Basis/Implementation/num-format64.sml
. The particular assembly code snippet is from the function wordToDec
although this pattern also shows up elsewhere.
val iadd = InlineT.Int.fast_add
fun mkDigit (w : Word64.word) =
InlineT.CharVector.sub("0123456789ABCDEF", W.toInt w)
fun wordToDec w = let
fun f (w, n, l) = if (w < 0w10)
then (iadd(n, 1), (mkDigit w) :: l)
else let val j = w div 0w10
in
f (j, iadd(n, 1), mkDigit(w - 0w10*j) :: l)
end
in
f (w, 0, [])
end
The deliberate overflow trigger is generated by the cpsopt-code
phase. In mkDigit
, W.toInt
needs to perform a checked conversion from a 64-bit integer to 63-bit. The optimization pass seems to convert it to an explicit check.
[Before lowering]
std wordToDec713(v761[C],v193[I64]) =
known_tail f763(v373[I64],v374[I],v375[PV]) =
if u64<(v373,(I64)10) [v769] then
addu63(v374,(I63t)1) -> v342[I]
> testu_64_63(v373) -> v548[I]
numsubscriptvu8("0123456789ABCDEF",v548) -> v549[I]
{v549,v375} -> v195
v761(v342,v195)
else
quotu64(v373,(I64)10) -> j350[I64]
addu63(v374,(I63t)1) -> v357[I]
mulu64((I64)10,j350) -> v361[I64]
subu64(v373,v361) -> v360[I64]
> testu_64_63(v360) -> v551[I]
numsubscriptvu8("0123456789ABCDEF",v551) -> v552[I]
{v552,v375} -> v196
f763(j350,v357,v196)
f763(v193,(I63t)0,(I63t)0)
[After cpsopt-code ...]
std wordToDec713(v761[C],v193[I64]) =
known_tail f763(v373[I64],v374[I],v375[PV]) =
if u64<(v373,(I64)10) [v769] then
addu63(v374,(I63t)1) -> v342[I]
std_cont v905(v548[I]) =
numsubscriptvu8("0123456789ABCDEF",v548) -> v549[I]
{v549,v375} -> v195
v761(v342,v195)
> if u64<=(v373,(I64)4611686018427387903) [v908] then
> trunc_64_63(v373) -> v906[I]
> v905(v906)
> else
> iadd64((I64)9223372036854775808,(I64)9223372036854775808) -> v907[I64]
> v905(v373)
else
quotu64(v373,(I64)10) -> j350[I64]
addu63(v374,(I63t)1) -> v357[I]
mulu64((I64)10,j350) -> v361[I64]
subu64(v373,v361) -> v360[I64]
std_cont v909(v551[I]) =
numsubscriptvu8("0123456789ABCDEF",v551) -> v552[I]
{v552,v375} -> v196
f763(j350,v357,v196)
> if u64<=(v360,(I64)4611686018427387903) [v912] then
> trunc_64_63(v360) -> v910[I]
> v909(v910)
> else
> iadd64((I64)9223372036854775808,(I64)9223372036854775808) -> v911[I64]
> v909(v360)
f763(v193,(I63t)0,(I63t)0)
The proposed fix doesn't solve the issue because register allocation occurs after instruction selection. During instruction selection, all intermediate values are represented as pseudo-registers, and all moves have their dst
operands as I.Direct _
. It is only after register allocation that we know which of these I.Direct _
stay as direct operands. I pushed a revised fix that doesn't have the special case for register destination.
diff --git a/MLRISC/amd64/mltree/amd64-gen.sml b/MLRISC/amd64/mltree/amd64-gen.sml
index 8ba2417..8e8e0b1 100644
--- a/MLRISC/amd64/mltree/amd64-gen.sml
+++ b/MLRISC/amd64/mltree/amd64-gen.sml
@@ -199,16 +199,18 @@ functor AMD64Gen (
(* annotate an expression and emit it *)
fun mark (i, an) = mark' (I.INSTR i, an)
(* annotated 64-bit move *)
- fun move64' (src, dst as I.Direct _, an) = mark' (move64(src, dst), an)
- | move64' (src, dst, an) = let
+ fun move64' (src as (I.Immed64 _ | I.ImmedLabel _), dst, an) = let
(* the `MOVABSQ` instruction can only move values into registers, so
* we need to use a temporary register here.
*)
val tmpR = I.Direct (64, newReg ())
in
- move64' (src, tmpR, an);
+ mark' (move64 (src, tmpR), an);
emitInstr (I.move {mvOp=I.MOVQ, src=tmpR, dst=dst})
end
+ | move64' _ =
+ error "move64' works on immediate src. For other moves, use \
+ \`move' instead"
(* move with annotation *)
fun move' (ty, dst as I.Direct (_, s), src as I.Direct (_, d), an) =
if CB.sameColor (s, d)
I think that the best solution is to modify the spilling code to use the scratch register (I.C.asmTmpR
) when it encounters this situation. Really, the RA should not be spilling constants, since it can dematerialize them instead of loading from memory, but I do not remember how to make that work in MLRISC. I'll see if I can fix the spiller.
I agree that there is no good reason to spill a constant, and that's the reason why I described the original patch a temporary fix. If there is always a scratch register on x64, another approach could be to "normalize" the instruction right before getting emitted: if the emitter is asked to emit movabsq imm, indirect
, the emitter can just emit two instructions movabsq imm, %rax; movq %rax, indirect
.
Version
110.99.6.1 (Latest)
Operating System
OS Version
No response
Processor
System Component
Core system
Severity
Major
Description
On AMD64, a
mov(absq)
instruction can only move a 64-bit immediate value to a register, but not to a memory location (AMD64 Manual, Volumn 3, page 234). When MLRISC selects instructions for 64-bit literal assignment, it emits amov
instruction that moves the value to the destination pseudo-register (link to source). After register allocation, if the destination pseudo-register is spilled to the stack, the destination operand is replaced with an indirect stack access, which is not supported.Transcript
Truncated to show only the relevant parts:
Expected Behavior
The code after register allocation should move the immediate value into a temporary register, and then move that to the stack.
Steps to Reproduce
This problem only happens when using the new closure converter, which typically uses more registers than the current one. I created a snapshot at my development repository [link]. Use the following to reproduce the error,
Additional Information
I have implemented the following temporary fix, which seems to solve the issue in the short term.
Whenever a 64-bit value that cannot be represented as 32-bit extension needs to be moved to a register, it is moved to a temporary pseudo-register first (
tmp
) and then moved todst
. Since the lifetime oftmp
is very short, it is unlikely thattmp
would be spilled to the stack, satisfying themovabsq
requirement. However, I don't think it is a permanent solution since it is not guaranteed thattmp
would be allocated a register either --- it is just very likely to be the case. A more disciplined fix would be to implement some kind of normalization ofmovabsq
instructions after register allocation.Email address
byronzhong@cs.uchicago.edu