Wrong lval order for UMULL

sarranz commented 11 months ago

The lval order should be lower then higher, like the assembly.

bgregoir commented 11 months ago

This is why UMULL_lo_hi has be introduced. If we fix this then we need to patch also libjade.

vbgl commented 11 months ago

Can you please elaborate on what the issue is?

sarranz commented 11 months ago

The UMULL instruction in ARMv7 returns the low part first and the high part second

UMULL Unsigned Multiply Long multiplies two 32-bit unsigned values to produce a 64-bit result. Assembler syntax UMULL<c><q> <RdLo>, <RdHi>, <Rn>, <Rm>

So in Jasmin people will write (https://formosa-crypto.zulipchat.com/#narrow/stream/384904-Jasmin.3A-ARMv7-M-development/topic/Convention.20of.20long.20multiplication.20instructions.2E/near/399813331)

fn lo_umull(reg u32 x y) -> reg u32 {
    reg u32 lo;
    lo, _ = #UMULL(x, y);
    return lo;
}

but this function returns the high part of the multiplication. Compilation is correct, we output assembly that corresponds to the source, but I think this is a bug because it's very difficult to realize what's going on without carefully checking the assembly.

vbgl commented 11 months ago

I think the rationale is to be consistent with the “high-level” syntax _, lo = x * y;

sarranz commented 11 months ago

That is a fair point. If you think this is the way it should be, then we leave it and add a warning in the FAQ. To me, it makes sense that _, _ = _ * _ has any order we want, it's a high-level construct. But in my mind #UMULL is closer to umull than to _, _ = _ * _. I'm probably mistaken in that expectation, though. Let me know if I should close the issue and add something in the FAQ.

vincentvbh commented 11 months ago

When I'm using intrinsics #UMULL instead of _, _ = _ * _, I'm already expecting that I'll get something closer to the assembly counterpart. So I think we should have lo, hi = #UMULL(...). I have a neutral opinion for _, _ = _ * _. Maybe @bgregoir's UMULL_lo_hi will be helpful if the compiler team really wants hi, lo = #UMULL(...) for Armv7-M.

I think the downside of retaining hi, lo = #UMULL(...) is that it hinders the translation from Armv7-M assembly to Jasmin. The reason is that people generally call umull from assembly (I'm not aware of intrinsic for umull in C) while programming with Armv7-M. And, at least for me, prior to reading any documents, someone might simply try out something close to assembly and expect the same behavior as assembly. And if one encounters a failure or an error, then one looks for documentation. I'm not aware of such documentation regarding the intrinsic #UMULL. I'm talking about #UMULL at the Jasmin source level and not _, _ = _ * _.

I think the only reasonable justification for tying #UMULL to _, _ = _ * _ is when Jasmin becomes a mainstream programming language among developers of asm-crypto. Unfortunately, this is not the case currently.

vbgl commented 11 months ago

The documentation about the UMULL intrinsic is there: https://github.com/jasmin-lang/jasmin/blob/v2023.06.1/proofs/compiler/arm_instr_decl.v#L870-L891

vincentvbh commented 11 months ago

I don't think this is a documentation. Prove by contradiction that Coq file is not a documentation: I'm more familiar with C so I assume that invalidating C programs as documentation implies invalidating Coq programs as documentation. Let me show what becomes a documentation of FFT over rings with identity if we agree with what you said.

This function https://github.com/multi-moduli-ntt-saber/multi-moduli-ntt-saber/blob/master/gen_table/common/ntt_c.c#L229 computes radix-2 Cooley--Tukey FFT for arbitrary R[x] / (x^(2^k) - z^(2^k)) with custom layer-merging. In other words, it traverses in a tree-in-a-tree fashion. After tracing the {6, 7}-level for loop in great detail, you'll find that this is an iterative version of the traversal. The function I have shown is definitely ill-documented and only the author knows what's happening so this function is not documentation. Following the assumption, Coq programs are not documentation. QED.

I don't think it is reasonable to regard programs as documentation, regardless how ''elegant'' the syntax is. In my definition of documentation, it consists of natural language. Since we are dealing with science, documentation means documents in English (British English, and American English are both fine). You can use your own definition of documentation, but I believe in this case, you should clearly state your definition of ''documentation'' since, I believe, most people use a very different definition than yours.

sarranz commented 11 months ago

There really seems to be no consensus of whether to use high-level or low-level order. I propose we change the name of this intrinsic to UMULL_hi_lo.

vincentvbh commented 11 months ago

I think there is another question for hi, lo = #UMULL(...): it is inconsistent to lo, hi = #UMLAL(...). I don't really understand this. I think there will also be confusion if one only introduces UMULL_hi_lo and not hi, lo = #UMLAL_hi_lo(...).

I'm simply pointing out the inconsistency here and I'm not supporting the ordering. It will be very surprise and annoying if the inconsistency of the intrinsics themselves are not resolved.

bgregoir commented 11 months ago

The problem is that we have made a mistake initially. We can easily change the order but it will break existing development, this is why we have introduced this UMULL_hi_lo. So if the libjade developpers are agree to do this patch I have personally no opposition to change this.

vincentvbh commented 11 months ago

Ok, as long as UMULLis changed into UMULL_hi_lo, I can take that.

eponier commented 11 months ago

I think now is a good time to revert a bad choice, because the amount of jasmin developments out there is fairly limited. There is libjade and libjn, but we are in contact with the developers.

jasmin-lang / jasmin

Wrong lval order for UMULL #637