Open FiloSottile opened 3 years ago
I think these would be nice improvements if they're easy and don't add complexity to fiat-crypto's output routines (which aren't verified), but I don't think any of these affect the assembly ultimately generated by the Go compilers?
@JasonGross definitely types for differentiating between Montgomery domain. +1
FWIW, the syntax rules for Go documentation are described at: https://golang.org/pkg/go/doc/#ToHTML
I think the main noteworthy things for fiat-go code are that blank lines separate paragraphs, and runs of indented lines get formatted with \<pre>.
I think the main noteworthy things for fiat-go code are that blank lines separate paragraphs, and runs of indented lines get formatted with
<pre>
.
@mdempsky What's the rule for runs of indented lines not separated from the previous line by a newline. That is, how is something like
line1
line2
line3
line4
line5
line6
line7
line8
line9
line10
Which indented lines become <pre>
blocks?
- run
gofmt
on the output (which will only fix some of the whitespace
This one is a bit tricky logistically, I'd rather not make gofmt
a requirement for synthesizing the go files via the Makefile. I can change the output to match the spacing conventions of gofmt
, though?
Which indented lines become
<pre>
blocks?
It looks like lines 3 and 4, 6 and 7, and 9 and 10 become pre blocks:
This one is a bit tricky logistically, I'd rather not make gofmt a requirement for synthesizing the go files via the Makefile.
Out of curiosity, what's the logistic difficulty here? I thought Go is pretty generally available nowadays.
I can change the output to match the spacing conventions of gofmt, though?
gofmt can be somewhat finnicky about things; e.g., it'll format 1 * 2
and 3 + 4
, but then 1*2 + 3
.
However, to the extent you're able to better match the output, that sounds good and desirable.
Out of curiosity, what's the logistic difficulty here? I thought Go is pretty generally available nowadays.
The files are regenerated by the default makefile target. I'd like to be able to say all of the following:
Makes sense. So we just need a formally verified implementation of gofmt in Coq? :)
- run
gofmt
on the output (which will only fix some of the whitespaceThis one is a bit tricky logistically, I'd rather not make
gofmt
a requirement for synthesizing the go files via the Makefile. I can change the output to match the spacing conventions ofgofmt
, though?
Sure, you can run gofmt -d
to see the changes it would apply to a file.
* use `uint64` instead of `uint1` and drop addcarryxU64 and subborrowxU64 wrappers
This is more than cosmetic. Getting rid of the wrappers and assuming that bits.Add64
behaves as advertised ("The carryOut output is guaranteed to be 0 or 1") improves curve25519 CarryMul
performance by ~35%[0]. I assume CarrySquare
will also see a significant performance gain.
var x51 uint64
var x52 uint1
x51, x52 = addcarryxU64(x13, x7, 0x0)
var x53 uint64
x53, _ = addcarryxU64(x14, x8, x52)
var x51 uint64
var x52 uint64
x51, x52 = bits.Add64(x13, x7, 0x0)
var x53 uint64
x53, _ = bits.Add64(x14, x8, x52)
While we're wishing for various fixes to the Go output, Selectznz
specifies a private type for one of it's arguments (arg1 uint1
), severely limiting the usefulness of the routine.
[0]: With go 1.16.5. 1.17beta1 does a better job of optimizing the current code so the gain is "only" ~31%.
This is more than cosmetic. Getting rid of the wrappers and assuming that
bits.Add64
behaves as advertised ("The carryOut output is guaranteed to be 0 or 1") improves curve25519CarryMul
performance by ~35%[0]. I assumeCarrySquare
will also see a significant performance gain.
If you typedef uint1 to uint64, does that give you the performance benefit?
If removing the wrappers is really that important, I can add a rewriting pass to relax the bounds for go output.
If you typedef uint1 to uint64, does that give you the performance benefit?
name \ time/op baseline typedef-ed no-wrapper
CarryMult-8 66.4ns ± 1% 50.8ns ± 1% 45.9ns ± 1%
name \ time/op baseline-1.17 typedef-ed-1.17 no-wrapper-1.17
CarryMult-8 61.7ns ± 0% 48.8ns ± 0% 44.7ns ± 0%
It is considerably faster, but the wrapper still has overhead, even though it gets inlined, because of what appears to be inline marking.
0x023d 00573 (curve25519.go:404) XCHGL AX, AX
0x023e 00574 (curve25519.go:406) XCHGL AX, AX
0x023f 00575 (curve25519.go:409) XCHGL AX, AX
0x0240 00576 (curve25519.go:411) XCHGL AX, AX
0x0241 00577 (curve25519.go:413) MOVQ $2251799813685247, R9
0x024b 00587 (curve25519.go:413) ANDQ R9, R12
0x024e 00590 (curve25519.go:416) XCHGL AX, AX
0x024f 00591 (curve25519.go:418) XCHGL AX, AX
0x0250 00592 (curve25519.go:421) XCHGL AX, AX
0x0251 00593 (curve25519.go:423) XCHGL AX, AX
0x0252 00594 (curve25519.go:426) XCHGL AX, AX
0x0253 00595 (curve25519.go:428) XCHGL AX, AX
0x0254 00596 (curve25519.go:431) XCHGL AX, AX
0x0255 00597 (curve25519.go:433) XCHGL AX, AX
0x0256 00598 (curve25519.go:436) XCHGL AX, AX
0x0257 00599 (curve25519.go:438) XCHGL AX, AX
0x0258 00600 (curve25519.go:441) XCHGL AX, AX
0x0259 00601 (curve25519.go:443) XCHGL AX, AX
0x025a 00602 (curve25519.go:446) XCHGL AX, AX
0x025b 00603 (curve25519.go:448) XCHGL AX, AX
0x025c 00604 (curve25519.go:451) XCHGL AX, AX
0x025d 00605 (curve25519.go:453) XCHGL AX, AX
0x025e 00606 (curve25519.go:456) XCHGL AX, AX
0x025f 00607 (curve25519.go:462) XCHGL AX, AX
0x0260 00608 (curve25519.go:468) XCHGL AX, AX
0x0261 00609 (curve25519.go:474) XCHGL AX, AX
0x0262 00610 (curve25519.go:475) SUBQ CX, DX
0x0265 00613 (curve25519.go:476) MOVQ DI, CX
0x0268 00616 (curve25519.go:476) SHRQ $51, DI
0x026c 00620 (curve25519.go:476) SHLQ $13, DX
Since I don't see soul crushing NOP-walls of doom in the version of the routine where I manually removed the wrapper, I assume that the compiler doesn't emit a InlMark
(or gets rid of it at some point) for direct calls to bits.Add64
because of the special cases in the compiler's SSA code.
At some point (in the far future) the compiler behavior may resolve itself, but at least one related issue has been open since 2019 (https://github.com/golang/go/issues/29571).
- use
uint64
instead ofuint1
and drop addcarryxU64 and subborrowxU64 wrappers
We should either do this or export uint1
as Uint1
. At the moment there are exported functions taking arguments of unexported types.
[ ] use
uint64
instead ofuint1
and drop addcarryxU64 and subborrowxU64 wrappers[ ] use
:=
instead ofvar
declarations[x] emit types for different domains (Montgomery and not Montgomery)
[x] make comments shorter
This should be the first line in the file, which by convention will get recognized by various tooling as autogenerated code and left alone.
gofmt
on the output (which will only fix some of the whitespace