Closed mouse07410 closed 4 years ago
Thanks, this should be fixed by 3a00580.
@jschanck thank you - part of the problems are fixed now. Unfortunately, some of the assembler directives still fail ("unknown directive" and "unexpected token in '.section' directive").
gcc -O3 -fomit-frame-pointer -march=native -fPIC -fPIE -pie -Wall -Wextra -Wpedantic -o test/test_polymul fips202.c kem.c owcpa.c pack3.c packq.c poly.c poly_r2_inv.c sample.c sample_iid.c verify.c randombytes.c square_1_701_patience.s square_3_701_patience.s square_6_701_patience.s square_12_701_shufbytes.s square_15_701_shufbytes.s square_27_701_shufbytes.s square_42_701_shufbytes.s square_84_701_shufbytes.s square_168_701_shufbytes.s square_336_701_shufbytes.s poly_rq_mul.s poly_r2_mul.s poly_rq_to_s3.s vec32_sample_iid.s poly_mod_3_Phi_n.s poly_mod_q_Phi_n.s poly_s3_to_rq.s poly_s3_inv.s poly_rq_mul_x_minus_1.s test/test_polymul.c cpucycles.c
square_1_701_patience.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_1_701_patience.s:6:1: error: unknown directive
.hidden square_1_701
^
square_3_701_patience.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_3_701_patience.s:6:1: error: unknown directive
.hidden square_3_701
^
square_6_701_patience.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_6_701_patience.s:6:1: error: unknown directive
.hidden square_6_701
^
square_12_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_12_701_shufbytes.s:6636:1: error: unknown directive
.hidden square_12_701
^
square_15_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_15_701_shufbytes.s:11774:1: error: unknown directive
.hidden square_15_701
^
square_27_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_27_701_shufbytes.s:4924:1: error: unknown directive
.hidden square_27_701
^
square_42_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_42_701_shufbytes.s:5480:1: error: unknown directive
.hidden square_42_701
^
square_84_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_84_701_shufbytes.s:4230:1: error: unknown directive
.hidden square_84_701
^
square_168_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_168_701_shufbytes.s:7284:1: error: unknown directive
.hidden square_168_701
^
square_336_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_336_701_shufbytes.s:6198:1: error: unknown directive
.hidden square_336_701
^
poly_rq_mul.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_rq_mul.s:325:1: error: unknown directive
.hidden poly_Rq_mul
^
poly_r2_mul.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_r2_mul.s:107:1: error: unknown directive
.hidden poly_R2_mul
^
poly_rq_to_s3.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_rq_to_s3.s:123:1: error: unknown directive
.hidden poly_Rq_to_S3
^
vec32_sample_iid.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
vec32_sample_iid.s:89:1: error: unknown directive
.hidden vec32_sample_iid
^
poly_mod_3_Phi_n.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_mod_3_Phi_n.s:56:1: error: unknown directive
.hidden poly_mod_3_Phi_n
^
poly_mod_q_Phi_n.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_mod_q_Phi_n.s:5:1: error: unknown directive
.hidden poly_mod_q_Phi_n
^
poly_s3_to_rq.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_s3_to_rq.s:293:1: error: unknown directive
.hidden poly_lift
^
poly_s3_inv.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_s3_inv.s:464:1: error: unknown directive
.hidden poly_S3_inv
^
poly_rq_mul_x_minus_1.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_rq_mul_x_minus_1.s:89:1: error: unknown directive
.hidden poly_Rq_mul_x_minus_1
^
make: *** [test/test_polymul] Error 1
Not sure if this is relevant: https://github.com/ClickHouse/ClickHouse/issues/8530
The "offending" code that macOS and its toolchain don't support is (an excerpt, not a complete list):
avx2-hps2048509/asmgen/rq_mul/poly_rq_mul.py
111: p(".section .rodata")
avx2-hps2048509/asmgen/poly_mod_q_Phi_n.py
8: p(".section .rodata")
avx2-hps2048509/asmgen/poly_rq_to_s3.py
11: p(".section .rodata")
avx2-hps2048509/asmgen/poly_mod_3_Phi_n.py
9: p(".section .rodata")
and
avx2-hps4096821/asmgen/poly_rq_to_s3.py
30: p(".hidden {}poly_Rq_to_S3".format(NAMESPACE))
avx2-hps4096821/asmgen/poly_mod_q_Phi_n.py
12: p(".hidden {}poly_mod_q_Phi_n".format(NAMESPACE))
avx2-hps4096821/asmgen/poly_mod_3_Phi_n.py
15: p(".hidden {}poly_mod_3_Phi_n".format(NAMESPACE))
As I understand it, Mac toolchains don't support these, but Linux does. So, would it be possible to add some kind of guard to skip those if the OS is macOS (aka Darwin)?
Maybe. I could drop the ".hidden" directives without consequence, but the ".section .rodata" need to be changed. And that might just surface some other issues later in the build. You might be able to build an ELF object file with the right assembler flags, and then pass the resulting object file through objconv. Sort of a kludge.
Let me know if you find a solution. I don't have a good macOS setup right now.
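For illustration, here is one way the ".section .rodata" change could look. Mach-O has no ELF-style .rodata section, and read-only data there normally lives in __TEXT,__const. This is only a sketch, not code from this repository: rodata_directive is a hypothetical helper, and it assumes the generator scripts choose directives from sys.platform at generation time.

```python
import sys


def rodata_directive(platform=sys.platform):
    """Return a read-only data section directive for the given platform.

    Hypothetical helper (not part of the repo): ELF targets get .rodata,
    while darwin (Mach-O) gets the __TEXT,__const section instead.
    """
    if platform == "darwin":
        return ".section __TEXT,__const"
    return ".section .rodata"


def data_prologue(platform=sys.platform):
    """Return the lines a generator would print before its constant masks."""
    return [rodata_directive(platform), ".p2align 5"]
```

Whether the rest of the generated assembly is happy with constants in that section on macOS would still need to be tested there.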
I could drop the ".hidden" directives without consequence
Then, could you do so please?
but the ".section .rodata" need to be changed. And that might just surface some other issues later in the build
I'll experiment. I'm pretty sure there won't be any issues related to removal of this, because I've encountered this problem in other code - with the solution of just removing those .section .rodata lines invariably working fine.
But if you insist, I'll see if I can find a way to guard these commands, so it's removed only on macOS.
Let me know if you find a solution.
My proposed solution that I'll try on macOS and Linux (CentOS-8 and Ubuntu-18) is just dropping this directive altogether. Possibly, dropping it only for macOS - but it would be much simpler if one can abolish it completely.
I don't have a good macOS setup right now.
I've a very good macOS setup, and would be happy to run the trials/tests for you.
Another likely problem to surface is global names that the macOS toolchain prefixes with an underscore (_), but other systems (Linux) usually leave as-is. A blunt way to solve this (which I used with other code) was just duplicating the .global into something like
.global funcname
.global _funcname
Would that be acceptable?
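An alternative to duplicating every .global line is to pick the symbol prefix once, at generation time. The sketch below is an assumption about how that might be done in the Python generators, not the project's actual code. The names symbol_prefix and emit_entry are hypothetical.

```python
import sys


def symbol_prefix(platform=sys.platform):
    # Mach-O C symbols carry a leading underscore; ELF symbols do not.
    return "_" if platform == "darwin" else ""


def emit_entry(name, namespace="", platform=sys.platform):
    """Return the .global directive and the label for one exported function."""
    sym = "{}{}{}".format(symbol_prefix(platform), namespace, name)
    return [".global {}".format(sym), "{}:".format(sym)]
```

With this, a generator would print the lines of emit_entry("poly_mod_3_Phi_n", NAMESPACE) instead of hard-coding both spellings.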
You might be able to build an ELF object file
Alas, doesn't work - the problem is parsing, not generating. And my only tools are Clang with Xcode assembler, yasm, and nasm. None of them can parse these .s files, with Clang and yasm being the closest to succeeding - they only barf on .section .rodata, .hidden, and .att_syntax.
I've re-worked my fix of avx2-hrss701/.
Here's the current/latest patch that I've tested with Clang-10 and GCC-10, on macOS Catalina 10.15.6 with Xcode-11.6 and CentOS 8. avx2-hrss-mac.diff.txt
This is what the approach looks like (example of one file):
diff --git a/avx2-hrss701/asmgen/poly_mod_3_Phi_n.py b/avx2-hrss701/asmgen/poly_mod_3_Phi_n.py
index 9c7a5d3..aace969 100644
--- a/avx2-hrss701/asmgen/poly_mod_3_Phi_n.py
+++ b/avx2-hrss701/asmgen/poly_mod_3_Phi_n.py
@@ -1,20 +1,28 @@
 p = print
+from sys import platform
+
 from params import *
 from mod3 import mod3, mod3_masks
 if __name__ == '__main__':
     p(".data")
-    p(".section .rodata")
+    if platform != "darwin":
+        p(".section .rodata")
     p(".p2align 5")
     mod3_masks()
     p(".text")
-    p(".hidden {}poly_mod_3_Phi_n".format(NAMESPACE))
-    p(".global {}poly_mod_3_Phi_n".format(NAMESPACE))
-    p(".att_syntax prefix")
-
+    if platform == "darwin":
+        p(".global {}poly_mod_3_Phi_n".format(NAMESPACE))
+        p(".global _{}poly_mod_3_Phi_n".format(NAMESPACE))
+    else:
+        p(".hidden {}poly_mod_3_Phi_n".format(NAMESPACE))
+        p(".global {}poly_mod_3_Phi_n".format(NAMESPACE))
+        p(".att_syntax prefix")
+
+    p("_{}poly_mod_3_Phi_n:".format(NAMESPACE))
     p("{}poly_mod_3_Phi_n:".format(NAMESPACE))
     # rdi holds r
I would appreciate if you could apply it.
Unfortunately, there's about three times more work to make the other three AVX2 subdirectories Mac-compatible. Would you be able to do that?
Thanks for looking into this. I just pushed a "macOS" branch which has a potential fix on it. Let me know if it works.
Thanks for looking into this.
You're welcome.
I just pushed a "macOS" branch which has a potential fix on it. Let me know if it works.
Sorry to say, it doesn't. I don't think anything less than what my patch does would fix the AVX2 code, because the Mac toolchain cannot handle .section .rodata. So, the code that generates .section .rodata must be disabled for Mac, as my patch does. Also, once that compiles - there's likely to be an issue of C code expecting external functions to have _ prepended to their names, while the generated assembly code does not add it. This is likely to cause the link step to fail - but we're not there yet with the macOS branch. ;-) I see you've taken care of that - thanks!
At least I haven't seen (yet?) the complaints about .hidden XXX in this branch.
$ make -C avx2-hrss701 clean test
find . -name '*.pyc' -delete
find . -name '__pycache__' -delete
rm -f *.o
rm -f *.s
rm -f -r test/test_polymul
rm -f -r test/test_ntru
rm -f -r test/test_pack
rm -f -r test/speed
rm -f -r test/ram
rm -f -r test/encap
rm -f -r test/decap
rm -f -r test/keypair
rm -f -r test/speed_r2_inv
rm -f PQCgenKAT_kem
rm -f PQCkemKAT_*.req
rm -f PQCkemKAT_*.rsp
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--patience --callee 6 --namespace= --raw-name 1 \
> square_1_701_patience.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--patience --callee 6 --namespace= --raw-name 3 \
> square_3_701_patience.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--patience --callee 6 --namespace= --raw-name 6 \
> square_6_701_patience.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--shufbytes --namespace= --raw-name 12 \
> square_12_701_shufbytes.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--shufbytes --namespace= --raw-name 15 \
> square_15_701_shufbytes.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--shufbytes --namespace= --raw-name 27 \
> square_27_701_shufbytes.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--shufbytes --namespace= --raw-name 42 \
> square_42_701_shufbytes.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--shufbytes --namespace= --raw-name 84 \
> square_84_701_shufbytes.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--shufbytes --namespace= --raw-name 168 \
> square_168_701_shufbytes.s
PYTHONPATH=bitpermutations \
python3 bitpermutations/applications/squaring_mod_GF2N.py \
--shufbytes --namespace= --raw-name 336 \
> square_336_701_shufbytes.s
python3 asmgen/rq_mul/poly_rq_mul.py asmgen/rq_mul/K2_schoolbook_64x11.py asmgen/rq_mul/K2_K2_64x44.py > poly_rq_mul.s
python3 asmgen/poly_r2_mul.py > poly_r2_mul.s
python3 asmgen/poly_rq_to_s3.py > poly_rq_to_s3.s
python3 asmgen/vec32_sample_iid.py > vec32_sample_iid.s
python3 asmgen/poly_mod_3_Phi_n.py > poly_mod_3_Phi_n.s
python3 asmgen/poly_mod_q_Phi_n.py > poly_mod_q_Phi_n.s
python3 asmgen/poly_s3_to_rq.py > poly_s3_to_rq.s
python3 asmgen/poly_s3_inv.py > poly_s3_inv.s
python3 asmgen/poly_rq_mul_x_minus_1.py > poly_rq_mul_x_minus_1.s
/usr/bin/cc -O3 -fomit-frame-pointer -march=native -fPIC -fPIE -pie -Wall -Wextra -Wpedantic -o test/test_polymul fips202.c kem.c owcpa.c pack3.c packq.c poly.c poly_r2_inv.c sample.c sample_iid.c verify.c randombytes.c square_1_701_patience.s square_3_701_patience.s square_6_701_patience.s square_12_701_shufbytes.s square_15_701_shufbytes.s square_27_701_shufbytes.s square_42_701_shufbytes.s square_84_701_shufbytes.s square_168_701_shufbytes.s square_336_701_shufbytes.s poly_rq_mul.s poly_r2_mul.s poly_rq_to_s3.s vec32_sample_iid.s poly_mod_3_Phi_n.s poly_mod_q_Phi_n.s poly_s3_to_rq.s poly_s3_inv.s poly_rq_mul_x_minus_1.s test/test_polymul.c cpucycles.c
clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument]
square_1_701_patience.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_3_701_patience.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_6_701_patience.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_12_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_15_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_27_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_42_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_84_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_168_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
square_336_701_shufbytes.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_rq_mul.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_r2_mul.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_rq_to_s3.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
vec32_sample_iid.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_mod_3_Phi_n.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_mod_q_Phi_n.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_s3_to_rq.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_s3_inv.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
poly_rq_mul_x_minus_1.s:2:17: error: unexpected token in '.section' directive
.section .rodata
^
cpucycles.c:6:3: warning: extension used [-Wlanguage-extension-token]
asm volatile(".byte 15;.byte 49;shlq $32,%%rdx;orq %%rdx,%%rax"
^
1 warning generated.
make: *** [test/test_polymul] Error 1
$ git branch
* macOS
master
Unrelated - what do you think about this change (I'd welcome it!):
diff --git a/avx2-hrss701/Makefile b/avx2-hrss701/Makefile
index acfeb76..cde95b5 100644
--- a/avx2-hrss701/Makefile
+++ b/avx2-hrss701/Makefile
@@ -1,4 +1,4 @@
-CC = /usr/bin/cc
+CC ?= /usr/bin/cc
CFLAGS = -O3 -fomit-frame-pointer -march=native -fPIC -fPIE -pie
CFLAGS += -Wall -Wextra -Wpedantic
I personally don't think that it is necessary to mark .section .rodata. It is certainly more secure that way - but the code should work regardless. So, if you don't like putting if platform == "darwin": in the .py files - perhaps you can just eliminate those p(".section .rodata") commands altogether...?
Here's a patch to make avx2-hrss701 compile and run correctly: avx2-macos.diff.txt
This excerpt is similar to the previous patch, but simpler - as you did a big part of the work already:
diff --git a/avx2-hrss701/bitpermutations/bitpermutations/printing.py b/avx2-hrss701/bitpermutations/bitpermutations/printing.py
index 1702fd6..0c3c14c 100644
--- a/avx2-hrss701/bitpermutations/bitpermutations/printing.py
+++ b/avx2-hrss701/bitpermutations/bitpermutations/printing.py
@@ -4,6 +4,7 @@ import bitpermutations.data as data
 import bitpermutations.utils as utils
 from .utils import reg_to_memfunc
+from sys import platform
 def print_memfunc(f, in_size, out_size, per_reg=256, initialize=False):
     """Wraps a function that operates on registers in .data and .text sections,
@@ -22,7 +23,8 @@ def print_memfunc(f, in_size, out_size, per_reg=256, initialize=False):
     f(out_data, in_data)
     print(".data")
-    print(".section .rodata")
+    if platform != "darwin":
+        print(".section .rodata")
     print(".p2align 5")
     for mask in data.DATASECTION:
         print(mask.data())
Ah, really thought I'd removed the .section .rodata's... just doing this with a chain of sed commands. Should be fixed now.
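The exact sed commands are not shown in this thread. As a rough illustration of the same idea, a post-processing filter over the generated assembly might look like the sketch below. The function name darwin_filter is hypothetical, and the sketch assumes that simply dropping the two ELF-only directives is acceptable, as discussed above.

```python
def darwin_filter(lines):
    """Yield assembly lines with ELF-only directives removed.

    Hypothetical post-processing step: drops `.section .rodata` and
    `.hidden <symbol>` lines and passes everything else through unchanged.
    """
    for line in lines:
        stripped = line.strip()
        if stripped == ".section .rodata" or stripped.startswith(".hidden "):
            continue
        yield line


# Small demonstration on an in-memory snippet of generated assembly.
sample = [
    ".data\n",
    ".section .rodata\n",
    ".p2align 5\n",
    ".text\n",
    ".hidden poly_Rq_mul\n",
    ".global poly_Rq_mul\n",
]
filtered = list(darwin_filter(sample))
```

Filtering the generator output keeps the .py files free of platform checks, at the cost of an extra step in the Makefile.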
Perfect! Your macOS branch builds and runs fine on macOS 10.15.6 with the latest stable Xcode.
Thank you!
Great! Did you check all 4 parameter sets, or just hrss701?
I checked hrss701 and hps4096821. I assume you did the same for the other two hpsXXXXXXX. ;-)
I checked them on Linux---as I said, I don't have a macOS system to test on right now.
If you check the other two parameter sets, I can close this issue and merge the branch.
If you check the other two parameter sets, I can close this issue and merge the branch.
Yep, tested all the four. All are good to go!
Getting some warnings - not sure what to do about them, and how bad they are in general. My "normal" approach is to get rid of all the compiler warnings, but I acknowledge that it doesn't always work...
Building avx2-hps4096821 (and other avx2-hpsXXXXXXX):
$ CC=gcc make -C avx2-hps4096821/ clean all
. . . . .
/usr/bin/cc -O3 -fomit-frame-pointer -march=native -fPIC -fPIE -pie -Wall -Wextra -Wpedantic -o test/speed_r2_inv fips202.c crypto_sort_int32.c djbsort/sort.c kem.c owcpa.c pack3.c packq.c poly.c poly_lift.c poly_r2_inv.c poly_s3_inv.c sample.c sample_iid.c verify.c randombytes.c square_1_821_patience.s square_3_821_patience.s square_6_821_patience.s square_12_821_shufbytes.s square_24_821_shufbytes.s square_51_821_shufbytes.s square_102_821_shufbytes.s square_204_821_shufbytes.s square_408_821_shufbytes.s poly_rq_mul.s poly_r2_mul.s poly_rq_to_s3.s vec32_sample_iid.s poly_mod_3_Phi_n.s poly_mod_q_Phi_n.s cpucycles.c test/speed_r2_inv.c
clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument]
djbsort/sort.c:25:7: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(*x,*y);
^
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
djbsort/sort.c:183:5: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(x1,x0);
^
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
djbsort/sort.c:184:5: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(x3,x2);
^
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
djbsort/sort.c:185:5: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(x2,x0);
^
. . . . .
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
djbsort/sort.c:1176:5: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(x[j],x[j+1]);
^
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
djbsort/sort.c:1177:5: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(x[j+2],x[j+3]);
^
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
djbsort/sort.c:1181:5: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(x[j],x[j+2]);
^
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
djbsort/sort.c:1183:5: warning: extension used [-Wlanguage-extension-token]
int32_MINMAX(x[j],x[j+1]);
^
djbsort/int32_minmax_x86.c:4:3: note: expanded from macro 'int32_MINMAX'
asm( \
^
66 warnings generated.
cpucycles.c:6:3: warning: extension used [-Wlanguage-extension-token]
asm volatile(".byte 15;.byte 49;shlq $32,%%rdx;orq %%rdx,%%rax"
^
1 warning generated.
Again, on an unrelated issue - I'd like this change (pretty much throughout this repo), if possible:
diff --git a/avx2-hrss701/Makefile b/avx2-hrss701/Makefile
index acfeb76..cde95b5 100644
--- a/avx2-hrss701/Makefile
+++ b/avx2-hrss701/Makefile
@@ -1,4 +1,4 @@
-CC = /usr/bin/cc
+CC ?= /usr/bin/cc
CFLAGS = -O3 -fomit-frame-pointer -march=native -fPIC -fPIE -pie
CFLAGS += -Wall -Wextra -Wpedantic
It's nice when I don't have to edit the Makefile to build with one compiler or another. The same would apply to CFLAGS, though I agree that many users may have their CFLAGS set to something that this repo might not like.
Great! Thanks for your help. I've made the CC ?= ... change too.
I've made the CC ?= ... change too.
One more thing: could I ask you to apply that change to all the Makefiles, please? Thanks!
Can you point me to the specific file that you're having trouble with? I changed all 8 of the files named "Makefile".
Darn... I did git pull several times, and it did not pick up that change. Let me re-clone and report.
Funny. I removed the repo, re-cloned - and everything's good! No, I don't have any explanation why git pull did not do what it was supposed to.
Thanks again.
3.5 GHz Dual-Core Intel Core i7, macOS Catalina 10.15.6, Xcode-11.6, GCC-10, current master
Fails to compile on a 64-bit-only OS:
Complete build log: avx2-build.txt