supranational / blst

Multilingual BLS12-381 signature library
Apache License 2.0
454 stars 171 forks source link

BLST throws illegal instruction error on AMD K10 CPUs (Windows) #200

Closed wjblanke closed 7 months ago

wjblanke commented 7 months ago

Windows event viewer shows 0xc000001d (illegal instruction) when running BLST on a K10 AMD 905e CPU

Disassembly of code at offset shows pshufb instruction being executed.

According to x86 docs:

"pshufb (packed shuffle bytes) is not supported on AMD K8/K10, first included on AMD FX* (Bulldozer). It comes with SSSE3 (Supplemental SSE3) set."

But BLST uses it quite a bit for sha256 :-(

(venv) Williams-MacBook-Pro:blst bill$ grep -r pshufb * src/asm/x86_64-xlate.pl:my $pshufb = sub { src/asm/sha256-x86_64.pl: pshufb $TMP,@MSG[0] src/asm/sha256-x86_64.pl: pshufb $TMP,@MSG[1] src/asm/sha256-x86_64.pl: pshufb $TMP,@MSG[2] src/asm/sha256-x86_64.pl: pshufb $TMP,@MSG[3] src/asm/sha256-x86_64.pl: pshufb $t3,@X[0] src/asm/sha256-x86_64.pl: pshufb $t3,@X[1] src/asm/sha256-x86_64.pl: pshufb $t3,@X[2] src/asm/sha256-x86_64.pl: pshufb $t3,@X[3] src/asm/sha256-x86_64.pl: '&pshufb ($t3,$t4)', # sigma1(X[14..15]) src/asm/sha256-x86_64.pl: '&pshufb ($t3,$t5)', src/asm/sha256-x86_64.pl: #&pshufb ($t3,$t4); # sigma1(X[14..15]) src/asm/sha256-x86_64.pl: #&pshufb ($t3,$t5);

Thanks for your efforts!!! Cheers

illegalinstruction
mratsim commented 7 months ago

From the README:

If final application crashes with an "illegal instruction" exception [after copying to another system], pass -D__BLST_PORTABLE__ on build.sh command line. If you don't use build.sh, complement the CFLAGS environment variable with the said command line option.

wjblanke commented 7 months ago

We do use portable for BLST. We run on a wide range of CPUs (100,000+ users), so we know portable works very well. AMD K10s are the only problem reports we have received. Thanks!

wjblanke commented 7 months ago

AMD K10 doesn't have SSSE3, eg https://www.cpu-world.com/CPUs/K10/AMD-Phenom%20II%20X4%20905e%20-%20HD905EOCK4DGI%20(HD905EOCGIBOX).html

From BLST:

sha256_block procedure for x86_64.

This module is stripped of AVX and even scalar code paths, with rationale that

a) AVX1 is [justifiably] faster than SSSE3 code path only on one processor, venerable Sandy Bridge; b) AVX2 incurs costly power transitions, which would be justifiable if AVX2 code was executing most of the time, which is not the case in the context; c) all contemporary processors support SSSE3, so that nobody would actually use scalar code path anyway;

So BLST can't run on AMD K10s without throwing illegal instructions :-(

mratsim commented 7 months ago

You are mentioning src/asm/sha256-x86_64.pl but the portable one is src/asm/sha256-portable-x86_64.pl

Alternatively you can compile BLST without assembly: https://github.com/supranational/blst/blob/56f9198/src/no_asm.h#L1225

hoffmang9 commented 7 months ago

Either way, we are building __BLST_PORTABLE - https://github.com/Chia-Network/bls-signatures/blob/main/src/CMakeLists.txt#L47

emlowe commented 7 months ago

Ok, in the win64 directory there doesn't appear to be sha256-portable... - however, it is there in the elf directory.

So we include for windows anyway the win64/sha256-x86_64.asm assembler file.

emlowe commented 7 months ago

I'll note our current released version uses some custom CMake scripts to build BLST and uses it in a Python wheel

However, our upcoming release uses a rust crate for this and leverages the provided BLST rust build environment from build.rs. We are definitely using the portable setting, but this rust crate also has an illegal instruction we believe is due to the use of pshufb. However, we don't have an entire build system on this old CPU so we don't have exact Debugging tools to pinpoint the exact failing instruction

wjblanke commented 7 months ago

Actually it crashed in the same code and instruction (pshufb) in the rust version. The offset (which I got from event viewer on the 905e system) is different of course.

https://github.com/supranational/blst/blob/master/src/asm/sha256-x86_64.pl#L408

rustillegal
dot-asm commented 7 months ago

Wow! How deep is it reasonable to go? The rationale behind omitting sha256-portable-x86_64.asm from x86_64-pc-windows-msvc build is as follows. It's tricky to arrange, admittedly not impossible, but the barrier was deemed to be high enough to not attempt to jump over it, because nobody should be irresponsible enough to have an unsupported Windows system on the Internet. I mean all Windows versions that supported pre-SSSE3 processors are out of support by now.

What to do? Note that there is build/coff/sha256-portable-x86_64.s, so that you can make Win64 build as portable as Linux one by using mingw toolchain. In Rust context it means using --target=x86_64-pc-windows-gnu. Is this sufficient?

dot-asm commented 7 months ago

Alternatively you can compile BLST without assembly:

As a point of clarification. Non-assembly builds are not actually supported on x86_64 and aarch64 platforms. Attempt to build it even should fail...

dot-asm commented 7 months ago

--target=x86_64-pc-windows-gnu. Is this sufficient?

On a related note. What's your VC version? It would have to be pretty old, wouldn't it? I mean newer versions don't support older Windows, so I can imagine some jumping-through-the-hoops is going on when Windows builds are produced. Question is what's more tricky, throwing together a mingw environment or putting together out-of-support VC installation?

dot-asm commented 7 months ago

--target=x86_64-pc-windows-gnu.

Since it's not exclusively about Rust, it might be appropriate to clarify that mingw option is not limited to Rust. Of course not. And in addition to that, if you control the build procedure, you might find it useful to know that you can compile build/assemble.S with clang --target=x86_64-pc-windows-msvc as alternative to compiling build/win64/*.asm files with ml64.

dot-asm commented 7 months ago

--target=x86_64-pc-windows-gnu.

Hmm, after double-checking the suggestion that this would work on pre-SSSE3 processors turned out to be wrong. Sorry! However! It's way easier to fix that by simply harmonizing it with ELF than to square the [msvc] circle in build.rs. So the question "is this sufficient" still stands. A variant of it. In other words, the suggestion is to make x86_64-pc-windows-gnu work and call it a day :-)

dot-asm commented 7 months ago

make x86_64-pc-windows-gnu work

As in #201.

wjblanke commented 7 months ago

Wow! How deep is it reasonable to go? The rationale behind omitting sha256-portable-x86_64.asm from x86_64-pc-windows-msvc build is as follows. It's tricky to arrange, admittedly not impossible, but the barrier was deemed to be high enough to not attempt to jump over it, because nobody should be irresponsible enough to have an unsupported Windows system on the Internet. I mean all Windows versions that supported pre-SSSE3 processors are out of support by now.

The box I am using for testing with the AMD 905e CPU is running Windows 10 Home which isn't EOL until 2025. I'll admit it is pretty slow!

What to do? Note that there is build/coff/sha256-portable-x86_64.s, so that you can make Win64 build as portable as Linux one by using mingw toolchain. In Rust context it means using --target=x86_64-pc-windows-gnu. Is this sufficient?

Unfortunately we have to use the Visual C++ calling convention since Python is compiled using MSVC++ and we link in under that, so mingw doesn't work for our situation. We are compiling using the Visual Studio version featured by the github runners, which I think is MSVC++ 2022.

Thanks for looking into this. We recently switched from using Relic to BLST as our underlying BLS library and regrettably that has left some users with these AMD chips out in the cold.

dot-asm commented 7 months ago

The box I am using for testing with the AMD 905e CPU is running Windows 10 Home

Question was if it's actually supported. The fact that it might be possible to trick Windows [10] to install on unsupported hardware doesn't really mean that it makes it qualified for support by everybody :-)

Visual C++ calling convention

For reference, as far as blst itself goes, x86_64-pc-windows-gnu and x86_64-pc-windows-msvc are interchangeable. Because it's C, not C++. So that you can compile assembly.S with the mingw toolchain or clang and link it into your C++ thing [compiled with MSVC]. As for Github Actions, they have both preinstalled, ready to be called.

dot-asm commented 7 months ago

As an additional point, clang-cl is meant to be a drop-in replacement for cl. Yet at the same time you can compile assembly.S with it. Without bothering with --target.

dot-asm commented 7 months ago

clang-cl is meant to be a drop-in replacement for cl.

Just in case, the implied suggestion is to use clang-cl for everything, to compile build/assembly.S, src/server.c and your C++ thing. Unification! :-)

wjblanke commented 7 months ago

Could we get a new release of the rust crate that supports K10? I think that may be all we need.

Thanks!

emlowe commented 7 months ago

Assumingcargo build --features=portable --target=x86_64-pc-windows-gnu works, I don't think we would care about the specific toolchain that much

dot-asm commented 7 months ago

Assumingcargo build --features=portable --target=x86_64-pc-windows-gnu works

Double-check #201!

dot-asm commented 7 months ago

As for having clang-cl compile assembly.S by cargo, no, cc-rs doesn't let you pull it off, not as is. But it's possible to use "raw" clang:

  1. get clang on your %PATH%;
  2. set %CC% environment variable to clang;
  3. clone #201 and change to its bindings/rust directory;
  4. execute cargo test --release --features=portable;
  5. cargo clean -p blst --release;
  6. cargo test --release ---features=portable -vvv;

The last two steps are meant to convince you that it does call clang -c -D__BLST_PORTABLE__ assembly.S.

As for non-Rust builds. Other build systems ought to respect CC environment variable too. And have user-defined rules. What I'm driving at is that in general you should be able to perform a ~Windows~ VS build by merely setting %CC% to clang-cl. As already mentioned, it was designed to be a drop-in replacement for cl. And you should be able to provide a suitable .S -> .obj rule...

dot-asm commented 7 months ago

Could we get a new release of the rust crate that supports K10?

I'm waiting out [at least] cc-rs release that will make --target=aarch64-pc-windows-msvc work. Meanwhile you can always use [patch.crates-io] to redirect blst to github.

emlowe commented 7 months ago

CC=clang does appear to make a correct rust crate and subsequent wheel via maturin - need to test on the ancient AMD system to be certain, but it works on a normal machine. I assume there is some minimal performance loss for everyone else in sha256, but that is the price for portable

emlowe commented 7 months ago

Confirmed that building with CC=clang along with #201 and using portable works on an AMD K10 machine.

dot-asm commented 7 months ago

Thanks!

Rigidity commented 4 months ago

Could we get a new release of the rust crate that supports K10?

I'm waiting out [at least] cc-rs release that will make --target=aarch64-pc-windows-msvc work. Meanwhile you can always use [patch.crates-io] to redirect blst to github.

Hey, curious if there's any update on this and if you're planning on cutting a new release sometime soon? Would be ideal to eventually not have to patch in the specific commit hash.