cescoffier commented 1 year ago

Started GraalVM 23.0, there is a new parameter when building on AMD64 machines:

Native Image now targets x86-64-v3 architecture by default on AMD64 and provides a new -march option to specify 
target compatibility. Use -march=compatibility for best compatibility or -march=native for best performance if a native 
executable is deployed on the same machine or on a machine with the same CPU features. To list all available machine 
types, use -march=list.

Building a container on one machine and running it on another (EC2...) can lead to issues like: https://github.com/quarkusio/quarkus-images/issues/241.

An "ok" workaround could be to set march to compability when building on AMD64 and add a note about that in the native compilation reference guide.

gsmet commented 1 year ago

I think it makes sense but maybe it should be an option the user can define (with compatibility as the default)?

maxandersen commented 1 year ago

Seems to be the good choice.

Do we know what causes it - must be that its build on a newer architecture but the EC2 instances chosen is using the older architecture?

cescoffier commented 1 year ago

So, I need to do more checks, but is we consider the C6 EC2 instance type, it uses an Ice Lake processor which provides:

MMX instructions SSE / Streaming SIMD Extensions SSE2 / Streaming SIMD Extensions 2 SSE3 / Streaming SIMD Extensions 3 SSSE3 / Supplemental Streaming SIMD Extensions 3 SSE4 / SSE4.1 + SSE4.2 / Streaming SIMD Extensions 4 ? AES / Advanced Encryption Standard instructions AVX / Advanced Vector Extensions AVX2 / Advanced Vector Extensions 2.0 AVX-512 / Advanced Vector Extensions 512 BMI / BMI1 + BMI2 / Bit Manipulation instructions Deep Learning Boost F16C / 16-bit Floating-Point conversion instructions FMA3 / 3-operand Fused Multiply-Add instructions SHA / Secure Hash Algorithm extensions EM64T / Extended Memory 64 technology / Intel 64 ? HT / Hyper-Threading technology ? VT-x / Virtualization technology ? TBT 2.0 / Turbo Boost technology 2.0 ? TSX / Transactional Synchronization Extensions

GraalVM defaults requires: SSE3 + SSSE3 + SSE4_1 + SSE4_2 + POPCNT + LZCNT + AVX + AVX2 + BMI1 + BMI2 + FMA

So, POPCNT and LZCNT are not there.

IT can be missing in the list given by the specification, or be really missing.

geoand commented 1 year ago

I think it makes sense but maybe it should be an option the user can define (with compatibility as the default)?

+1

Also, we should definitely have some docs on this

cescoffier commented 1 year ago

So, I just tried with a C6 EC2 instance (the most expensive instance I've ever used) and no problem, the POPCN and LZCNT seem to be there (why not listed).

jerboaa commented 1 year ago

Perhaps as a data point: https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level#architectural_considerations_for_rhel_9 since RHEL 9 chose -march=x86-64-v2

jerboaa commented 1 year ago

Specifically why x86-64-v3 wasn't chosen.

The new server-class CPUs released in 2020 do not implement the AVX instruction set.

cescoffier commented 1 year ago

Thanks @jerboaa, that confirm the need to use "compatibility" as default.

@maxandersen WDYT?

zakkak commented 1 year ago

I think it makes sense but maybe it should be an option the user can define (with compatibility as the default)?

FWIW it should be possible to override (the Quarkus default -march=compatibility) by just passing -Dquarkus.native.additional-build-args=-march=<whatever>. So a Quarkus option might not be necessary.

geoand commented 1 year ago

Certainly not necessary, but it makes things more easier for users (the important thing is that it makes the property discoverable and adds documentation)

On Fri, Jun 23, 2023, 10:11 Foivos @.***> wrote:

I think it makes sense but maybe it should be an option the user can define (with compatibility as the default)?

FWIW it should be possible to override (the Quarkus default -march=compatibility) by just passing -Dquarkus.native.additional-build-args=-march=. So a Quarkus option might not be necessary.

— Reply to this email directly, view it on GitHub https://github.com/quarkusio/quarkus/issues/34238#issuecomment-1603791867, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBMDP3FDIDD76RWJI4GM5DXMU6TFANCNFSM6AAAAAAZPZNXDY . You are receiving this because you commented.Message ID: @.***>

maxandersen commented 1 year ago

also, the flag needs to be handled differently wether pre 23.0 or post 23.0, right ?

so a dedicated native property seems better?, like quarkus.native.arch=auto|compatibility|none or similar so by default we handle what we can detect, (default would be to be as compatible as possible or based on other info) and none would disable it to alow users to use additonal-build-args manually.

maxandersen commented 1 year ago

zakkak commented 1 year ago

also, the flag needs to be handled differently wether pre 23.0 or post 23.0, right ?

We can achieve this either way. Quarkus sets -march only post 23.0 and users are free to pass anything through additonal-build-args (so it's their responsibility to make sure what they pass works with the Mandrel/GraalVM version they use).

so a dedicated native property seems better?, like quarkus.native.arch=auto|compatibility|none

What would be the difference between auto and compatibility?

PS: I am OK with adding another native option, just saying we have an alternative if we want to follow the "try to introduce as little GraalVM/Mandrel-specific native options as possible" rule of thumb.

maxandersen commented 1 year ago

auto was in case over time we have other info available that can be used to set the arch settings. for now that would be we set it to what corresponds to compatiblity.

maxandersen commented 1 year ago

so just to check - which configs have we tried that actually causes issues ?

reading through I only see https://github.com/quarkusio/quarkus-images/issues/241 which might be nonissue ?

The new server-class CPUs released in 2020 do not implement the AVX instruction set.

does these images have the issue or just raised concern? claim from graalvm team is that the new defaults are for 10 year old CPU arch...

maxandersen commented 1 year ago

from graalvm slack chat:

cheatsheet https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

if rhel targets x86-64-v2 and graalvm does x86-64-v3 that could be problematic but maybe defaulting to x86-64-v2 is more sensible as opposed to compatibility ?

anyone with arch expertise able to state what the value add will be? (the list of extra instruction sets native-image gets access to is documented at https://github.com/oracle/graal/blob/f938bface73709d0d962f42361e0deb74b816efc/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/util/CPUTypeAMD64.java#L92)

jerboaa commented 1 year ago

Yeah, perhaps a default of x86-64-v2 over x86-64-v3 would be sensible.

jerboaa commented 1 year ago

https://ark.intel.com/content/www/us/en/ark/products/codename/229610/products-formerly-parker-ridge.html These Atom CPUs (from 2022) apparently don't implement AVX. So the question becomes how common it would be to build on x86-64-v3 capable machines and deploy to one of those Atom CPUs. To err on the safe side, go with x86-64-v2?

kirillp commented 1 year ago

default should be the compiler host

maxandersen commented 1 year ago

@kirillp thats the forseen issue - the compiler host more often runs newer/bigger specs than target host so defaulting to compiler host would result in harder to run images on other chipsets...

maxandersen commented 1 year ago

it would be nice if there was a variation of -march that would just mean "take whatever the build host has"...it couldnt be the default though.

cescoffier commented 1 year ago

It was decided to only document the new parameter for now.

parasjain27031994 commented 10 months ago

Hi @cescoffier, @gsmet , @maxandersen , we faced a similar issue in our production workloads. All of our application are packaged as Native Images and deployed over AWS as Custom Runtime. Since we do not have knowhow of what EC2 type AWS use behind, we saw several of our lambdas failing with the same error.

We did contacted AWS and they mentioned the cause of this is a hardware upgrade at their end, however I think setting default as "march=compatibility" may provide more flexible approach. As the systems which Runs CI may not always match the host, especially when the target machine is not directly controlled.

maxandersen commented 10 months ago

@parasjain27031994 interesting - do you have some more details on it ? what was your fix? to use march=compatability in your builds?

parasjain27031994 commented 10 months ago

Hi @maxandersen, currently we do not have a fix. For now AWS support disabled hardware update for our account, since we identified that on older hardware the application was running fine. I am waiting for further information from them about specifications of the new hardware.

I cannot reproduce this as not all of the lambdas initialization were landing on new hardware as some requests were working fine, I will update here once I have further information.

Just for information, the lambdas that were failing were using Quarkus 3.5.2 and built on Mandrel 23.0

jerboaa commented 10 months ago

FWIW https://quarkus.io/guides/native-reference#work-around-missing-cpu-features was added for this issue.

parasjain27031994 commented 10 months ago

Hi @jerboaa , yes I did come across this, what was amusing to us was, the Lambdas, which were failing were deployed long back and started failing recently w/o any deployment, I am just one of many probably who may have faced similar issue, if we probably default to compatibility mode, perhaps we would have less chances of such failures.

As a side note, we have added the compatibility flag now to all of our Lambdas ;-)

maxandersen commented 9 months ago

I just hit this using github actions to build osx binaries. I now get this:

~/Downloads/hassq-1.0.0-SNAPSHOT The current machine does not support all of the following CPU features that are required by the image: [CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT, LZCNT, AVX, AVX2, BMI1, BMI2, FMA]. Please rebuild the executable with an appropriate setting of the -march option.%

im on a OSX M1 so a bit surprised why it gets marked as incompatible.

geoand commented 5 months ago

Is there any reason why we would not include an architecture configuration option in NativeConfig nowadays? I don't see why we need to make users use quarkus.native.additional-build-args for this one.

maxandersen commented 5 months ago

+1 from me - makes sense to expose it more explicitly.

geoand commented 5 months ago

quarkusio / quarkus

Should Quarkus set the `march` parameter by default? #34238

40594 takes care of it