cescoffier closed this issue 1 year ago.
I think it makes sense but maybe it should be an option the user can define (with compatibility as the default)?
Seems to be the right choice.
Do we know what causes it? It must be that it's built on a newer architecture, but the chosen EC2 instances use an older architecture?
So, I need to do more checks, but if we consider the C6 EC2 instance type, it uses an Ice Lake processor, which provides:
- MMX instructions
- SSE / Streaming SIMD Extensions
- SSE2 / Streaming SIMD Extensions 2
- SSE3 / Streaming SIMD Extensions 3
- SSSE3 / Supplemental Streaming SIMD Extensions 3
- SSE4 / SSE4.1 + SSE4.2 / Streaming SIMD Extensions 4
- AES / Advanced Encryption Standard instructions
- AVX / Advanced Vector Extensions
- AVX2 / Advanced Vector Extensions 2.0
- AVX-512 / Advanced Vector Extensions 512
- BMI / BMI1 + BMI2 / Bit Manipulation instructions
- Deep Learning Boost
- F16C / 16-bit Floating-Point conversion instructions
- FMA3 / 3-operand Fused Multiply-Add instructions
- SHA / Secure Hash Algorithm extensions
- EM64T / Extended Memory 64 technology / Intel 64
- HT / Hyper-Threading technology
- VT-x / Virtualization technology
- TBT 2.0 / Turbo Boost technology 2.0
- TSX / Transactional Synchronization Extensions
The GraalVM defaults require: SSE3 + SSSE3 + SSE4_1 + SSE4_2 + POPCNT + LZCNT + AVX + AVX2 + BMI1 + BMI2 + FMA
So, POPCNT and LZCNT are not there.
They may just be missing from the list given in the specification, or they may really be missing from the CPU.
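One way to tell whether POPCNT and LZCNT are merely unlisted or really absent is to compare the GraalVM default list above against the flags the Linux kernel reports in /proc/cpuinfo on the instance itself. A minimal sketch, assuming a Linux host; the mapping from GraalVM feature names to kernel flag names (SSE3 appears as "pni", LZCNT under "abm") is my assumption about kernel naming and worth double-checking:

```python
import os

# Sketch: which of the GraalVM default AMD64 features listed above does a
# Linux host NOT advertise in /proc/cpuinfo?
# ASSUMPTION: the GraalVM-name -> kernel-flag mapping below. The kernel
# reports SSE3 as "pni" and LZCNT under "abm"; double-check on your kernel.
GRAALVM_DEFAULT_FEATURES = [
    "SSE3", "SSSE3", "SSE4_1", "SSE4_2", "POPCNT", "LZCNT",
    "AVX", "AVX2", "BMI1", "BMI2", "FMA",
]

KERNEL_FLAG = {
    "SSE3": "pni",
    "SSSE3": "ssse3",
    "SSE4_1": "sse4_1",
    "SSE4_2": "sse4_2",
    "POPCNT": "popcnt",
    "LZCNT": "abm",
    "AVX": "avx",
    "AVX2": "avx2",
    "BMI1": "bmi1",
    "BMI2": "bmi2",
    "FMA": "fma",
}


def cpuinfo_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags: set[str]) -> list[str]:
    """GraalVM default features whose kernel flag is absent, in list order."""
    return [f for f in GRAALVM_DEFAULT_FEATURES if KERNEL_FLAG[f] not in flags]


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        print(missing_features(cpuinfo_flags(fh.read())) or "all present")
```

Run on the EC2 instance itself, "all present" would mean the CPU advertises every feature in the default list, regardless of what the marketing spec sheet shows.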
I think it makes sense but maybe it should be an option the user can define (with compatibility as the default)?
+1
Also, we should definitely have some docs on this
So, I just tried with a C6 EC2 instance (the most expensive instance I've ever used) and there was no problem: POPCNT and LZCNT seem to be there (not sure why they are not listed).
Perhaps as a data point: https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level#architectural_considerations_for_rhel_9, since RHEL 9 chose -march=x86-64-v2. Specifically, the part on why x86-64-v3 wasn't chosen.
The new server-class CPUs released in 2020 do not implement the AVX instruction set.
Thanks @jerboaa, that confirms the need to use "compatibility" as the default.
@maxandersen WDYT?
I think it makes sense but maybe it should be an option the user can define (with compatibility as the default)?
FWIW it should be possible to override the Quarkus default (-march=compatibility) by just passing -Dquarkus.native.additional-build-args=-march=<whatever>. So a Quarkus option might not be necessary.
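For reference, a minimal sketch of that override composed as a Maven invocation. It assumes a Maven-wrapper project with the usual native profile (./mvnw package -Dnative); the helper name is hypothetical, and x86-64-v2 is only an example value:

```python
import shlex


def native_build_cmd(march: str, extra_args: tuple[str, ...] = ()) -> list[str]:
    """Hypothetical helper: a Maven native build that overrides -march.

    quarkus.native.additional-build-args is a comma-separated list, so any
    further native-image arguments are joined after -march with commas.
    """
    build_args = [f"-march={march}", *extra_args]
    return [
        "./mvnw", "package", "-Dnative",
        "-Dquarkus.native.additional-build-args=" + ",".join(build_args),
    ]


# Prints the command line; pass the list to subprocess.run() to execute it.
print(shlex.join(native_build_cmd("x86-64-v2")))
```

Because the property value is split on commas, an argument that itself contains a comma would need escaping before being appended here.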
Certainly not necessary, but it makes things easier for users (the important thing is that it makes the property discoverable and adds documentation).
(In reply to Foivos, Fri, Jun 23, 2023, 10:11.)
Also, the flag needs to be handled differently pre-23.0 versus post-23.0, right?
So a dedicated native property seems better? Something like quarkus.native.arch=auto|compatibility|none or similar, so that by default we handle what we can detect (the default would be to be as compatible as possible, or based on other info), and none would disable it to allow users to use additional-build-args manually.
Also, the flag needs to be handled differently pre-23.0 versus post-23.0, right?
We can achieve this either way. Quarkus sets -march only post-23.0, and users are free to pass anything through additional-build-args (so it's their responsibility to make sure what they pass works with the Mandrel/GraalVM version they use).
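The version handling described here can be sketched as follows. This is an illustration, not Quarkus's actual implementation: the function name and the (major, minor) version tuple are assumptions, and skipping the default when the user already supplied -march is my own design choice:

```python
def default_march_args(graalvm_version: tuple[int, int],
                       user_args: tuple[str, ...] = ()) -> list[str]:
    """Sketch: add -march=compatibility only where native-image knows -march.

    Releases before 23.0 do not get a -march default, so user arguments pass
    through untouched. If the user already passed their own -march through
    additional-build-args, their value is kept and no default is added
    (a design choice of this sketch).
    """
    if graalvm_version < (23, 0):
        return list(user_args)
    if any(arg.startswith("-march=") for arg in user_args):
        return list(user_args)
    return ["-march=compatibility", *user_args]
```

Comparing version tuples keeps the cut-off in one place, so a later change of default only touches the final return.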
So a dedicated native property seems better? Something like quarkus.native.arch=auto|compatibility|none
What would be the difference between auto and compatibility?
PS: I am OK with adding another native option, just saying we have an alternative if we want to follow the "try to introduce as little GraalVM/Mandrel-specific native options as possible" rule of thumb.
auto was meant for the case where, over time, we have other info available that can be used to set the arch settings. For now, it would be set to whatever corresponds to compatibility.
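To make the proposal concrete, here is a sketch of how the proposed quarkus.native.arch values could translate into native-image arguments. The property and the enum are hypothetical (not an existing Quarkus option), and auto simply falls back to compatibility, as described above:

```python
from enum import Enum


class NativeArch(Enum):
    """Values of the proposed quarkus.native.arch property (hypothetical)."""
    AUTO = "auto"
    COMPATIBILITY = "compatibility"
    NONE = "none"


def arch_build_args(mode: str) -> list[str]:
    """Map the proposed property value to extra native-image arguments."""
    arch = NativeArch(mode.lower())  # raises ValueError for unknown values
    if arch is NativeArch.NONE:
        # Quarkus adds nothing; users pass -march via additional-build-args.
        return []
    # "auto" has no extra information to go on yet, so it currently
    # produces the same setting as "compatibility".
    return ["-march=compatibility"]
```

Keeping auto as a separate value, even though it currently matches compatibility, leaves room to change its behavior later without renaming anything.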
So just to check: which configs have we tried that actually cause issues? Reading through, I only see https://github.com/quarkusio/quarkus-images/issues/241, which might be a non-issue?
The new server-class CPUs released in 2020 do not implement the AVX instruction set.
Do these images have the issue, or is it just a raised concern? The claim from the GraalVM team is that the new defaults target 10-year-old CPU architectures...
From the GraalVM Slack chat:
Cheatsheet: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
If RHEL targets x86-64-v2 and GraalVM does x86-64-v3, that could be problematic, but maybe defaulting to x86-64-v2 is more sensible as opposed to compatibility?
Is anyone with arch expertise able to state what the added value would be? (The list of extra instruction sets native-image gets access to is documented at https://github.com/oracle/graal/blob/f938bface73709d0d962f42361e0deb74b816efc/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/util/CPUTypeAMD64.java#L92.)
Yeah, perhaps a default of x86-64-v2 over x86-64-v3 would be sensible.
https://ark.intel.com/content/www/us/en/ark/products/codename/229610/products-formerly-parker-ridge.html These Atom CPUs (from 2022) apparently don't implement AVX. So the question becomes how common it would be to build on x86-64-v3-capable machines and deploy to one of those Atom CPUs. To err on the safe side, go with x86-64-v2?
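To compare the levels discussed here, a sketch that classifies a CPU feature set into the x86-64 microarchitecture levels. The level contents follow the x86-64 psABI definitions (v2 adds SSE3 to SSE4.2 and POPCNT, v3 adds AVX/AVX2, BMI1/BMI2, FMA, LZCNT and a few others); the feature spellings are illustrative and do not match /proc/cpuinfo names one-to-one:

```python
from typing import Optional

# Features each x86-64 microarchitecture level adds to the previous one,
# following the x86-64 psABI level definitions (spellings are illustrative).
LEVELS = [
    ("x86-64", {"CMOV", "CX8", "FPU", "FXSR", "MMX", "OSFXSR", "SCE", "SSE", "SSE2"}),
    ("x86-64-v2", {"CMPXCHG16B", "LAHF-SAHF", "POPCNT", "SSE3", "SSE4_1", "SSE4_2", "SSSE3"}),
    ("x86-64-v3", {"AVX", "AVX2", "BMI1", "BMI2", "F16C", "FMA", "LZCNT", "MOVBE", "OSXSAVE"}),
    ("x86-64-v4", {"AVX512F", "AVX512BW", "AVX512CD", "AVX512DQ", "AVX512VL"}),
]


def highest_level(features: set[str]) -> Optional[str]:
    """Highest level whose cumulative feature requirements are all present."""
    best = None
    required = set()
    for name, added in LEVELS:
        required |= added
        if not required <= features:
            break  # levels are cumulative, so stop at the first gap
        best = name
    return best
```

For example, a CPU lacking AVX (like the Parker Ridge Atoms mentioned above) can never classify as x86-64-v3 here, so an image built for v3 would not run on it.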
The default should be the compiler host.
@kirillp that's the foreseen issue: the compiler host more often has newer/bigger specs than the target host, so defaulting to the compiler host would result in images that are harder to run on other chipsets...
It would be nice if there were a variation of -march that just meant "take whatever the build host has"... It couldn't be the default, though.
It was decided to only document the new parameter for now.
Hi @cescoffier, @gsmet, @maxandersen, we faced a similar issue in our production workloads. All of our applications are packaged as native images and deployed on AWS as custom runtimes. Since we do not know which EC2 type AWS uses behind the scenes, we saw several of our lambdas failing with the same error.
We contacted AWS, and they mentioned that the cause is a hardware upgrade on their end. However, I think setting "-march=compatibility" as the default may provide a more flexible approach, as the systems that run CI may not always match the runtime host, especially when the target machine is not directly controlled.
@parasjain27031994 interesting, do you have some more details on it? What was your fix? Using -march=compatibility in your builds?
Hi @maxandersen, currently we do not have a fix. For now, AWS support has disabled the hardware update for our account, since we identified that the application was running fine on the older hardware. I am waiting for further information from them about the specifications of the new hardware.
I cannot reproduce this, as not all of the lambda initializations were landing on the new hardware (some requests were working fine). I will update here once I have further information.
Just for information, the lambdas that were failing were using Quarkus 3.5.2 and built on Mandrel 23.0
FWIW https://quarkus.io/guides/native-reference#work-around-missing-cpu-features was added for this issue.
Hi @jerboaa, yes, I did come across this. What was surprising to us was that the failing Lambdas had been deployed long ago and started failing recently without any new deployment. I am probably one of many who have faced a similar issue; if we defaulted to compatibility mode, there would perhaps be less chance of such failures.
As a side note, we have added the compatibility flag now to all of our Lambdas ;-)
I just hit this using GitHub Actions to build OSX binaries. I now get this:
~/Downloads/hassq-1.0.0-SNAPSHOT
The current machine does not support all of the following CPU features that are required by the image: [CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT, LZCNT, AVX, AVX2, BMI1, BMI2, FMA]. Please rebuild the executable with an appropriate setting of the -march option.
I'm on an OSX M1, so I'm a bit surprised that it gets marked as incompatible.
Is there any reason why we would not include an architecture configuration option in NativeConfig nowadays? I don't see why we need to make users use quarkus.native.additional-build-args for this one.
+1 from me - makes sense to expose it more explicitly.
Starting with GraalVM 23.0, there is a new parameter, -march, when building on AMD64 machines.
Building a container on one machine and running it on another (EC2...) can lead to issues like: https://github.com/quarkusio/quarkus-images/issues/241.
An "ok" workaround could be to set -march to compatibility when building on AMD64 and add a note about that in the native compilation reference guide.
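A minimal sketch of that workaround as a build-script helper. It assumes the build host's architecture is what matters (as in the container case above) and that Python's platform.machine() reports "x86_64" or "AMD64" on such hosts; the helper name is hypothetical:

```python
import platform
from typing import Optional

# Machine names treated as AMD64 (assumption: Linux/macOS report "x86_64",
# Windows reports "AMD64"; compared case-insensitively below).
AMD64_NAMES = {"x86_64", "amd64"}


def workaround_args(machine: Optional[str] = None) -> list[str]:
    """Return ["-march=compatibility"] when building on AMD64, else []."""
    machine = (machine or platform.machine()).lower()
    return ["-march=compatibility"] if machine in AMD64_NAMES else []


print(workaround_args())
```

The result can be appended to whatever is already passed through quarkus.native.additional-build-args; on non-AMD64 build hosts the sketch leaves the build arguments unchanged, matching the scope of the workaround described above.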