haproxy / haproxy

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
https://git.haproxy.org/

QUIC Options now that OpenSSL 1.1.1 is End of Life #2294

Closed cmason3 closed 9 months ago

cmason3 commented 10 months ago

Your Feature Request

What is the recommendation for QUIC support now that OpenSSL 1.1.1 has reached end of life? I believe HAProxy's recommendation is to build HAProxy with QuicTLS 1.1.1, as OpenSSL 3.x is absolute pants.
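For context, building HAProxy against QuicTLS follows the usual `SSL_INC`/`SSL_LIB` pattern from HAProxy's INSTALL file. A rough sketch, with illustrative branch names, paths and targets (check the quictls repo and INSTALL for the authoritative values):

```shell
# Build QuicTLS (an OpenSSL fork carrying the QUIC API patches)
git clone --branch OpenSSL_1_1_1w+quic --depth 1 https://github.com/quictls/openssl quictls
cd quictls
./config --prefix=/opt/quictls --libdir=lib
make -j"$(nproc)" && make install
cd ..

# Build HAProxy against it, enabling QUIC
git clone https://github.com/haproxy/haproxy
cd haproxy
make -j"$(nproc)" TARGET=linux-glibc USE_OPENSSL=1 USE_QUIC=1 \
     SSL_INC=/opt/quictls/include SSL_LIB=/opt/quictls/lib \
     LDFLAGS="-Wl,-rpath,/opt/quictls/lib"

# Verify that the resulting binary reports QUIC support
./haproxy -vv | grep -i quic
```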

Now that OpenSSL 1.1.1 is EOL it will no longer receive any security patches, so people manually building it for HAProxy's QUIC support (or using the haproxytech QUIC Docker images) will be exposed to any unpatched security issues.

How far away are we from building against a viable alternative or finding a way to make OpenSSL 3.x less pants?

What are you trying to do?

Find a viable option that allows users to continue to use HAProxy with QUIC without being exposed to security issues and without having rubbish performance.

Output of haproxy -vv

N/A
git001 commented 10 months ago

You can see the answer in this blog post https://www.haproxy.com/blog/how-to-enable-quic-load-balancing-on-haproxy
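For reference, the QUIC setup described there comes down to adding a `quic4@` bind line and advertising HTTP/3 via `alt-svc`. A minimal config sketch; the certificate path, port, and backend name are placeholders:

```
frontend fe_main
    # TCP listener for HTTP/1.1 and HTTP/2
    bind :443 ssl crt /etc/haproxy/certs/site.pem alpn h2,http/1.1
    # UDP/QUIC listener for HTTP/3 (requires a QUIC-capable SSL library)
    bind quic4@:443 ssl crt /etc/haproxy/certs/site.pem alpn h3
    # Tell clients on the TCP listener that HTTP/3 is available
    http-response set-header alt-svc 'h3=":443"; ma=900'
    default_backend be_app
```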

EmericBr commented 10 months ago

Hi,

This was discussed a lot between maintainers to prepare this 1.1.1 EOL.

We still don't have a perfect solution, but three different directions are available:

cmason3 commented 10 months ago

Ok thanks - out of those 3 options, does anyone know at this point which one is going to be adopted for the "haproxytech" QUIC Docker images?

EmericBr commented 10 months ago

I'm relaying the question to haproxytech and trying to find the response.

wtarreau commented 10 months ago

I conducted performance measurements, reported on the openssl issue:

https://github.com/openssl/openssl/issues/20286#issuecomment-1527869072

When used as a server, openssl 3.1 still requires twice as many machines as openssl 1.1.1; 3.0.8 was twice that again, and 3.0.2, as found in some distros such as Ubuntu 22.04, is about 4 times worse. For some ultra-low-load use cases it can be sufficient, but this still makes the stack extremely sensitive to trivial DoS attacks.

When used as a client however (i.e. to connect to backend servers using SSL) it's still way behind 1.1.1, and if you need to do SSL on the backend, better stay with an LTS distro that ships it (i.e. Ubuntu 20.04 still has a few years of maintenance on 1.1.1 and could do the job; Debian 11 apparently will be supported till 2024-2026; RHEL 8 is also on 1.1.1 and is maintained for a long time). If I had to choose a solution suitable for production, I would pick something based on one of these distros, which will continue to maintain their copy of 1.1.1 and provide bug fixes.
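The linked issue describes the actual methodology; one rough way to get a comparable handshake-rate number yourself is `openssl s_time` pointed at the same haproxy frontend, rebuilt against each library (the address below is a placeholder):

```shell
# Full TLS handshakes per second against a local listener
openssl s_time -connect 127.0.0.1:8443 -new -time 10

# Resumed sessions per second, for comparison
openssl s_time -connect 127.0.0.1:8443 -reuse -time 10
```

Repeating this against builds linked with 1.1.1, 3.0.x and 3.1.x makes the relative gap visible without any special tooling.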

dkorunic commented 10 months ago

@cmason3 We will:

@wtarreau has neatly summarized what is the expected performance reduction and what are the other options, if needed.

wtarreau commented 10 months ago

@dkorunic just out of curiosity, is there any reason for using as a base for the image a distro based on a defective version of openssl instead of sticking to one of those above that still support it ? OpenSSL 1.1.1 has reached EOL only for public releases from the OpenSSL project but it's still supported by these distros. As such I don't understand the need for shooting ourselves in the foot while doing nothing continues to provide a working version. Typically Ubuntu20 is still OK till April 2025 in standard support and even 2030 with extended support (we can hope by then a serious alternative to OpenSSL will have been developed and the project will long be buried, or will have change its organization to start to focus on critical stuff).

dkorunic commented 10 months ago

@dkorunic just out of curiosity, is there any reason for basing the image on a distro with a defective version of openssl instead of sticking to one of those above that still support it? OpenSSL 1.1.1 has reached EOL only for public releases from the OpenSSL project, but it's still supported by these distros. As such I don't understand the need for shooting ourselves in the foot when doing nothing continues to provide a working version. Typically Ubuntu 20.04 is still OK till April 2025 in standard support, and even 2030 with extended support (we can hope that by then a serious alternative to OpenSSL will have been developed and the project will long be buried, or will have changed its organization to start focusing on critical stuff).

wtarreau commented 10 months ago

@wtarreau These older distros (Alpine 3.15, Debian Bullseye, Ubuntu 20.04) have numerous security issues (to name a few: CVE-2023-26604, CVE-2017-11164, CVE-2015-9019, CVE-2013-4235, CVE-2023-29383) that are usually not fixed due to the code freeze at the time of the distribution release; with mid to low priority, these issues are sometimes never resolved, neither upstream nor in the distro. What then happens is that all products derived from images using the mentioned distros as base images also inherit such issues, which are regularly reported by our users of the HAProxy CE Docker images as well as the HAProxy Kubernetes Ingress Controller. Even worse, automated security mechanisms on Docker Hub, GitHub, Azure, AWS etc. start complaining and/or even rejecting such images. While these security issues are usually non-impacting, being in the base distro images means we can't fix them nor remove them, and we get them reported over and over and over again, making it a huge bother.

dkorunic commented 10 months ago

OK I can understand that, but first, the intersection between CVEs and real security issues is usually quite small, so the fact that such distros have not "resolved" them very likely means they were just regular "Curriculum Vitae Enhancers" from newbie "security researchers" seeking instant fame. Also this will not prevent new issues from happening in new distros (possibly even the exact same ones, in fact, if some from 2013 and 2017 continue to pop up), and since these images just serve as a base to start a load balancer, I suspect that the vast majority of users would prefer to live with these non-issues rather than see their LB block legitimate traffic at 100% CPU at 1% of the original capacity. Not to mention the expected progressive growth of bug reports related to performance issues that are uniquely caused by OpenSSL 3.x breakage. If it's just advertised "ubuntu 20 inside, CVE-xxx will be reported, use at your own risk" it might do the job. All users of high-performance systems continue to stick to such distros for their 1.1.1 support anyway, since there's currently no other viable option. Just my two cents.
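I totally agree that those CVEs are non-impacting and totally unrelated to the regular functionality of a single process running inside a container, but our images even get automatically ejected from container Marketplaces due to CVEs not being fixed. Or we get hammered with users opening issues in the tracker and complaining about CVEs which their security scanning tools report over and over; it's just a bother to deal with. Don't get me wrong, I totally agree with you that in high-performance environments OpenSSL 1.1.1 is pretty much the only viable alternative. With containers however, very high performance is not necessarily the primary requirement. Some customers can't even install containers in their Docker or K8s clusters if the image doesn't pass Trivy/Clair/ThreatMapper and/or other scanning certification.

As an aside, the kind of gate these scanners apply can be reproduced locally; for example with Trivy (the image tag is illustrative):

```shell
# Report the findings a registry scanner would flag
trivy image --severity HIGH,CRITICAL haproxytech/haproxy-alpine:2.8

# Fail the pipeline on critical findings, the way CI gates typically do
trivy image --exit-code 1 --severity CRITICAL haproxytech/haproxy-alpine:2.8
```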

chipitsine commented 10 months ago


  • CVE-2023-26604 (https://github.com/advisories/GHSA-8989-8fhv-vq42, "systemd before 247 does not adequately block local..."): docker images do not have their own systemd, so we can ignore that.

  • CVE-2017-11164 (https://github.com/advisories/GHSA-5457-7wx3-gf7j, "In PCRE 8.41, the OP_KETRMAX feature in the match..."): OP_KETRMAX is not used in HAProxy.

  • CVE-2015-9019 (https://github.com/advisories/GHSA-8xxg-m548-vx69, "In libxslt 1.1.29 and earlier, the EXSLT math.random..."): libxslt is not used either.

  • CVE-2013-4235 (https://github.com/advisories/GHSA-2q3w-h8mm-q9v3, "shadow: TOCTOU (time-of-check time-of-use) race condition..."): ok, this one seems relevant, but given how docker works I do not see how it may be exploited.

  • CVE-2023-29383 (https://github.com/advisories/GHSA-p9w4-8hh8-crcx, "In Shadow 4.13, it is possible to inject control..."): same as the previous; it is our own container, no need to exploit ourselves.


chipitsine commented 10 months ago


can this be handled on community basis, like moving Dockerfile(s) somewhere to https://github.com/haproxy/ and let people maintain as many flavours as they can handle (some of them on 3.1.x, some on 1.1.1)


dkorunic commented 10 months ago

@chipitsine I'll avoid answering line by line, but to summarise: I have processed and checked dozens of such reports for both our community and enterprise products, and having a strong secops background I know well that they are non-impacting, as mentioned before. But it doesn't matter what you and I think; it's irrelevant, since automated systems will report and block images with such issues in their SBOM. It doesn't matter if such issues cannot be exploited, it doesn't matter if they are non-impacting, it doesn't matter if it's a single process that runs inside an isolated environment, as mentioned before in the thread. These issues get reported, over and over, container images get flagged and eventually blocked from downloads. The only way to fix this is to move to a more recent distro which doesn't have such flagged versions of software in its SBOM.

wtarreau commented 10 months ago

But it doesn't matter what you and I think, it's irrelevant since automated systems will report and block images with such issues in their SBOM

Yeah that's definitely the problem with CVEs nowadays: it's a totally bogus metric that encourages everyone to run insecure code that has all the dummy CVEs fixed but not the real security fixes. That's a sad state of affairs but a reality, and unfortunately there are so many dishonest companies making a living off this scam that it's almost impossible to combat.

One thing is that openssl 3.x on the backend is not even a matter of "high performance" at all. It almost does not work. OpenSSL 3 as a client is just sufficient to run wget and curl in a script, but not usable for anything looking like a daemon. That's why I was asking. I understand your point though, I didn't know you got those absurd reports. What's sad is that we don't even need a full-fledged distro for docker images, even the tiny formilux distro we have in the ALOHA is way more than needed, the problem is that it suddenly requires that someone is in charge of these shitty packages.

Let's see how it goes with 3.x then. I'm really sad to see that users will start to imagine that it's haproxy's fault if everything is super slow, but there's hardly anything we can do about it. Well, at least please build with USE_PTHREAD_EMULATION=1 so that the process can be analysed with "perf top", as there's nothing worse than a process doing nothing and not progressing, as it will not even report its time in flamegraphs. If at least users run perf top and see 92% in locking then produce a flame graph and see openssl take all the screen, they will ask the relevant questions and we'll all save time on the diags. With a bit of luck, wolfssl will be ready in time so that this situation doesn't last long :-(
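Concretely, the observability workflow suggested above looks roughly like this. The build flag and `perf` commands are standard; the FlameGraph scripts are from Brendan Gregg's FlameGraph repository and the paths are illustrative:

```shell
# Build with pthread emulation so time spent spinning on locks shows up
# as CPU time in profilers instead of hiding in futex sleeps
make -j"$(nproc)" TARGET=linux-glibc USE_OPENSSL=1 USE_QUIC=1 \
     USE_PTHREAD_EMULATION=1 SSL_INC=/opt/quictls/include SSL_LIB=/opt/quictls/lib

# Live view on a loaded process: look for lock/atomic hotspots
perf top -p "$(pidof haproxy)"

# Capture 30s of stacks and render a flame graph
perf record -g -p "$(pidof haproxy)" -- sleep 30
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > haproxy.svg
```

If the flame graph is dominated by openssl locking, the bottleneck is the library, not haproxy, which is exactly the diagnostic shortcut being asked for.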

wtarreau commented 10 months ago

BTW it's another indication that we really need to work with aws-lc to find a solution regarding the few missing symbols, because they do have a supported stack which implements the relevant QUIC API, it's just that it's missing symbols that are apparently not trivial to reimplement :-/

dkorunic commented 10 months ago

One thing is that openssl 3.x on the backend is not even a matter of "high performance" at all. It almost does not work. OpenSSL 3 as a client is just sufficient to run wget and curl in a script, but not usable for anything looking like a daemon. That's why I was asking. I understand your point though, I didn't know you got those absurd reports. What's sad is that we don't even need a full-fledged distro for docker images, even the tiny formilux distro we have in the ALOHA is way more than needed, the problem is that it suddenly requires that someone is in charge of these shitty packages.

Yes, in fact pretty much every Docker Registry out there is also scanning for known vulnerabilities (Docker Hub, Quay) including commercial Marketplaces (AWS, Azure...) and we can't dismiss and/or mark some issues as irrelevant, they pile up and cause users to keep opening issues. Especially jumpy are Docker and K8s users, since community versions permeate heavily in K8s world through Kubernetes Ingress Controller.

Let's see how it goes with 3.x then. I'm really sad to see that users will start to imagine that it's haproxy's fault if everything is super slow, but there's hardly anything we can do about it. Well, at least please build with USE_PTHREAD_EMULATION=1 so that the process can be analysed with "perf top", as there's nothing worse than a process doing nothing and not progressing, as it will not even report its time in flamegraphs. If at least users run perf top and see 92% in locking then produce a flame graph and see openssl take all the screen, they will ask the relevant questions and we'll all save time on the diags. With a bit of luck, wolfssl will be ready in time so that this situation doesn't last long :-(

Great recommendation! I've already made this change since your note, and all the community images are being rebuilt as we speak, some ~240 images due to differences in base image, CPU architecture, code branch and QUIC features.

lukastribus commented 10 months ago

It's great that we have aws-lc. It's likely that AWS has a short/medium term business interest in keeping haproxy working with aws-lc.

But that is just a single project of a single corporation. If tomorrow AWS decides that haproxy is no longer of interest, the API can fluctuate exactly like boringssl's currently does, if aws-lc continues to be supported at all.

Whack-a-mole with unstable API's is not a permanent solution to this problem.

1.1.1 hyperscaler forks are not a permanent solution to this problem.

No 1.1.1 fork is a permanent solution to this problem.

I also think it's wishful thinking that 4.0.0 (throwing a random number out there) will solve all problems (all performance issues are resolved and we get a stable QUIC API), so that we can just wait it out.

I'm not sure how much real industry interest there still is to support continuous and real development of openssl, considering where we are right now.

Clearly for the hyperscalers it's cheaper to develop their own fork (see boringssl, aws-lc, etc.) than to send their code upstream and have it bikeshedded away.

The best technical decision probably is to work with someone like wolfssl to improve the actual situation, which haproxy already partially did. But considering that we are still having this discussion, it probably requires a lot more than that.

The best business decision is probably to rely on workarounds and forks, until someone else does all the hard work, or pushes someone to do all the hard work.

So I guess it becomes a waiting game.

wtarreau commented 10 months ago

My understanding is that aws-lc is a bit more than just a fork, they're cherry-picking from various projects and doing their own stuff as well. Apparently they're interested in formal verification, FIPS certification and forward-compatible API. Of course, only time will tell if this works, but at least it's done by a company having enough money to throw sufficient permanent developers at this task for the sake of saving CPU cycles on millions of instances, so we can trust their intent and their ability to execute at least. They're not focusing on haproxy but more generally applications that rely on openssl.

And don't get me wrong, I, too, consider wolfssl the best technical solution, if only for the performance and number of locks per connection shown at the link above (691 locks for openssl 3.0, 220 locks for 3.1, 1 lock for wolfssl, no comments!). However we know there's still a bit of work to be done on wolfssl to make sure it reaches distros in a state that is compatible with sufficient applications with decent performance. At the moment, wolfssl has to be configured for one application and one platform. For instance if you enable aes-ni and avx, it will not work on a lower-end x86 machine and that's not acceptable for distros. Disabling all arch-specific optimizations will probably be the solution for distros, but it will lead to much less appealing performance. Thus there's still definitely quite some work to do there.
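For anyone wanting to try the wolfSSL route, the configure options and the haproxy build flag below are the ones referenced in haproxy's INSTALL/wiki material; the prefix and paths are illustrative, and this is a sketch rather than a supported recipe:

```shell
# Build wolfSSL with its haproxy compatibility and QUIC options
git clone https://github.com/wolfSSL/wolfssl && cd wolfssl
./autogen.sh
./configure --enable-haproxy --enable-quic --prefix=/opt/wolfssl
make -j"$(nproc)" && make install
cd ..

# Link haproxy against it instead of OpenSSL
cd haproxy
make -j"$(nproc)" TARGET=linux-glibc USE_OPENSSL_WOLFSSL=1 USE_QUIC=1 \
     SSL_INC=/opt/wolfssl/include SSL_LIB=/opt/wolfssl/lib
```

Note the arch-specific caveat above: options like AES-NI/AVX baked in at configure time tie the resulting library to the build machine's CPU features.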

Thus I find it important that we don't put again all our eggs in the same basket. We've all been suffering from openssl's hegemony and somewhat "original" governance. It seems important to me to give a chance to at least two promising projects and not depend on a single of them.

Finally, a last point that is important to me is that wolfssl is a much smaller company than aws, and I hope their work results in something viable for them on the long term. I imagine that it's more difficult to generate long-term revenue from distributing a library, especially once it starts to be adopted everywhere and becomes mainstream, and they'll have to find a difficult balance between making it totally plug-and-play for distros and selling contracts to get the most of it. It's not my business to look into this, but it's my concern that we take care of not putting them into a difficult situation so that they can continue to deliver this high-quality library for free for as long as possible. For now they look very open but maybe one day we'll be asking too much and they won't want to follow, and we'll have to respect that.

cmason3 commented 10 months ago

Thank you for the detailed explanation of where we are with this - I have come across the same issue in my line of work with CVEs - it doesn't matter if you can show you aren't impacted - if the release is exposed it is deemed an issue.

I can see the Docker images have been updated now to use QuicTLS 3.x (thank you @dkorunic) - my use-case is very low volumes of traffic so I don't expect to see any issues.

Tristan971 commented 9 months ago

For what it’s worth, usage of SSL on backends is quite rare in those most-cve-jumpy-environments like k8s, because it often doesn’t make a lot of sense, and even more rarely does it also need high performance there.

The most common use-case is when a target service wants to use mTLS (security-critical products) or some weirdo « SSL » (like databases), at which point you’re likely using TCP mode anyway. Or just bypassing HAProxy altogether since k8s already manages VIPs inside the cluster as an option for service-to-service comms (and thus going via haproxy internally might make sense for convenience, to add headers for example, but is then necessarily an extra network hop so you already didn't care about performance THAT much).

That said, the ideal long-term solution is rather being able to build a fully statically linked version of HAProxy, i.e. with musl for example (despite its other flaws), because then you can make so-called distroless docker images (i.e. just a static binary in there, no OS) which avoid needing to worry about all this. Especially as in the case of containers the security argument for dynamic linking is totally irrelevant, since you're not running distro updates inside a container.
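A hypothetical multi-stage Dockerfile along those lines (package names, versions, and the `-static` link step are assumptions; haproxy does provide a `linux-musl` target, but static linking requires static variants of every library):

```
FROM alpine:3.19 AS build
RUN apk add --no-cache build-base make git linux-headers \
        openssl-dev openssl-libs-static pcre2-dev zlib-dev zlib-static
RUN git clone --depth 1 https://github.com/haproxy/haproxy /src
WORKDIR /src
# LDFLAGS=-static only works if every linked library has a static variant installed
RUN make -j"$(nproc)" TARGET=linux-musl USE_OPENSSL=1 USE_PCRE2=1 USE_ZLIB=1 \
        LDFLAGS="-static"

# "Distroless" final stage: nothing but the binary, so nothing for scanners to flag
FROM scratch
COPY --from=build /src/haproxy /haproxy
ENTRYPOINT ["/haproxy", "-f", "/usr/local/etc/haproxy/haproxy.cfg"]
```

The point of the `FROM scratch` stage is exactly the one made above: with no OS packages in the image, there is no SBOM entry for a scanner to complain about.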

wtarreau commented 9 months ago

FWIW I've added a new page in the wiki here to summarize the current state of SSL libraries support: https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-Status

cmason3 commented 9 months ago

Closed as we now have a wiki page explaining the status.

Tristan971 commented 1 month ago

Unfortunately, we might have to reopen this issue:

  • OpenSSL 1.1.1 is EOL, and as much as we wanted to delay it, we'll have to move
  • OpenSSL 3.0/3.1 (and by extension QuicTLS) are problematic performance-wise
  • QuicTLS 3.2/3.3 looks like it may or may not happen, because upstream OpenSSL 3.2 has begun adding some of their QUIC code, which conflicts with the QuicTLS patchset... see: Sorting out our plans for 3.2 quictls/openssl#138
  • LibreSSL/WolfSSL/AWS-LC are all still in the "you should probably not use" range, if we look at the issue tracker and the various issues that still crop up regularly about them

It seems to me that while QuicTLS 1.1.1 is still the "best" way right now, it won't be for much longer as soon as a really bad issue is found in OpenSSL...

So some changes will be needed in HAProxy, whether to accommodate QuicTLS 3.2+ (if it happens at all), or by way of moving harder to other libs.

Thoughts?

chipitsine commented 1 month ago

Unfortunately, we might have to reopen this issue:

  • OpenSSL 1.1.1 is EOL, and as much as we wanted to delay it, we'll have to move
  • OpenSSL 3.0/3.1 (and by extension QuicTLS) are problematic performance-wise
  • QuicTLS 3.2/3.3 looks like it may or may not happen, because upstream OpenSSL 3.2 has begun adding some of their QUIC code, which conflicts with the QuicTLS patchset... see: Sorting out our plans for 3.2 quictls/openssl#138
  • LibreSSL/WolfSSL/AWS-LC are all still in the "you should probably not use" range, if we look at the issue tracker and the various issues that still crop up regularly about them

It seems to me that while QuicTLS 1.1.1 is still the "best" way right now, it won't be for much longer as soon as a really bad issue is found in OpenSSL...

So some changes will be needed in HAProxy, whether to accommodate QuicTLS 3.2+ (if it happens at all), or by way of moving harder to other libs.

Thoughts?

I had the idea of using the QUIC Interop suite to reach parity between QuicTLS and other libs. It was very time-consuming to compare logs (and even those logs are not easy to compare).

Thanks to Amaury, some corner cases were fixed: https://github.com/haproxy/haproxy/issues/2418. In theory we can get back to using QUIC Interop and fix the remnants.

Tristan971 commented 1 month ago

Oh very nice! Somehow I hadn't seen the issue

chipitsine commented 1 month ago

@Tristan971 if you have an appetite for that, we can choose some of LibreSSL/WolfSSL/AWS-LC and give it a try.

I've contributed LibreSSL Interop dockerfile: https://github.com/haproxytech/haproxy-qns/tree/libressl I have others locally.

Tristan971 commented 1 month ago

if you have an appetite for that

Yeah I meant to try WolfSSL once it had roughly feature parity (for the features I use anyway; was missing OCSP last I checked). I also need to check that it returns the same values as OpenSSL everywhere (for JA3 stuff for example).

chipitsine commented 1 month ago

https://github.com/chipitsine/dac44b2a-5f8f-433b-9481-ae7207a99010/tree/main/asan/haproxy-qns-wolfssl

that is the Interop image

Tristan971 commented 1 month ago

Okay, I'll take a look at it and try to match it in my builds when the OCSP support is fixed. Then I will report back :+1:

wlallemand commented 1 month ago

I hope WolfSSL will fix their issues with their OpenSSL compatibility layer, but since they are not an OpenSSL fork there is a lot of work to do, and it's difficult to test everything on our side too.

My best hope is that OpenSSL fixes their issues, but that could take some time... People can still use OpenSSL 1.1.1 with the openssl-compat layer for QUIC while it's supported in LTS distributions, but for distributions that do not embed this version it's kind of complicated. Note that the HAProxy Technologies company is stuck with this version and is selling products with the 1.1.1 LTS premium support from OpenSSL for now.

The new library that gives me the best hope for now is aws-lc, since it's a fork of boringssl and openssl and is aiming for API stability, FIPS certification and formal verification, as well as performance.

But we still don't have a satisfying solution for now.

Tristan971 commented 1 month ago

Yeah it's a tricky situation. I mean if you (as the authority for SSL in HAProxy, as far as I know) believe AWS-LC is the future, then I'd switch for that instead.

I kept looking at WolfSSL because from Willy's words in the past they seemed to have the cleanest/fastest implementation.

Admittedly AWS-LC has the benefit long-term that it's partially using BoringSSL and thus benefitting from Google's work for Go's network stack, which is appealing I have to say.

It's a bit off-topic, but they have just added client-side support for ECH in Go, and preparing for server-side, so I guess that would trickle down faster. Then again WolfSSL implemented ECH something like one year ago already so...

wlallemand commented 1 month ago

Yeah it's a tricky situation. I mean if you (as the authority for SSL in HAProxy, as far as I know) believe AWS-LC is the future, then I'd switch for that instead.

I can't say which one is really the future honestly, we need to be pragmatic for now, and make efforts to be compatible with solutions that could have a future.

I plan to run more tests with aws-lc in the following weeks to see what is missing, and to continue helping debug things with wolfssl; I also made some tests with rustls-openssl-compat... And we also need to help OpenSSL improve their performance.

I kept looking at WolfSSL because from Willy's words in the past they seemed to have the cleanest/fastest implementation.

I agree that WolfSSL has a faster implementation than OpenSSL; however it's not cleaner, and it's not as complete as OpenSSL. But it can be enough for a lot of use cases.

Admittedly AWS-LC has the benefit long-term that it's partially using BoringSSL and thus benefitting from Google's work for Go's network stack, which is appealing I have to say.

Honestly that's a whole new project now, and it is server-side oriented, whereas boringssl started on the client side.

It's a bit off-topic, but they have just added client-side support for ECH in Go, and preparing for server-side, so I guess that would trickle down faster. Then again WolfSSL implemented ECH something like one year ago already so...

I have no idea about ECH, and I didn't test the wolfssl implementation either, ECH is still a draft so I'm not that surprised that it's not implemented everywhere.

Tristan971 commented 1 month ago

I can't say I know enough to accurately judge between them myself, but if everyone is unhappy with OpenSSL, there needs to be a rally behind a clear number 2 or 3. Otherwise adoption is unlikely to change over time, and more cases like this one about QUIC will be things people just have to put up with. Everything always ends up converging on a small number of big players, because the opposite is way too time-inefficient for everyone else...

And we haven't even reached the point of having to wire up their official QUIC API when it is ready, which (to my understanding) is not going to follow the general approach all other libs took until now.

But either way:

ECH is still a draft so I'm not that surprised that it's not implemented everywhere.

Yeah it's not surprising, but I meant that it will always be faster if you can mutualise some things. BoringSSL will always have the Google engineers building every bleeding-edge spec early, because they need it for field testing in Chrome(ium) in the first place. So that is somewhat of a guarantee that in most cases most of the work will have been done, even if only from a client standpoint.

(also while it's still technically a draft, it might as well be a published spec at this point, barring major news)

centminmod commented 1 month ago

Been quietly following this discussion. You mentioned AWS-LC; they posted the CI integration scripts for the open-source software they are looking at supporting, and it seems haproxy is listed there too: https://github.com/aws/aws-lc/tree/main/tests/ci/integration. Specific script: https://github.com/aws/aws-lc/blob/main/tests/ci/integration/run_haproxy_integration.sh

Tristan971 commented 1 month ago

That is really good news. If they intend to have AWS-LC be as good an SSL library as they do some of their other core libraries/runtimes (huge fan of the work of the Corretto team), they might very well become what I hope for.

wtarreau commented 1 month ago

aws-lc is indeed quite fast. Their intent is to provide a fast alternative usable for production. For example on ARM, RSA calculations are twice as fast as other implementations because they've rewritten that part using carefully scheduled asm instructions that make use of both integers and neon in parallel. There used to be some serious limitations there at the beginning, regarding 0-rtt, some ciphers that were not accessible and maybe some things related to the client handshake, that I don't remember clearly (since I don't understand much about the interactions between all these things), but many of them were either addressed or worked around one way or another.

what I mentioned regarding wolfssl being the fastest is that wolfssl is almost lockless. For a complete connection there was something like 691 locks on openssl-1.1.1 vs only 1 for wolfssl. Openssl 3+ removed many of these locks, but the wrong way, by placing tons of atomics in critical sections and making the situation even worse (and even less observable).

centminmod commented 1 month ago

I just added AWS-LC support to my Nginx builds using AWS-LC Nginx patches. But thought I'd test AWS-LC vs BoringSSL. One bonus if you do go BoringSSL variant route is you may also pick up Cloudflare Post-Quantum Key Agreement support with their Cloudflare edge server's connection to BoringSSL enabled origin servers :)
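For what it's worth, both libraries ship a `bssl` command-line tool with a built-in `speed` benchmark (AWS-LC inherits it from BoringSSL), which is presumably how numbers of this kind are produced. A sketch; build paths may differ per checkout:

```shell
# In each library's source tree: build the tool, then run its benchmarks
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

./build/tool/bssl speed               # run all benchmarks
./build/tool/bssl speed -filter RSA   # restrict to RSA operations
```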

AWS-LC vs BoringSSL Benchmark Comparison

RSA Benchmarks

| Operation | AWS-LC (ops/sec) | BoringSSL (ops/sec) |
| --- | --- | --- |
| RSA 2048 signing | 2208.2 | 2206.0 |
| RSA 2048 verify (same key) | 83129.6 | 82680.2 |
| RSA 2048 verify (fresh key) | 72081.3 | 71410.1 |
| RSA 2048 private key parse | 8523.8 | 11790.6 |
| RSA 3072 signing | 717.8 | - |
| RSA 3072 verify (same key) | 39200.0 | - |
| RSA 3072 verify (fresh key) | 34969.9 | - |
| RSA 3072 private key parse | 4461.6 | - |
| RSA 4096 signing | 318.4 | 319.5 |
| RSA 4096 verify (same key) | 22611.6 | 22716.6 |
| RSA 4096 verify (fresh key) | 20242.6 | 20162.8 |
| RSA 4096 private key parse | 2620.4 | 3799.0 |
| RSA 8192 signing | 43.0 | - |
| RSA 8192 verify (same key) | 5807.4 | - |
| RSA 8192 verify (fresh key) | 5306.5 | - |
| RSA 8192 private key parse | 685.3 | - |

ECDSA Benchmarks

| Operation | AWS-LC (ops/sec) | BoringSSL (ops/sec) |
|---|---|---|
| ECDSA P-224 signing | 29803.0 | 38590.5 |
| ECDSA P-224 verify | 13013.7 | 18126.0 |
| ECDSA P-256 signing | 74143.2 | 75808.7 |
| ECDSA P-256 verify | 25920.0 | 26009.3 |
| ECDSA P-384 signing | 13162.7 | 2487.2 |
| ECDSA P-384 verify | 5986.2 | 2462.3 |
| ECDSA P-521 signing | 6943.1 | 1008.5 |
| ECDSA P-521 verify | 3392.4 | 999.5 |
| ECDSA secp256k1 signing | 4800.5 | - |
| ECDSA secp256k1 verify | 4915.8 | - |

X25519 Benchmarks

| Operation | AWS-LC (ops/sec) | BoringSSL (ops/sec) |
|---|---|---|
| EVP ECDH X25519 | 39674.5 | - |
| Ed25519 key generation | 166466.0 | 33784.5 |
| Ed25519 signing | 160186.9 | 33357.3 |
| Ed25519 verify | 34790.6 | 25754.6 |
| Curve25519 base-point multiplication | 177951.1 | 34105.6 |
| Curve25519 arbitrary point multiplication | 52145.9 | 32838.3 |
| ECDH X25519 | 40235.8 | - |

P-256 Benchmarks

| Operation | AWS-LC (ops/sec) | BoringSSL (ops/sec) |
|---|---|---|
| ECDH P-256 | 24138.9 | 24367.0 |
| ECDSA P-256 signing | 74128.2 | 76704.6 |
| ECDSA P-256 verify | 25952.2 | 26522.0 |
| Generate P-256 with EVP_PKEY_keygen | 167264.0 | - |
| Generate P-256 with EC_KEY_generate_key | 168168.7 | - |
| EVP ECDH P-256 | 23759.3 | - |
| EC POINT P-256 dbl | 10712707.1 | - |
| EC POINT P-256 add | 6189157.2 | - |
| EC POINT P-256 mul | 30706.9 | - |
| EC POINT P-256 mul base | 175588.2 | - |
| EC POINT P-256 mul public | 26097.0 | - |
| Generate P-256 with EC_KEY_generate_key_fips | 14835.3 | - |

Measured on an 8 CPU KVM VPS:

```
lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           8
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Red Hat
CPU family:          25
Model:               33
Model name:          AMD Ryzen 9 5950X 16-Core Processor
BIOS Model name:     RHEL 7.6.0 PC (i440FX + PIIX, 1996)
Stepping:            0
CPU MHz:             3393.624
BogoMIPS:            6787.24
Virtualization:      AMD-V
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid fsrm arch_capabilities
```
chipitsine commented 1 month ago

I would suggest adding some "feature" table somewhere on the wiki. For example, LibreSSL does not support 0-RTT (and does not intend to), and the same applies to openssl-compat mode.

Such a table would help people decide whether a given library is suitable for them.

wlallemand commented 1 month ago

I started to write one, but it's difficult to test everything without having the reg-tests working on all libraries, so that's a work in progress. But I agree with you, that would be interesting.

chipitsine commented 1 month ago

I've been quietly following this discussion. You mentioned AWS-LC; they have posted the CI integration scripts for the open source software they are looking at supporting, and it seems HAProxy is listed there too: https://github.com/aws/aws-lc/tree/main/tests/ci/integration. The specific script is https://github.com/aws/aws-lc/blob/main/tests/ci/integration/run_haproxy_integration.sh

AWS-LC runs that integration on their side. However, with the help of the AWS-LC developers, AWS-LC was also added to HAProxy's own CI: https://github.com/haproxy/haproxy/actions/runs/9452430068/job/26036597288

So AWS-LC is covered by testing in the same way as OpenSSL.

Unfortunately, the test suite does not cover QUIC yet (that's true for all SSL variants). QUIC is covered by the QUIC Interop runner, but that does not run on every git push.

chipitsine commented 1 month ago

> I started to write one but it's difficult to test everything without having the reg-tests working on all libraries, that's a work in progress. But I agree with you, that would be interesting.

https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-Status ?

I'll try to add the table.

wlallemand commented 1 month ago

I added a TL;DR section https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-Status#tldr.

But don't expect details about what is supported or not: it's a colossal amount of work to test every keyword and every feature of every SSL library. We have some reg-tests, but honestly that's not enough for this. It would require much more advanced tests, and covering every small feature, cipher, signature algorithm, etc. in detail is a huge undertaking.