Closed rvagg closed 5 years ago
This probably accounts for some of it: https://github.com/nodejs/node/pull/16130#issuecomment-338822333 I filed https://github.com/openssl/openssl/issues/4574 for the immediate trigger, but I think Node's reentrancy is also somewhat questionably supportable.
Other bits I suspect come out of them enabling TLS 1.3 by default and the tests being sensitive to there not being a version beyond TLS 1.2.
(But I haven't triaged and am mostly speculating.)
I'm now working on build bindings with 1.1.1-pre1 and will look at test errors later.
Here is my preliminary work to support OpenSSL-1.1.1 on only Linux and Mac. https://github.com/shigeki/node/tree/openssl111pre1_unix_mac_ok I can build node binary but several tests are still failed as described above.
Windows is not yet supported because OpenSSL-1.1.1 generates makefiles to work only nmake
for Windows and we have to some integrations to gmake
. It needs some time to work on Windows.
pre2 is out, looks like a ton of pending refactoring and removals got merged in for this one https://github.com/openssl/openssl/commits/master I haven't tested this yet
OpenSSL-1.1.1-pre1 binding without asm support of https://github.com/shigeki/node/tree/WIP_openssl111pre_noasm has almost been done.
Most of CI results of https://ci.nodejs.org/job/node-test-commit/16544/ are fine except in some platforms. I'm going to work on them with pre2 soon.
Most of test failures is due to cipher suite incompatibilities between TLS1.2 and TLS1.3 and it can be resolved by limiting to use TLS1.2 by default. The issue is now discussing in https://github.com/openssl/openssl/issues/5359.
Very nice work @shigeki. I notice there's a failure on testing 1.0.2 support:
../src/node_crypto.cc: In static member function 'static void node::crypto::SecureContext::Init(const v8::FunctionCallbackInfo<v8::Value>&)':
19:03:36 ../src/node_crypto.cc:636:49: error: 'SSL_CTX_set_max_proto_version' was not declared in this scope
19:03:36 SSL_CTX_set_max_proto_version(sc->ctx_, TLS1_2_VERSION);
Is it going to be hard to main backward compatibility?
It is the API which is available above 1.1.0 so that
#if OPENSSL_VERSION_NUMBER >= 0x10100000L .. #endif
would be working.
So, we are at pre9 now, TLS 1.3 is finalized and ratified as an RFC, and Node 10 LTS is only a month(ish) away. It would be nice to have 1.1.1 in Node 10 before we go LTS but if we have to do it after then so be it. I have a 1.1.1-pre test job for Linux in CI that I'd like to enable so we can start getting ready but we don't quite have compatibility.
@shigeki the commit at the head of upgrade_openssl111_pre6_tls12_only
, https://github.com/shigeki/node/commit/26388feb760c42d5a342a08507f13a3a268918da, we get most of the way there for OpenSSL 1.1.1 as a shared library. It still fails test-tls-handshake-error
, test-tls-set-ciphers-error
, test-tls-server-verify
and test-tls-connect
.
Any idea whether these are easy to overcome and what the impact on non-Linux is at the moment? CI can only do dynamic compiles on Linux at the moment but it's a start and we could use it to make a basic assertion that 1.1.1 isn't going to be a breaking upgrade. I don't know if other collaborators would accept a PR with support for a -pre library but I wouldn't mind trying soon if we can.
OpenSSL 1.1.1 final is here -> https://www.openssl.org/blog/blog/2018/09/11/release111/
Here is my prototype on upgrading latest OpenSSL-1.1.1 release with only TLS1.2 support not TLS1.3.
https://github.com/shigeki/node/tree/WIP_upgrade_openssl111_tls12_only
This has one new feature to have options.min_version/max_version
to limit the use of TLS versions.
Onetest-tls-server-verify
test was failed due to client auth error during renegotiation that need to be investigated.
This is only working on Linux. TODO is to support multi platforms especially on MacOS, Win and s390 that can be checked through CI. I think that it does not have any breaking updates as long as it supports only up to TLS1.2.
Supporting TLS1.3 in Node is an another story. It needs much more works and would have breaking updates because of changes of info callbacks of its TLS state machine and deprecated/new several features for TLS1.3 in OpenSSL-1.1.1.
The failure of test-tls-server-verify
was identified and an issue was submitted in https://github.com/openssl/openssl/issues/7199.
@shigeki it sounds like from openssl/openssl#7199 that we may have some fundamental architecture problems with our TLS layer. It sounds like the TLS 1.3 support problem is related to the test-tls-server-verify
failure too, is that right? Does that mean that we're going to have a lot more changes on Node's side to get 1.1.1 in? And if so, do we need to get additional help for that? Is there a risk that we're going to need to have breaking changes just to get 1.1.1 in properly?
Point of note: On Ubuntu, we are building OpenSSL 1.1.1 with OPENSSL_TLS_SECURITY_LEVEL set to 0 and all TLS protocol versions enabled. It would be nice for node.js 11.0.0 to use 1.1.1 abi/api, and to explicitly limit max TLS version to 1.2 then. This will thus allow us to ship nodejs, and later when things are fixed for 1.3, enable 1.3 without changing base api/abi requirement of the openssl library.
@rvagg @shigeki At the request of @mhdawson I started to look at https://github.com/nodejs/node/issues/18770#issuecomment-420498724
There is one more TODO: rebase on master. When rebasing to master the addition of min_/max_version
conflicts with https://github.com/nodejs/node/pull/23820 I'm working on this in https://github.com/sam-github/node/tree/WIP_upgrade_openssl111_tls12_only, it builds, but I'm reviewing to make sure its correct.
Very nice @sam-github, I'll take a look as soon as I can. A bit busy this week but this is exciting stuff!
How did you go with solving the test-tls-server-verify / SSL_CB_HANDSHAKE_START/DONE
thing? I was under the impression that this was a fairly fundamental problem.
@sam-github Thanks for your work. I will take a look at it later.
@rvagg https://github.com/sam-github/node/commit/a1c64c7d50170e1a49dbcff149a11dbd41166e6b is the tentative fix for it but it does not solve the issues of TLS1.3 adoption.
Please don‘t forget to make the new curves accessible via crypto api.
@sam-github I found that this has the AVX-512 errors in building on MacOS as follows. I will fix it so as to disable AVX-512 support on MacOS.
../deps/openssl/config/archs/darwin64-x86_64-cc/asm/crypto/chacha/chacha-x86_64.s:2203:2: error: instruction requires: AVX-512 ISA
vbroadcasti32x4 L$sigma(%rip),%zmm0
^
../deps/openssl/config/archs/darwin64-x86_64-cc/asm/crypto/chacha/chacha-x86_64.s:2204:2: error: instruction requires: AVX-512 ISA
vbroadcasti32x4 (%rcx),%zmm1
^
../deps/openssl/config/archs/darwin64-x86_64-cc/asm/crypto/chacha/chacha-x86_64.s:2205:2: error: instruction requires: AVX-512 ISA
vbroadcasti32x4 16(%rcx),%zmm2
@shigeki I worry that our branches will diverge. Can I push https://github.com/sam-github/node/tree/WIP_upgrade_openssl111_tls12_only to the nodejs/node repo, so we can both push fixups to it, and perhaps PR it so comment is more possible? If you would prefer I (EDIT: can) PR your branch, or anything else, I am happy to do as you suggest.
As for tls: add min/max_version and their defaults
, I reworked it slightly to include API docs and not modify it's options
argument.
Your change modifies the arguments of tls.SecureContext()
which is exported but undocumented. Do we consider this to be semver-major
? If so, the min and max arguments could be moved to a field in the pre-existing options
argument.
@shigeki re:
Onetest-tls-server-verify test was failed due to client auth error during renegotiation that need to be investigated.
I get 100% test pass on linux. Is the failure ephemeral, or was this the failure fixed by tls: workaround handshakedone in renegotiation
?
EDIT: I am sorry, I ignore this. Its the same question Rod asked and you answered.
@volschin Is there anything special to do? How do I check that what curves are available? I tried this:
core/node (WIP_upgrade_openssl111_tls12_only % u+756-7) % ./node -p 'crypto.getCurves()' > _head
core/node (WIP_upgrade_openssl111_tls12_only % u+756-7) % nvm run 11 -p 'crypto.getCurves()' > _11
core/node (WIP_upgrade_openssl111_tls12_only % u+756-7) % diff -u _11 _head
--- _11 2018-10-31 14:14:59.874786762 -0700
+++ _head 2018-10-31 14:13:57.405782356 -0700
@@ -1,5 +1,6 @@
[ 'Oakley-EC2N-3',
'Oakley-EC2N-4',
+ 'SM2',
'brainpoolP160r1',
'brainpoolP160t1',
'brainpoolP192r1',
@sam-github I‘m not sure, whats working automatically. getcurves should show up with X25519 and X448. Both curves should be usable inside ECDH class e.g. the Alice and Bob example from the docs. The complete test changes for X448 in openssl you can see here https://github.com/rhuijben/openssl/commit/fe93b010e78ab60bc222cf4cbbf3cdfcbecffd91
Note the ECDH class has some features (compressed vs uncompressed coordinates) that do not make sense for X25519 and X448. OpenSSL does not route the new-style curves (which have built-in serializations are specified in terms of byte strings at the public interface) through the old EC API for this reason. Most existing EC APIs provide a slightly incompatible interface, so it usually makes sense to think of these as separate algorithms.
@sam-github
I worry that our branches will diverge. Can I push https://github.com/sam-github/node/tree/WIP_upgrade_openssl111_tls12_only to the nodejs/node repo, so we can both push fixups to it, and perhaps PR it so comment is more possible?
Okay, I do not have a preference. You can move it to nodejs/node. The only thing I'm fearing is for me to push my working branch to origin master by mistake. I would like to work it on my repo until all CI are green. Then I can push all my fixes to your branch.
As for tls: add min/max_version and their defaults, I reworked it slightly to include API docs and not modify it's options argument.
I've got it. Thanks for taking care of it.
Your change modifies the arguments of tls.SecureContext() which is exported but undocumented. Do we consider this to be semver-major?
I missed to know it is exported. It needs to be fixed not to be semver-major.
After fixing for MacOS, the current CI results of https://ci.nodejs.org/job/node-test-commit/22806/ show that it still needs to have more fixes in other platforms such as SmartOS, Win and others. I'm now working on fix for Windows.
@shigeki Is it possible linuxone/s390 didn't have assembly files generated for it?
Since you prefer to work in your own branch and integrate later, I will wait before pushing to nodejs/node or PRing.
Wrt. the semver-major, I'll move the min/max into the options object, push to my branch, and ping you here for your review.
@shigeki PTAL https://github.com/sam-github/node/commit/781d578dd08174c9f5b4e7c2955900a5227ac907, I moved new args to the end of the arg list, it should be backwards compatible like that.
@sam-github
Is it possible linuxone/s390 didn't have assembly files generated for it?
This can be fixed by https://github.com/shigeki/node/commit/940ed53a10fec029b1f9ee3cfe7fcccaf1fbe292.
The MacOS and WIndows build failures can be resolved by https://github.com/shigeki/node/commit/1d03dd128ea1662c8997edf86e32d73f115b7179. CI of https://ci.nodejs.org/job/node-test-commit/22834/ still shows issues on pLinux and SmartOS. I will ask for accessing the hosts.
sam-github@781d578 is fine. Thanks for your fixing.
The issue of SmartOS was identified and can be resolved with https://github.com/shigeki/node/commit/b29f1fba58f3361a8eb579e67097ecd681c020b4.
The issue of plinux (linux-ppc64le) is caused by the seg fault in SHA256_Final()
. It was crashed only in Release not in Debug or with openssl_no_asm. Generated asm files and build flags/options are the same between Node and OpenSSL-1.1.1 so it seems to be more difficult to be resolved. I have to make further investigations but it needs some time.
The issue of plinux (linux-ppc64le) is caused by the seg fault in SHA256_Final(). It was crashed only in Release not in Debug or with openssl_no_asm.
@shigeki: Can you share the link to the plinux failure output? I tried following CI build (ci.nodejs.org/job/node-test-commit/22834 ), but appears the PPC job output got purged already?
@joransiu The output of https://ci.nodejs.org/job/node-test-commit-plinux/21338/nodes=ppcle-ubuntu1404/consoleFull is still existing.
I've found that a certain build option of Node.js causes the issue with a OpenSSL's commit of performance improvement for Power9. It can be resolved soon. Thanks.
I cherry picked https://github.com/shigeki/node/tree/WIP_upgrade_openssl111_tls12_only onto master, the result is https://github.com/sam-github/node/tree/update_openssl1.1.1a
ci: https://ci.nodejs.org/job/node-test-commit/23547/
arm, windows, and AIX builds are ongoing, but everything else other than a link against ossl 1.1.0 (see below) seems to be passed. EDIT: everything else passed
@shigeki:
@shigeki @rvagg wrt. above, I should have been more clear: I used openssl 1.1.1a, not openssl 1.1.1.
Re point 3, yes it is unfortunately. We have to make sure that it still works when dynamically linked against OpenSSL 1.1.0 (well Node 10 and 11 at least, we could cut master loose, that will need an addition to https://github.com/nodejs/build/blob/master/jenkins/scripts/VersionSelectorScript.groovy#L60 to exclude it for gte(12)
, but that can come later).
I don't know if there are any, it might not be possible to know, but consider a Linux distribution that is shipping 1.1.0 and node, dynamically linked. We upgrade Node 10 to 1.1.1 but they are still shipping 1.1.0. It still needs to work for them.
So sequential/test-tls-connect & parallel/test-tls-handshake-error are failing with no cipher match
. is that due to the setup of those specific tests or something more fundamental? We can jigger the tests to check for OpenSSL version and act accordingly if it's not pointing to a larger compatibility problem.
@sam-github @rvagg
Ubuntu is in progress of upgrading openssl from 1.1.0 to 1.1.1 in bionic - 18.04 LTS, see https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/1797386
xenial 16.04 uses openssl 1.0.2 and will not move.
@sam-github Yes, since the plinux issue was resolved in 1.1.1a but I found the asm issue on armv8 is still existing even in 1.1.1.
@rvagg @MylesBorins @mhdawson I didn't think we had any intention of upgrading openssl in stable release lines. My thought was openssl 1.1.1 would be semver-major, or at least limited to >=11.x. You are expecting it to be backported to stable?
@sam-github The current OpenSSL-1.1.0 in Node-v10 is to be EOL in Sep. 11, 2019. So we need to upgrade it even in the stable release line. I think we can have backward compatibilities with limiting use up to TLSv1.2. Some time after upgrading 1.1.1 in Node-v11, it is better to backport it to Node-v10.
@sam-github In my opinion it was a mistake to make 10.x to LTS before solving the OpenSSL support problem. The problems with EOL are written and known for long. Already 8.x LTS is announced for EOL 3 months early, because openssl 1.0.2 is reaching EOL by end of 2019. Integrating openssl 1.1.1 in 10.x is the only way to have a real LTS version.
@shigeki
I found the asm issue on armv8 is still existing even in 1.1.1.
That was true with 1.1.0 as well as 1.1.1, and https://github.com/nodejs/node/pull/24270 works around it. I don't consider that blocking for update to 1.1.1 since it didn't block release of current node with 1.1.0. Not that its not important. Cf. https://github.com/nodejs/node/issues/23913#issuecomment-441143748 <--- its actively being worked on.
I'm building now to see if I can repro on linux when my 1.1.1a branch is configured to compile and link against 1.1.0. I think that's what the failing config does, but the jenkins configs are not so clear to me.
The problems with sequential/test-tls-connect & parallel/test-tls-handshake-error were noticed by @shigeki, see https://github.com/shigeki/node/commit/2b7f91830350203284b8aeaf65761553bb172a84
I can easily rework the tests to work with either openssl 1.1.1 or 1.1.0. Its not so easy to make the behaviour of node consistent across both openssl versions, it appears in 1.1.1 its not noticed that there are no ciphers until the handshake is being serialized, whereas it used to happen when CTX was set up with the user-specified ciphers string, so an error could be thrown synchronously.
I've asked on the openssl mailing list whether this change was intentional. They may have some useful comments.
I found the problem, and a workaround. I'll post it later.
In my opinion it was a mistake to make 10.x to LTS before solving the OpenSSL support problem
That's not really fair @volschin. We tried hard to do this prior to LTS but 1.1.1 was delayed and now there are problems trying to shoehorn it in as you can see here. It's much more difficult than we hoped but we're doing our best. You're welcome to do some of the heavy lifting on these kinds of things if you're not satisfied with the work of mainly volunteers.
@sam-github it's a bit out of date now but I wrote about the strategy we have to take in https://github.com/nodejs/TSC/pull/479, we don't have much choice but get this in to 10. So the most important thing when we get tests passing here is testing to make sure that we have API and ABI stability, which is what OpenSSL has been promising us so 🤞. We have been trying to signal this for as long as we've known it so hopefully it (a) doesn't come as too much of a surprise when 10 has a bump and (b) it doesn't actually impact users, and maybe even they'll get new goodies as a bonus.
@rvagg It wasn't my intention to criticise the hard work of volunteers here to solve the problem. I appreciate this. And this is independently from my opinion how such a problem should be communicated in a good practice of release control. Having 10.13 released as LTS without a notice about the openssl issue that could impact the EOL in the release notes that is my critic.
@shigeki wrt. to failures when openssl 1.1.1 branch is compiled against openssl 1.1.0, I think the workaround is:
I'll run another full CI on https://github.com/sam-github/node/commits/update_openssl1.1.1a once ci is available again.
https://github.com/sam-github/node/tree/update_openssl1.1.1a now passes locally on linux when compiled against deps (openssl-1.1.1a), and passes locally when compiled and linked against a local openssl-1.1.0j build. I'm hopeful that the few failures from last complete ci involving the 1.1.0 scenario are now solved.
FYI I got 1.1.1a (and other recent versions) into the linux-containered/sharedlibs hosts in CI yesterday, so a green there is a green for 1.1.1a.
Looks like ci all passed except:
The linux-containerized tests passed.
I don't think these are related to openssl. I'll rebuild to see if they are ephemeral.
Last remaining issues for openssl 1.1.1a are (to my knowledge):
I don't think either of those two things are blockers. My two concerns are:
Regarding "Error: instruction requires: AVX-512 ISA", see LLVM Issue 39875, Error: instruction requires: AVX-512 ISA when using -mavx512f. The short of it is, Clang 7.0 is needed to sidestep a problem with the Clang integrated assembler. The root cause is LLVM Issue 36202, llvm-mc doesn't accept AVX512 instructions using %zmm16+.
@nodejs/crypto
OpenSSL 1.1.1-pre1 was released today. The headline item is TLS 1.3 (worth noting that the spec hasn't quite been finalised yet). This is obviously only a pre-release, not final and not supposed to be entirely bug free.
The OpenSSL team have said previously that 1.1.1 would be API and ABI compatible with 1.1.0. We currently have 1.1.0 support in Node so the theory goes that it shouldn't be too difficult an upgrade path. This is nice because it's possible (but not yet known) that 1.1.1 is the next LTS of OpenSSL, with 1.1.0 going EOL soon. 1.1.0 -> 1.1.1 or just straight to 1.1.1 might have to be our Node 10 strategy (I'm outlining that case here).
So, getting as close to 1.1.1 support as possible even while it's pre-release would be very valuable for us. Maintaining 1.0.2 and 1.1.0 support in the meantime is preferable (perhaps essential thanks to distribution dependencies). There will be a time, after 1.0.2 EOL next year, that we can ditch all the cruft but for now if we can do all 3 then that's what we should do.
Our CI tests 1.0.2 (obviously) and 1.0.2 dynamically linked. It also tests dynamic linking to 1.1.0 in Node 9+ (soon 8 too I think). See https://ci.nodejs.org/job/node-test-commit-linux-containered/ for this happening.
I've also 1.1.1-pre1 to the same containers that are used to run these other dynamic-linked tests and I can turn that on as needed. For now it's too broken to turn on full-time, so this is the call to help fix that!
Node compiles just fine with 1.1.1-pre1 thanks to @davidben's most excellent work in #16130. But it currently fails 55 tests in CI (there may be at least one async-wrap flaky in there).
We need help figuring out whether these are things that we can fix on our end or whether they are upstream problems. If OpenSSL 1.1.1 isn't properly API compatible with 1.1.0 then I'd like us to push back on them to get them to stick to that commitment.
Full output is captured here https://gist.github.com/rvagg/cdead09ffa269453d728dcf9bc831d3d (it comes from here but that link is not going to survive).