nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.54k stars 29.58k forks source link

Nodejs runs out of memory trying to connect to HTTPS host with self-signed certificate #37757

Closed nils91 closed 3 years ago

nils91 commented 3 years ago

What steps will reproduce the bug?

When trying to connect to a https-enabled host and using a self-signed certificate provided to node via NODE_EXTRA_CA_CERTS node hangs for a few minutes, using more and more memory, until it exits with Allocation failed - JavaScript heap out of memory. Full stacktrace below.

How often does it reproduce? Is there a required condition?

The server needs to be up and the correct server certificate needs to be used.

What is the expected behavior?

I would expect the connection to be made successfully, since the self-signed certificate has been provided to node.

What do you see instead?

<--- Last few GCs --->

[15068:0x52f3bd0]   324861 ms: Scavenge (reduce) 4053.6 (4128.3) -> 4053.4 (4129.3) MB, 72.6 / 0.0 ms  (average mu = 0.851, current mu = 0.769) allocation failure
[15068:0x52f3bd0]   325180 ms: Scavenge (reduce) 4054.9 (4129.3) -> 4054.8 (4131.1) MB, 86.6 / 0.0 ms  (average mu = 0.851, current mu = 0.769) allocation failure
[15068:0x52f3bd0]   325442 ms: Scavenge (reduce) 4056.6 (4131.1) -> 4056.4 (4132.6) MB, 71.5 / 0.0 ms  (average mu = 0.851, current mu = 0.769) allocation failure

<--- JS stacktrace --->

FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory
 1: 0xa7f490 node::Abort() [node]
 2: 0x9a5c4d node::FatalError(char const*, char const*) [node]
 3: 0xc6c2ae v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xc6c627 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xe360a5  [node]
 6: 0xe65633 v8::internal::EvacuateNewSpaceVisitor::Visit(v8::internal::HeapObject, int) [node]
 7: 0xe721e6 v8::internal::FullEvacuator::RawEvacuatePage(v8::internal::MemoryChunk*, long*) [node]
 8: 0xe5e30f v8::internal::Evacuator::EvacuatePage(v8::internal::MemoryChunk*) [node]
 9: 0xe5e588 v8::internal::PageEvacuationTask::RunInParallel(v8::internal::ItemParallelJob::Task::Runner) [node]
10: 0xe503a9 v8::internal::ItemParallelJob::Run() [node]
11: 0xe74170 void v8::internal::MarkCompactCollectorBase::CreateAndExecuteEvacuationTasks<v8::internal::FullEvacuator, v8::internal::MarkCompactCollector>(v8::internal::MarkCompactCollector*, v8::internal::ItemParallelJob*, v8::internal::MigrationObserver*, long) [node]
12: 0xe749b3 v8::internal::MarkCompactCollector::EvacuatePagesInParallel() [node]
13: 0xe74d75 v8::internal::MarkCompactCollector::Evacuate() [node]
14: 0xe874e1 v8::internal::MarkCompactCollector::CollectGarbage() [node]
15: 0xe433a8 v8::internal::Heap::MarkCompact() [node]
16: 0xe44d38 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
17: 0xe482dc v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
18: 0xe0bf3a v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
19: 0xe05444 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [node]
20: 0xe07540 v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [node]
21: 0xe1955e v8::internal::Factory::NewStringFromOneByte(v8::internal::Vector<unsigned char const> const&, v8::internal::AllocationType) [node]
22: 0xc839e2 v8::String::NewFromOneByte(v8::Isolate*, unsigned char const*, v8::NewStringType, int) [node]
23: 0xbb5914 node::crypto::GetFingerprintDigest(node::Environment*, evp_md_st const*, x509_st*) [node]
24: 0xbb736e node::crypto::X509ToObject(node::Environment*, x509_st*) [node]
25: 0xbb81e4 node::crypto::GetPeerCert(node::Environment*, std::unique_ptr<ssl_st, node::FunctionDeleter<ssl_st, &SSL_free> > const&, bool, bool) [node]
26: 0xc2b567 node::crypto::TLSWrap::GetPeerCertificate(v8::FunctionCallbackInfo<v8::Value> const&) [node]
27: 0xcd8cbb  [node]
28: 0xcda26c  [node]
29: 0xcda8e6 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
30: 0x14fa219  [node]
./run.sh: line 5: 15068 Aborted                 node index.js

Additional information

All files i´ve used to reproduce: node_https_test.zip https_renew_cert.sh is a script used to generate the key/certificate pair. (Sample certificate and key is also included in the zip, so you don´t have to regenerate them, if you don´t have oppenssl installed) server.js is the server. Execute with node server.js. index.js is the client. Started from run.sh run.sh: Starts the client and sets NODE_EXTRA_CA_CERTS. (You need to update the path to the certificate)

RaisinTen commented 3 years ago
❯ node -v
v15.2.0
❯ cp ~/Downloads/node_https_test.zip .
❯ unzip node_https_test.zip
Archive:  node_https_test.zip
  inflating: cert.conf               
  inflating: cert.pem                
  inflating: https_renew_cert.sh     
  inflating: index.js                
  inflating: key.pem                 
  inflating: run.sh                  
  inflating: server.js               
❯ export NODE_EXTRA_CA_CERTS="$PWD/cert.pem"
❯ node server.js
^Z
[1]  + 9905 suspended  node server.js
❯ node index.js
^Z
[2]  + 9921 suspended  node index.js
❯ fg %node\ server.js
[1]  - 9905 continued  node server.js
^Z
[1]  + 9905 suspended  node server.js
❯ fg %node\ index.js
[2]  - 9921 continued  node index.js
An error Error: unable to verify the first certificate
    at TLSSocket.onConnectSecure (node:_tls_wrap:1498:34)
    at TLSSocket.emit (node:events:329:20)
    at TLSSocket._finishInit (node:_tls_wrap:933:8)
    at TLSWrap.ssl.onhandshakedone (node:_tls_wrap:707:12) {
  code: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE'
}

Can't reproduce. Am I doing something wrong here?

nils91 commented 3 years ago

Looks fine to me. Just did exactly what you did, reproduces the problem for me. Are you running ubuntu? I also tried it on fedora after you answered and then i get the same error you do.

RaisinTen commented 3 years ago

Yes, Ubuntu.

nils91 commented 3 years ago

I´ve set up a repo with an action to reproduce the issue: https://github.com/nils91/node_https_test. Action output:

Run actions/setup-node@v2.1.2
Attempting to download 15.x...
Not found in manifest.  Falling back to download directly from Node
Acquiring 15.11.0 from https://nodejs.org/dist/v15.11.0/node-v15.11.0-linux-x64.tar.gz
Extracting ...
/usr/bin/tar xz --strip 1 --warning=no-unknown-keyword -C /home/runner/work/_temp/b0e5d0e1-53c0-465e-a30d-7fc2e5c30f1e -f /home/runner/work/_temp/49fe5a6a-bbc1-4afa-b53c-46330d65ee3c
Adding to the cache ...
Done
0s
Run uname -a
Linux fv-az189-765 5.4.0-1040-azure 42-Ubuntu SMP Fri Feb 5 15:39:06 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
0s
Run node -v
v15.11.0
5s
Run node server.js &
0s
Run chmod +x ./run.sh
2m 11s
Run ./run.sh

<--- Last few GCs --->

[1730:0x5d04ef0]   106385 ms: Mark-sweep (reduce) 2046.2 (2085.8) -> 2046.1 (2086.3) MB, 2473.3 / 0.0 ms  (average mu = 0.080, current mu = 0.011) allocation failure scavenge might not succeed
[1730:0x5d04ef0]   108843 ms: Mark-sweep (reduce) 2047.1 (2086.3) -> 2047.0 (2087.3) MB, 2427.4 / 0.0 ms  (average mu = 0.047, current mu = 0.013) allocation failure scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa7f490 node::Abort() [node]
 2: 0x9a5c4d node::FatalError(char const*, char const*) [node]
 3: 0xc6c2ae v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xc6c627 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xe360a5  [node]
 6: 0xe36c4c  [node]
 7: 0xe445db v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 8: 0xe482dc v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0xe0c077 v8::internal::Factory::AllocateRawWithAllocationSite(v8::internal::Handle<v8::internal::Map>, v8::internal::AllocationType, v8::internal::Handle<v8::internal::AllocationSite>) [node]
10: 0xe128b8 v8::internal::Factory::NewJSObjectFromMap(v8::internal::Handle<v8::internal::Map>, v8::internal::AllocationType, v8::internal::Handle<v8::internal::AllocationSite>) [node]
11: 0xe13fa3 v8::internal::Factory::NewJSArrayBuffer(std::shared_ptr<v8::internal::BackingStore>, v8::internal::AllocationType) [node]
12: 0xc96b99 v8::ArrayBuffer::New(v8::Isolate*, std::shared_ptr<v8::BackingStore>) [node]
13: 0xbb79c3 node::crypto::X509ToObject(node::Environment*, x509_st*) [node]
14: 0xbb81e4 node::crypto::GetPeerCert(node::Environment*, std::unique_ptr<ssl_st, node::FunctionDeleter<ssl_st, &SSL_free> > const&, bool, bool) [node]
15: 0xc2b567 node::crypto::TLSWrap::GetPeerCertificate(v8::FunctionCallbackInfo<v8::Value> const&) [node]
16: 0xcd8cbb  [node]
17: 0xcda26c  [node]
18: 0xcda8e6 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
19: 0x14fa219  [node]
./run.sh: line 5:  1730 Aborted                 (core dumped) node index.js
Error: Process completed with exit code 134.
nils91 commented 3 years ago

I might have found a workaround: Removing the line

keyUsage = digitalSignature, keyEncipherment

from the cert.conf file and regenerating the certificate using the new config allows node to get a connection. However, that might not be applicable to all use-cases, as some applications can be picky about the keyUsage configuration. Edit: keyUsage, at least right now, requires keyCertSign to be set. See https://github.com/openssl/openssl/issues/1418 for details. For this issue, adding keyCertSign to the keyUsage works as a workaround.

RaisinTen commented 3 years ago

I forked your repo here and ran some tests on few other versions. In spite of running the test on v15.11.0 again, the function called after X509ToObject in the stack trace seems to differ:

<--- Last few GCs --->

[1560:0x6003ef0]    85165 ms: Mark-sweep (reduce) 2043.8 (2083.1) -> 2043.4 (2083.8) MB, 1600.2 / 0.0 ms  (+ 0.0 ms in 69 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1880 ms) (average mu = 0.485, current mu = 0.233) [1560:0x6003ef0]    87646 ms: Mark-sweep (reduce) 2044.4 (2083.8) -> 2044.3 (2084.6) MB, 2454.3 / 0.0 ms  (average mu = 0.255, current mu = 0.011) allocation failure scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory
 1: 0xa7f490 node::Abort() [node]
 2: 0x9a5c4d node::FatalError(char const*, char const*) [node]
 3: 0xc6c2ae v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xc6c627 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xe360a5  [node]
 6: 0xe65633 v8::internal::EvacuateNewSpaceVisitor::Visit(v8::internal::HeapObject, int) [node]
 7: 0xe721e6 v8::internal::FullEvacuator::RawEvacuatePage(v8::internal::MemoryChunk*, long*) [node]
 8: 0xe5e30f v8::internal::Evacuator::EvacuatePage(v8::internal::MemoryChunk*) [node]
 9: 0xe5e588 v8::internal::PageEvacuationTask::RunInParallel(v8::internal::ItemParallelJob::Task::Runner) [node]
10: 0xe503a9 v8::internal::ItemParallelJob::Run() [node]
11: 0xe74170 void v8::internal::MarkCompactCollectorBase::CreateAndExecuteEvacuationTasks<v8::internal::FullEvacuator, v8::internal::MarkCompactCollector>(v8::internal::MarkCompactCollector*, v8::internal::ItemParallelJob*, v8::internal::MigrationObserver*, long) [node]
12: 0xe749b3 v8::internal::MarkCompactCollector::EvacuatePagesInParallel() [node]
13: 0xe74d75 v8::internal::MarkCompactCollector::Evacuate() [node]
14: 0xe874e1 v8::internal::MarkCompactCollector::CollectGarbage() [node]
15: 0xe433a8 v8::internal::Heap::MarkCompact() [node]
16: 0xe44d38 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
17: 0xe482dc v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
18: 0xe0bf3a v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
19: 0xe05444 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [node]
20: 0xe07540 v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [node]
21: 0xe0ddfd v8::internal::Factory::NewStringFromUtf8(v8::internal::Vector<char const> const&, v8::internal::AllocationType) [node]
22: 0xc83882 v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::NewStringType, int) [node]
23: 0xbb774e node::crypto::X509ToObject(node::Environment*, x509_st*) [node]
24: 0xbb81e4 node::crypto::GetPeerCert(node::Environment*, std::unique_ptr<ssl_st, node::FunctionDeleter<ssl_st, &SSL_free> > const&, bool, bool) [node]
25: 0xc2b567 node::crypto::TLSWrap::GetPeerCertificate(v8::FunctionCallbackInfo<v8::Value> const&) [node]
26: 0xcd8cbb  [node]
27: 0xcda26c  [node]
28: 0xcda8e6 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
29: 0x14fa219  [node]
./run.sh: line 5:  1560 Aborted                 (core dumped) node index.js
Error: Process completed with exit code 134.
RaisinTen commented 3 years ago

https://github.com/RaisinTen/node_https_test/runs/2140664480?check_suite_focus=true says that the crash was introduced in v15.5.0, so I think the relevant change lies somewhere in https://github.com/nodejs/node/pull/36597. Since I don't see any change in src/crypto*, I can't really tell where the relevant change lies. ¯\_(ツ)_/¯

RaisinTen commented 3 years ago

cc @nodejs/crypto

schamberg97 commented 3 years ago

https://github.com/RaisinTen/node_https_test/runs/2140664480?check_suite_focus=true says that the crash was introduced in v15.5.0, so I think the relevant change lies somewhere in #36597. Since I don't see any change in src/crypto*, I can't really tell where the relevant change lies. ¯_(ツ)_/¯

There, however, was an OpenSSL update in 15.5.0. Perhaps this caused the problem?

addaleax commented 3 years ago

https://github.com/nodejs/node/issues/37889 contains some background on the OpenSSL change that might be related here, as well as a standalone repro.

addaleax commented 3 years ago

Looks like this is the relevant infinite loop:

https://github.com/nodejs/node/blob/f00c2435922a33f9d866846d0b5d12691b148348/src/crypto/crypto_common.cc#L469

I'm not really familiar enough with the OpenSSL APIs to understand why, though.

nils91 commented 3 years ago

According to its documentation X509_check_issued checks if 'the keyUsage field (if present) of issuer allows certificate signing'. That method is defined in here and seems to return X509_V_ERR_KEYUSAGE_NO_CERTSIGN if no keyCertSign is present (what #37889 is about). As far as i can tell after looking through the source for openssl and the snippet @addaleax posted above for a bit there is no branch in the code that stops the loop if that error occurs. Actually it seems to be X509_V_ERR_UNSPECIFIED.

fknx commented 3 years ago

I am not sure if this is relevant, but I've verified the behavior change from OpenSSL 1.1.1g to 1.1.1h. With OpenSSL 1.1.1g the connection openssl.exe s_client -connect 127.0.0.1:4001 -CAfile localhost.cer fails with the Verify return code: 21 (unable to verify the first certificate) error, while it succeeds with OpenSSL 1.1.1h (Verify return code: 0 (ok)).

ramosbugs commented 3 years ago

I can confirm this issue shows up on Mac OS 10.15.7 as well, though it just infinite loops for me (haven't let it run long enough to exhaust available memory, but I see it increasing steadily).

To make matters worse, Let's Encrypt's official recommendation (which I think is correct) includes the exact keyUsage extension that triggers this bug: https://letsencrypt.org/docs/certificates-for-localhost/#making-and-trusting-your-own-certificates

thanks for looking into this!

sangaman commented 2 years ago

I might have found a workaround: Removing the line

keyUsage = digitalSignature, keyEncipherment

from the cert.conf file and regenerating the certificate using the new config allows node to get a connection. However, that might not be applicable to all use-cases, as some applications can be picky about the keyUsage configuration. Edit: keyUsage, at least right now, requires keyCertSign to be set. See openssl/openssl#1418 for details. For this issue, adding keyCertSign to the keyUsage works as a workaround.

I was running into this bug and this comment saved me. With a cert that just had digitalSignature and keyEncipherment, I could not use tls and node rapidly gobbled up memory until crashing. If I just add certificate signing to the list of usages, the OOM goes away and tls worked. Thank you!