nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.63k stars 29.61k forks source link

Why use new URL() may lead to memory leak ? #17448

Closed liuqipeng417 closed 6 years ago

liuqipeng417 commented 6 years ago

My project is using express.js, and having a code fragment like this:

conse {URL} = require('url');
const express = require('express');

const router = express.Router();

router.put('', (req, res) => {
  let data = req.body.flow ? JSON.parse(req.body.flow) : {};
  if (data.url && !data.domain) {
    data.domain = new URL(data.url).hostname;
  }

  res.send();
})

When many request come in, the project memory will rise slowly ,and sometimes it will drop a little. but in general it's still up. When i change to url.parse(), the memory looks normal. I use PM2 to look memory.

What's the difference between the two(new URL and url.parse) in memory or gc?

apapirovski commented 6 years ago

I would figure that there's a higher memory requirement given internals of the WHATWG-URL interface but it sounds like you're saying there's an actual memory leak? Could you provide some more precise numbers/stats? Are we talking a steady increase (say, 10% higher, but not constantly rising) or a never-ending increase over the run of the process?

/cc @jasnell & @TimothyGu

TimothyGu commented 6 years ago

The WHATWG URL implementation uses C++, so there's a possibility that the C++ code has some memleaks in them. In fact Valgrind seems to agree: there's an awful lot of things like the following coming from memcheck:

==28049== 649,419 bytes in 20,949 blocks are definitely lost in loss record 701 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E7776: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049== 984,467 bytes in 31,757 blocks are definitely lost in loss record 702 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049== 3,721,922 bytes in 120,062 blocks are definitely lost in loss record 703 of 703
==28049==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==28049==    by 0x5665A46: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==28049==    by 0x10E4CAE: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E16C5: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==28049==    by 0x10E7776: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)

all pointing towards ParseHost() as the culprit.

Test script:

'use strict';

const { URL, parse } = require('url');
// Adjust to taste.
for (let i = 0; i < 100001; i++) {
  new URL('https://domain1.domain.test.com/789012/3456/789012345678-2?section=10');
  if (i % 1000 === 0) {
    gc();
  }
  if (i % 100000 === 0) {
    console.log(process.memoryUsage());
  }
}
TimothyGu commented 6 years ago

Tracked it down to

==29610== 1,546,342 bytes in 49,882 blocks are definitely lost in loss record 85 of 85
==29610==    at 0x4C2C21F: operator new(unsigned long) (vg_replace_malloc.c:334)
==29610==    by 0x5665FD4: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.22)
==29610==    by 0x10EA1B0: node::url::PercentDecode(char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10EB0AF: node::url::ParseHost(node::url::url_host*, char const*, unsigned long, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10E9839: node::url::ParseHost(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10E7D0D: node::url::URL::Parse(char const*, unsigned long, node::url::url_parse_state, node::url::url_data*, bool, node::url::url_data const*, bool) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10F55D2: node::url::Parse(node::Environment*, v8::Local<v8::Value>, char const*, unsigned long, node::url::url_parse_state, v8::Local<v8::Value>, v8::Local<v8::Value>, v8::Local<v8::Function>, v8::Local<v8::Value>) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x10F29C0: node::url::Parse(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/timothy-gu/dev/node/node/out/Release/node)
==29610==    by 0x9F4A8E: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (in /home/timothy-gu/dev/node/node/out/Release/node)

But I don't understand why this is the case, as everything should be allocated on the stack (and therefore freed automatically)...

addaleax commented 6 years ago

@TimothyGu How did you run Node + valgrind to get that output?

liuqipeng417 commented 6 years ago

@apapirovski The phenomenon I observed is that memory continues to rise, and memory is released(a little), but the overall trend is rising. My online project (one process qps is 50 - 100) runs 7 days, the memory is up to 1GB. When using url.parse , the memory is stable in 100MB.

TimothyGu commented 6 years ago

@addaleax

valgrind --leak-check=full --undef-value-errors=no ./node --expose-gc a.js where a.js contains the script in https://github.com/nodejs/node/issues/17448#issuecomment-349108826. Valgrind is the default installation in Debian stable (1:3.12.0~svn20160714-1). node is built normally with Clang 3.9 (release config) except node_url.o, which is manually built with -O0.

joyeecheung commented 6 years ago

@TimothyGu That might be a false positive from memory pooling because it is not run with a debug build of STL, see http://valgrind.org/docs/manual/faq.html#faq.reports

TimothyGu commented 6 years ago

@joyeecheung That FAQ talks about "still reachable" leaks. The leak here is "definitely lost".

trygve-lie commented 6 years ago

We have been struggelig for a while with a very small memory leak without being able to find it. We have almost simmilar code as posted in the initial message here and I'm able to get that part of our code to leak when stresstesting it.

In my testing I noticed that the leak are only happening if the host have more then 3 parts.

In other words, this does leak:

new URL('https://domain1.domain.test.com/789012/3456/789012345678-2?section=10');

This does not leak:

new URL('https://domain.test.com/789012/3456/789012345678-2?section=10');
addaleax commented 6 years ago

This is happening because union url_host_value has std::string members but an empty destructor.

I’m working on this but it’s probably going to be a bit of a refactor to fix this in a safe way