nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.65k stars 29.62k forks source link

Performance regression in v6 and v8 compared to v4 #16164

Closed mirekdlugosz closed 4 years ago

mirekdlugosz commented 7 years ago

Reproducer script

var tough = require('tough-cookie');

// utility function for generating long strings
function repeat(str, num) {
        var result = "";
        for (var i = 0; i < num; i++) {
                result = result + str;
        }
        return result;
}

// poor man's %d.%09d formatter
function format_hrtime(hrtime) {
        var nsec = hrtime[1].toString();
        if (nsec.length < 9) {
                nsec = repeat('0', 9 - nsec.length) + nsec;                                                                                                                                    
        }                                                                                                                                                                                      
        return hrtime[0].toString() + "." + nsec + "s";                                                                                                                                        

}                                                                                                                                                                                              

// run test for cookie value with specified number of semicolons                                                                                                                               
function runtest(num) {                                                                                                                                                                        
        var Cookie = tough.Cookie;                                                                                                                                                             

        var header = 'foo=bar' + repeat(';', num) + ' domain=example.com';                                                                                                                     
        console.log("Running test with %d semicolons, total header value length: %d", num, header.length);                                                                                     

        var hrstart = process.hrtime();                                                                                                                                                        
        Cookie.parse(header);                                                                                                                                                                  
        var hrend = process.hrtime(hrstart);                                                                                                                                                   

        console.log("Cookie parsed in: %s", format_hrtime(hrend));                                                                                                                             
        console.log("--");                                                                                                                                                                     
}                                                                                                                                                                                              

runtest(1);
runtest(8*1024);
runtest(16*1024);
runtest(64*1024);
runtest(80*1024);

Steps to reproduce

  1. Save reproducer script as test.js
  2. Run npm install tough-cookie
  3. Run node test.js on v4, v6 and v8.

Expected results

Times for all tests are roughly the same OR descending in newer versions

Actual results

# v4
Running test with 1 semicolons, total header value length: 27
Cookie parsed in: 0.000483452s
--
Running test with 8192 semicolons, total header value length: 8218
Cookie parsed in: 0.003077938s
--
Running test with 16384 semicolons, total header value length: 16410
Cookie parsed in: 0.002310960s
--
Running test with 65536 semicolons, total header value length: 65562
Cookie parsed in: 0.008484061s
--
Running test with 81920 semicolons, total header value length: 81946
Cookie parsed in: 0.010608327s
--
# v6
Running test with 1 semicolons, total header value length: 27
Cookie parsed in: 0.000610655s
--
Running test with 8192 semicolons, total header value length: 8218
Cookie parsed in: 0.003484339s
--
Running test with 16384 semicolons, total header value length: 16410
Cookie parsed in: 0.002307835s
--
Running test with 65536 semicolons, total header value length: 65562
Cookie parsed in: 0.008577639s
--
Running test with 81920 semicolons, total header value length: 81946
Cookie parsed in: 3.957750148s
--
# v8
Running test with 1 semicolons, total header value length: 27
Cookie parsed in: 0.000368661s
--
Running test with 8192 semicolons, total header value length: 8218
Cookie parsed in: 0.001572523s
--
Running test with 16384 semicolons, total header value length: 16410
Cookie parsed in: 0.003161710s
--
Running test with 65536 semicolons, total header value length: 65562
Cookie parsed in: 1.896479728s
--
Running test with 81920 semicolons, total header value length: 81946
Cookie parsed in: 2.947534166s

The last case (81920 semicolons) is completely blown up in v6 - it runs 400 times slower. The last case runs "only" 200 times slower in v8 (compared to v4), but v8 runs second-to-last case (65536 semicolons) hundred times slower than either v6 or v4.

Please note that this is not related to tough-cookie module - it's code stays the same across all versions of node used during test. Performance regression is somewhere in node code that tough-cookie uses.

That script was used as test for security issue related to regular expression parsing in tough-cookie, so I guess RE engine is good place to start investigation.

joyeecheung commented 7 years ago

Looks like a V8 perf regression. cc @nodejs/v8

See profile generated by 8.7.0 ``` Statistical profiling result from isolate-0x102801c00-v8.log, (6281 ticks, 18 unaccounted, 0 excluded). [Shared libraries]: ticks total nonlib name 2 0.0% /usr/lib/system/libsystem_pthread.dylib 1 0.0% /usr/lib/system/libsystem_platform.dylib [JavaScript]: ticks total nonlib name 8 0.1% 0.1% Function: ~parse /Users/joyee/projects/node-test/16164/node_modules/tough-cookie/lib/cookie.js:325:15 2 0.0% 0.0% Stub: StringAddStub 2 0.0% 0.0% LazyCompile: *parse /Users/joyee/projects/node-test/16164/node_modules/tough-cookie/lib/cookie.js:325:15 2 0.0% 0.0% Builtin: CallFunction_ReceiverIsAny 1 0.0% 0.0% Stub: CEntryStub {1} 1 0.0% 0.0% Stub: CEntryStub 1 0.0% 0.0% Function: ~getPathFromURL internal/url.js:1371:24 1 0.0% 0.0% Builtin: LoadICProtoArray [C++]: ticks total nonlib name 4784 76.2% 76.2% T v8::internal::Heap::MoveElements(v8::internal::FixedArray*, int, int, int) 1352 21.5% 21.5% T __simple_dprintf 35 0.6% 0.6% t node::(anonymous namespace)::ContextifyScript::New(v8::FunctionCallbackInfo const&) 7 0.1% 0.1% t v8::internal::Builtin_Impl_ArrayShift(v8::internal::BuiltinArguments, v8::internal::Isolate*) 5 0.1% 0.1% t v8::internal::(anonymous namespace)::ElementsAccessorBase >::SetLengthImpl(v8::internal::Isolate*, v8::internal::Handle, unsigned int, v8::internal::Handle) 4 0.1% 0.1% T v8::internal::Heap::CanMoveObjectStart(v8::internal::HeapObject*) 3 0.0% 0.0% t v8::internal::Builtin_Impl_StringPrototypeTrim(v8::internal::BuiltinArguments, v8::internal::Isolate*) 3 0.0% 0.0% t v8::internal::(anonymous namespace)::FastElementsAccessor >::MoveElements(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Handle, int, int, int, int, int) 3 0.0% 0.0% T v8::internal::Heap::AllocateFixedArrayWithFiller(int, v8::internal::PretenureFlag, v8::internal::Object*) 3 0.0% 0.0% T _mprotect 2 0.0% 0.0% t void v8::internal::String::WriteToFlat(v8::internal::String*, unsigned char*, int, int) 2 0.0% 0.0% t node::cares_wrap::(anonymous namespace)::ChannelWrap::New(v8::FunctionCallbackInfo const&) 2 0.0% 0.0% t node::Binding(v8::FunctionCallbackInfo const&) 2 0.0% 0.0% T node::TTYWrap::New(v8::FunctionCallbackInfo const&) 1 0.0% 0.0% t void v8::internal::Heap::UpdateAllocationSite<(v8::internal::Heap::UpdateAllocationSiteMode)1>(v8::internal::Map*, v8::internal::HeapObject*, v8::base::PointerTemplateHashMapImpl*) 1 0.0% 0.0% t v8::internal::Scavenger::ScavengeObject(v8::internal::HeapObject**, v8::internal::HeapObject*) 1 0.0% 0.0% t v8::internal::ParserBase::ParsePrimaryExpression(bool*, bool*) 1 0.0% 0.0% t v8::internal::ParserBase::ParseExpressionCoverGrammar(bool, bool*) 1 0.0% 0.0% t v8::internal::ParserBase::Expect(v8::internal::Token::Value, bool*) 1 0.0% 0.0% t v8::internal::LocalAllocator::AllocateInLAB(int, v8::internal::AllocationAlignment) 1 0.0% 0.0% t v8::internal::DescriptorArray::Append(v8::internal::Descriptor*) 1 0.0% 0.0% t v8::internal::(anonymous namespace)::FastElementsAccessor >::RemoveElement(v8::internal::Handle, v8::internal::(anonymous namespace)::Where) 1 0.0% 0.0% t node::Read(v8::FunctionCallbackInfo const&) 1 0.0% 0.0% t _szone_malloc_should_clear 1 0.0% 0.0% t __enlarge 1 0.0% 0.0% t ___vfprintf 1 0.0% 0.0% T v8::internal::interpreter::BytecodeArrayWriter::EmitBytecode(v8::internal::interpreter::BytecodeNode const*) 1 0.0% 0.0% T v8::internal::compiler::JSGraph::Int32Constant(int) 1 0.0% 0.0% T v8::internal::Variable::IsGlobalObjectProperty() const 1 0.0% 0.0% T v8::internal::String::Trim(v8::internal::Handle, v8::internal::String::TrimMode) 1 0.0% 0.0% T v8::internal::Scavenger::Process() 1 0.0% 0.0% T v8::internal::Scanner::Scan() 1 0.0% 0.0% T v8::internal::Runtime_StringSplit(int, v8::internal::Object**, v8::internal::Isolate*) 1 0.0% 0.0% T v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle, v8::internal::Handle, v8::internal::Handle, v8::internal::LanguageMode) 1 0.0% 0.0% T v8::internal::LoadIC::Load(v8::internal::Handle, v8::internal::Handle) 1 0.0% 0.0% T v8::internal::InnerPointerToCodeCache::GetCacheEntry(unsigned char*) 1 0.0% 0.0% T v8::internal::IC::ConfigureVectorState(v8::internal::Handle, v8::internal::Handle, v8::internal::Handle) 1 0.0% 0.0% T v8::internal::Heap::LeftTrimFixedArray(v8::internal::FixedArrayBase*, int) 1 0.0% 0.0% T v8::internal::Heap::Contains(v8::internal::HeapObject*) 1 0.0% 0.0% T v8::internal::HashTable::Rehash() 1 0.0% 0.0% T v8::internal::FindOneByteStringIndices(v8::internal::Vector, unsigned char, v8::internal::List*, unsigned int) 1 0.0% 0.0% T v8::internal::Factory::NewProperSubString(v8::internal::Handle, int, int) 1 0.0% 0.0% T v8::internal::Dictionary::SetEntry(int, v8::internal::Object*, v8::internal::Object*, v8::internal::PropertyDetails) 1 0.0% 0.0% T v8::internal::Deserializer::ReadData(v8::internal::Object**, v8::internal::Object**, int, unsigned char*) 1 0.0% 0.0% T v8::internal::Builtin_StringPrototypeTrim(int, v8::internal::Object**, v8::internal::Isolate*) 1 0.0% 0.0% T v8::internal::Assembler::jmp(v8::internal::Label*, v8::internal::Label::Distance) 1 0.0% 0.0% T _setvbuf 1 0.0% 0.0% T _mach_msg 1 0.0% 0.0% T ___carbon_delete [Summary]: ticks total nonlib name 18 0.3% 0.3% JavaScript 6242 99.4% 99.4% C++ 6 0.1% 0.1% GC 3 0.0% Shared libraries 18 0.3% Unaccounted [C++ entry points]: ticks cpp total name 6156 98.8% 98.0% T v8::internal::Builtin_ArrayShift(int, v8::internal::Object**, v8::internal::Isolate*) 42 0.7% 0.7% T v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) 10 0.2% 0.2% T v8::internal::Runtime_CompileLazy(int, v8::internal::Object**, v8::internal::Isolate*) 7 0.1% 0.1% T v8::internal::Builtin_StringPrototypeTrim(int, v8::internal::Object**, v8::internal::Isolate*) 4 0.1% 0.1% T v8::internal::Runtime_StringSplit(int, v8::internal::Object**, v8::internal::Isolate*) 3 0.0% 0.0% T v8::internal::Runtime_LoadIC_Miss(int, v8::internal::Object**, v8::internal::Isolate*) 2 0.0% 0.0% T v8::internal::Runtime_NewObject(int, v8::internal::Object**, v8::internal::Isolate*) 2 0.0% 0.0% T v8::internal::Runtime_CreateObjectLiteral(int, v8::internal::Object**, v8::internal::Isolate*) 2 0.0% 0.0% T v8::internal::Runtime_CompileForOnStackReplacement(int, v8::internal::Object**, v8::internal::Isolate*) 1 0.0% 0.0% T v8::internal::Runtime_NumberToString(int, v8::internal::Object**, v8::internal::Isolate*) [Bottom up (heavy) profile]: Note: percentage shows a share of a particular caller in the total amount of its parent calls. Callers occupying less than 1.0% are not shown. ticks parent name 4784 76.2% T v8::internal::Heap::MoveElements(v8::internal::FixedArray*, int, int, int) 4784 100.0% T v8::internal::Builtin_ArrayShift(int, v8::internal::Object**, v8::internal::Isolate*) 3352 70.1% LazyCompile: *parse /Users/joyee/projects/node-test/16164/node_modules/tough-cookie/lib/cookie.js:325:15 3352 100.0% Function: ~runtest /Users/joyee/projects/node-test/16164/test.js:26:17 3352 100.0% Function: ~ /Users/joyee/projects/node-test/16164/test.js:1:11 3352 100.0% Function: ~Module._compile module.js:581:37 1432 29.9% Function: ~parse /Users/joyee/projects/node-test/16164/node_modules/tough-cookie/lib/cookie.js:325:15 1432 100.0% Function: ~runtest /Users/joyee/projects/node-test/16164/test.js:26:17 1432 100.0% Function: ~ /Users/joyee/projects/node-test/16164/test.js:1:11 1432 100.0% Function: ~Module._compile module.js:581:37 1352 21.5% T __simple_dprintf 1350 99.9% T v8::internal::Builtin_ArrayShift(int, v8::internal::Object**, v8::internal::Isolate*) 949 70.3% LazyCompile: *parse /Users/joyee/projects/node-test/16164/node_modules/tough-cookie/lib/cookie.js:325:15 949 100.0% Function: ~runtest /Users/joyee/projects/node-test/16164/test.js:26:17 949 100.0% Function: ~ /Users/joyee/projects/node-test/16164/test.js:1:11 949 100.0% Function: ~Module._compile module.js:581:37 401 29.7% Function: ~parse /Users/joyee/projects/node-test/16164/node_modules/tough-cookie/lib/cookie.js:325:15 401 100.0% Function: ~runtest /Users/joyee/projects/node-test/16164/test.js:26:17 401 100.0% Function: ~ /Users/joyee/projects/node-test/16164/test.js:1:11 401 100.0% Function: ~Module._compile module.js:581:37 ```
bmeurer commented 7 years ago

This might be related to right trimming? cc @mlippautz

mlippautz commented 7 years ago

Yes, it's right trimming related.

(Right trimming infrastructure did change though only recently for concurrent marking and even then it just manual copy vs memcopy.)

apapirovski commented 6 years ago

@nodejs/v8 Anything that can be done here or should we just close it out? Is there a tracking issue on the v8 side perhaps?

targos commented 6 years ago

This is still an issue in Node 10. I think we should make sure an issue is opened on the V8 side, with a standalone test case. Somebody wants to do that?

Trott commented 6 years ago

Still an issue in Node.js 11.1.0.

hashseed commented 6 years ago

@mlippautz do we have any plans to address right trimming in the near future?

mlippautz commented 6 years ago

1) It's unclear whether the fast version was exercising a right trimming path? Can we figure that out? 2) If it is right trimming related then the regression has to be related to GC running during benchmarking because that's where the codepath changed. There's nothing we can do here as we have to perform atomic reads and writes.

BridgeAR commented 5 years ago

@mlippautz @bmeurer @hashseed shall I open an issue on your bug tracker? I guess it's easier for you to further look into the differences here?

jasnell commented 4 years ago

Given that none of the versions in question are supported any longer, closing.