nodejs / citgm

Canary in the Gold Mine
https://www.npmjs.com/package/citgm
Other
565 stars 148 forks source link

should emit more timestamps #419

Closed refack closed 7 years ago

refack commented 7 years ago

Runs on our CI takes 5 hours. (running on info: cores | running tests using 20 cores). I'm trying to figure out why...

I've manually parsed the output and found these two time stamps:

L17578: verbose: watchify npm-test:  | 716 bytes written to /tmp/watchify-0.5052510924507041/bundle.js (0.01 seconds) at 22:08:45
L23003: verbose: socket.io npm-test: | [04:12:15]          

No other timestamp in the middle.

Found these reports in between (but they don't tell the whole story since it's 20 cores)

info: duration            | test duration: 241753ms
info: watchify done       | done - the test suite for watchify version 3.9.0 passed.
info: starting            | bl                  
...
info: duration            | test duration: 247837ms
info: body-parser done    | done - the test suite for body-parser version 1.17.2 passed.
info: starting            | binary-split        
...
info: duration            | test duration: 70665ms
info: bl done             | done - the test suite for bl version 1.2.1 passed.
info: starting            | spdy                
...
info: duration            | test duration: 134897ms
info: duplexer2 done      | done - the test suite for duplexer2 version 0.1.4 passed.
info: starting            | dicer               
...
info: duration            | test duration: 359697ms
info: coffee-script done  | done - the test suite for coffee-script version 1.12.6 passed.
info: starting            | spdy-transport      
...
info: duration            | test duration: 12042ms
info: dicer done          | done - the test suite for dicer version 0.2.5 passed.
info: starting            | sax                 
...
info: duration            | test duration: 66154ms
info: spdy-transport done | done - the test suite for spdy-transport version 2.0.18 passed.
info: starting            | duplexify           
...
info: duration            | test duration: 432977ms
info: underscore done     | done - the test suite for underscore version 1.8.3 passed.
info: starting            | pumpify             

and then

info: duration            | test duration: 462037ms
info: socket.io done      | done - the test suite for socket.io version 2.0.1 passed.
info: starting            | from2               

/cc @nodejs/build

MylesBorins commented 7 years ago

You can run citgm with a higher verbosity level. That may help

refack commented 7 years ago

silly > verbose (i.e. more verbose)?

MylesBorins commented 7 years ago

yup

refack commented 7 years ago

BTW: CITGM@silly X NPM@info gives a good forensic trace https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/810/nodes=aix61-ppc64/

But silly X silly overflows the memory on some machines: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/809/nodes=aix61-ppc64/

warn: pumpify npm-install:| sill pickManifestFromRegistryMetadata spec >=2.0.0 <3.0.0 tag latest versions [ '0.0.1',
warn:                     | npm sill pickManifestFromRegistryMetadata   '0.0.2',                                    
warn:                     | npm sill pickManifestFromRegistryMetadata   '0.0.3',                                    
warn:                     | npm sill pickManifestFromRegistryMetadata   '0.0.4',                                    
warn:                     | npm sill pickManifestFromRegistryMetadata   '0.1.0',                                    

<--- Last few GCs --->

[18940058:110a331b0]  9086418 ms: Mark-sweep 1410.3 (1946.3) -> 1410.3 (1946.3) MB, 2102.9 / 0.0 ms  allocation failure GC in old space requested
[18940058:110a331b0]  9088979 ms: Mark-sweep 1410.3 (1946.3) -> 1410.3 (1896.8) MB, 2561.1 / 0.0 ms  last resort 
[18940058:110a331b0]  9091037 ms: Mark-sweep 1410.3 (1896.8) -> 1410.3 (1885.3) MB, 2057.6 / 0.0 ms  last resort 

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x7000000206a9891 <JS Object>
    1: DoJoin(aka DoJoin) [native array.js:~96] [pc=700000000de17ec](this=0x700000020582311 <undefined>,p=0x70000007a173cb1 <JS Array[48890]>,q=48890,E=0x7000000205823b1 <true>,A=0x7000000206c8319 <String[1]\: \n>,z=0x700000020582421 <false>)
    2: Join(aka Join) [native array.js:~121] [pc=700000000cde890](this=0x700000020582311 <undefined>,p=0x70000007a173cb1 <JS Array[48890]>,q=48890,A=0x...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
/tmp/hudson4063506058421814237.sh[26]: 18940058 IOT/Abort trap
Build step 'Conditional steps (multiple)' marked build as failure
Recording test results
ERROR: Step ?Publish JUnit test result report? failed: No test report files were found. Configuration error?
Notifying upstream projects of job completion
Finished: FAILURE
gibfahn commented 7 years ago

But silly X silly overflows the memory on some machines:

We should just be able to increase the node process heap size, there's plenty of RAM on the machines.

MylesBorins commented 7 years ago

I'm going to close this for now as the original concern has been handled. Please open another issue to discuss increasing ram size in CI if you want to explore that