dgottlie opened this issue 12 years ago
Can you provide the output this program produces for you? Could you also set debug: true in your connection settings?
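Something along these lines, assuming a standard createConnection() setup (the host, user, password, and database values here are placeholders):

var mysql = require('mysql');

// The important part is `debug: true`, which makes the driver log every
// incoming/outgoing protocol packet to the console.
var conn = mysql.createConnection({
  host:     'localhost',
  user:     'xx',
  password: 'secret',
  database: 'yy',
  debug:    true
});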
node-mysql should never block the event loop except for the time it takes to parse a chunk of incoming mysql server data (64kb, should be < 1 ms).
--fg
Thanks - here's the program output interspersed with the debug information:
<-- HandshakeInitializationPacket
{ protocolVersion: 10,
serverVersion: '5.0.80-enterprise-nt-log',
threadId: 47828,
scrambleBuff1: <Buffer 7b 7e 2a 51 39 67 7c 5b>,
filler1: <Buffer 00>,
serverCapabilities1: 41516,
serverLanguage: 8,
serverStatus: 2,
serverCapabilities2: 0,
scrambleLength: 0,
filler2: <Buffer 00 00 00 00 00 00 00 00 00 00>,
scrambleBuff2: <Buffer 33 2c 22 63 4e 3d 29 50 39 6d 3a 79>,
filler3: <Buffer 00>,
pluginData: undefined }
--> ClientAuthenticationPacket
{ clientFlags: 193487,
maxPacketSize: 0,
charsetNumber: 33,
filler: undefined,
user: 'xx',
scrambleBuff: <Buffer 3a 09 49 19 15 53 85 43 7d 3b fa 76 38 86 21 ac 54 da de 4f>,
database: 'yy' }
<-- UseOldPasswordPacket
{ firstByte: 254 }
--> OldPasswordPacket
{ scrambleBuff: <Buffer 4d 55 59 59 59 52 47 47> }
<-- OkPacket
{ fieldCount: 0,
affectedRows: 0,
insertId: 0,
serverStatus: 2,
warningCount: 0,
message: '' }
--> ComQueryPacket
{ command: 3,
sql: 'SELECT * FROM users' }
<-- ResultSetHeaderPacket
{ fieldCount: 252, extra: undefined }
<-- FieldPacket
{ catalog: 'def',
db: 'yy',
table: 'users',
orgTable: 'users',
name: 'uid',
orgName: 'uid',
filler1: <Buffer 0c>,
charsetNr: 63,
length: undefined,
type: 3,
flags: 16899,
decimals: 0,
filler2: <Buffer 00 00>,
default: undefined,
fieldLength: 11 }
... 24 Field Packets...
EVENT LOOP BLOCKED FOR 26 ms
... 228 FieldPackets ...
... 35 RowDataPackets...
EVENT LOOP BLOCKED FOR 130 ms
... 22,441 RowDataPackets...
<-- EofPacket
{ fieldCount: 254, warningCount: 0, serverStatus: 34 }
EVENT LOOP BLOCKED FOR 24520 ms
done
Ok, so you're fetching 22,441 rows with 252 columns each (dude, wtf?), correct? That's 5.6 million values. Parsing those in 24 seconds isn't bad : ). And yes, the driver will block while parsing this data - that's the expected behavior.
Use the streaming API if you don't want this to happen: https://github.com/felixge/node-mysql#streaming-query-rows
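Roughly, per that README section (processRow here is a placeholder for whatever per-row work you do):

var query = connection.query('SELECT * FROM users');

query
  .on('fields', function (fields) {
    // column metadata, delivered before any rows
  })
  .on('result', function (row) {
    // one row at a time; pause the connection while doing slow per-row work
    connection.pause();
    processRow(row, function () {
      connection.resume();
    });
  })
  .on('end', function () {
    // all rows have been received
  });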
Actually, it seems like queued network data blocks timers from firing in node. That's a little annoying, but there's not much I can do about it. File a bug with node.js if you want.
Still: Use the streaming API for fetching such massive result sets.
There was a discussion about this issue in general somewhat recently on the nodejs mailing list. It mainly has to do with the GC and very large objects like this.
You can try disabling idle GC with node --nouse-idle-notification foo.js, but there's no guarantee it will alleviate the pausing much.
@mscdex I don't think this is GC-related. It simply seems like libuv will always fire network events before any timers. IMO it should be the other way around. Could be platform-specific.
@dgottlie can you run node --trace_gc your-script.js and paste the output again? Also, what OS are you on?
Hi,
Thanks for looking into this! I know five million fields is a little crazy, but it was easier to debug the problem when it was blocking for 20 seconds rather than 1 or 2 :).
I’m experiencing the same issue with the stream API (even if I only handle the end event), so unfortunately that doesn’t seem to be a workaround. I am running on a Windows 7 x64 server (government agency, so I have limited choices), so it could be Windows-specific. Here’s the GC output:
EVENT LOOP BLOCKED FOR 12 ms
EVENT LOOP BLOCKED FOR 7 ms
EVENT LOOP BLOCKED FOR 14 ms
EVENT LOOP BLOCKED FOR 24 ms
EVENT LOOP BLOCKED FOR 31 ms
EVENT LOOP BLOCKED FOR 11 ms
EVENT LOOP BLOCKED FOR 26 ms
34 ms: Scavenge 2.1 (35.0) -> 1.8 (36.0) MB, 1 ms [Runtime::PerformGC].
86 ms: Scavenge 3.0 (37.0) -> 2.5 (37.0) MB, 1 / 1 ms [allocation failure].
118 ms: Scavenge 3.4 (37.0) -> 2.9 (37.0) MB, 1 ms [allocation failure].
174 ms: Scavenge 5.2 (39.0) -> 3.9 (40.0) MB, 0 ms [allocation failure].
197 ms: Scavenge 5.6 (40.0) -> 4.9 (40.0) MB, 1 ms [Runtime::PerformGC].
212 ms: Mark-sweep 6.4 (40.0) -> 4.9 (41.0) MB, 7 ms [allocation failure] [promotion limit reached].
233 ms: Scavenge 8.9 (41.0) -> 7.1 (42.0) MB, 2 ms [allocation failure].
249 ms: Scavenge 10.1 (42.0) -> 8.8 (44.0) MB, 2 ms [Runtime::PerformGC].
264 ms: Scavenge 11.8 (44.0) -> 10.5 (45.0) MB, 1 ms [allocation failure].
307 ms: Scavenge 16.7 (45.0) -> 14.0 (49.0) MB, 5 ms (+ 4 ms in 63 steps since last GC) [Runtime::PerformGC].
344 ms: Scavenge 19.9 (49.0) -> 17.3 (52.0) MB, 4 ms (+ 1 ms in 61 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
374 ms: Mark-sweep 21.7 (52.0) -> 19.6 (56.0) MB, 7 ms (+ 7 ms in 168 steps since start of marking, biggest step 1.000000 ms) [StackGuard GC request] [GC in old space requested].
454 ms: Scavenge 35.4 (56.0) -> 28.5 (62.0) MB, 9 ms [allocation failure].
515 ms: Scavenge 40.4 (62.0) -> 35.1 (68.0) MB, 9 ms [allocation failure].
590 ms: Scavenge 47.0 (68.0) -> 41.8 (75.0) MB, 10 ms (+ 8 ms in 118 steps since last GC) [Runtime::PerformGC].
662 ms: Scavenge 53.7 (75.0) -> 48.5 (82.0) MB, 9 ms (+ 12 ms in 122 steps since last GC) [allocation failure].
734 ms: Scavenge 60.4 (82.0) -> 55.2 (89.0) MB, 9 ms (+ 12 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
807 ms: Scavenge 67.1 (89.0) -> 61.9 (96.0) MB, 9 ms (+ 9 ms in 125 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
883 ms: Mark-sweep 73.7 (96.0) -> 68.2 (108.0) MB, 11 ms (+ 56 ms in 606 steps since start of marking, biggest step 1.000000 ms) [allocation failure] [promotion limit reached].
958 ms: Scavenge 84.0 (108.0) -> 77.1 (111.0) MB, 1 / 9 ms [allocation failure].
1019 ms: Scavenge 89.0 (111.0) -> 83.8 (118.0) MB, 10 ms [allocation failure].
1079 ms: Scavenge 95.6 (118.0) -> 90.4 (125.0) MB, 8 ms [allocation failure].
1139 ms: Scavenge 102.3 (125.0) -> 97.1 (132.0) MB, 9 ms [allocation failure].
1212 ms: Scavenge 109.0 (132.0) -> 103.8 (139.0) MB, 9 ms (+ 15 ms in 123 steps since last GC) [allocation failure].
1283 ms: Scavenge 115.7 (139.0) -> 110.5 (145.0) MB, 9 ms (+ 14 ms in 125 steps since last GC) [allocation failure].
1354 ms: Scavenge 122.4 (145.0) -> 117.2 (152.0) MB, 9 ms (+ 12 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
1425 ms: Scavenge 129.0 (152.0) -> 123.8 (159.0) MB, 10 ms (+ 11 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
1498 ms: Scavenge 135.7 (159.0) -> 130.5 (166.0) MB, 1 / 10 ms (+ 10 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
1569 ms: Scavenge 142.4 (166.0) -> 137.2 (173.0) MB, 9 ms (+ 5 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
1642 ms: Scavenge 149.0 (173.0) -> 143.8 (180.0) MB, 1 / 10 ms (+ 6 ms in 126 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
1713 ms: Scavenge 155.7 (180.0) -> 150.5 (187.0) MB, 9 ms (+ 12 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
Speed up marking after 1024 steps
Marking speed increased to 3
1793 ms: Scavenge 162.3 (187.0) -> 157.2 (193.0) MB, 9 ms (+ 16 ms in 119 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
1842 ms: Mark-sweep 163.4 (193.0) -> 159.6 (202.0) MB, 8 ms (+ 116 ms in 1160 steps since start of marking, biggest step 1.000000 ms) [StackGuard GC request] [GC in old space requested].
EVENT LOOP BLOCKED FOR 3105 ms
EVENT LOOP BLOCKED FOR 21 ms
1921 ms: Scavenge 175.4 (202.0) -> 168.4 (206.0) MB, 11 ms [allocation failure].
1980 ms: Scavenge 180.3 (206.0) -> 175.1 (213.0) MB, 9 ms [allocation failure].
2038 ms: Scavenge 186.9 (213.0) -> 181.7 (219.0) MB, 9 ms [allocation failure].
2097 ms: Scavenge 193.6 (219.0) -> 188.3 (226.0) MB, 9 ms [allocation failure].
2155 ms: Scavenge 200.2 (226.0) -> 195.0 (233.0) MB, 9 ms [allocation failure].
2214 ms: Scavenge 206.8 (233.0) -> 201.6 (239.0) MB, 9 ms [allocation failure].
2273 ms: Scavenge 213.4 (239.0) -> 208.2 (246.0) MB, 9 ms [Runtime::PerformGC].
2332 ms: Scavenge 220.1 (246.0) -> 214.8 (253.0) MB, 9 ms [allocation failure].
2410 ms: Scavenge 226.7 (253.0) -> 221.5 (260.0) MB, 9 ms (+ 12 ms in 118 steps since last GC) [allocation failure].
2479 ms: Scavenge 233.3 (260.0) -> 228.2 (267.0) MB, 9 ms (+ 13 ms in 115 steps since last GC) [allocation failure].
2551 ms: Scavenge 240.1 (267.0) -> 234.9 (274.0) MB, 9 ms (+ 10 ms in 119 steps since last GC) [Runtime::PerformGC].
2622 ms: Scavenge 246.7 (274.0) -> 241.5 (280.0) MB, 9 ms (+ 11 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
2693 ms: Scavenge 253.4 (280.0) -> 248.1 (288.0) MB, 9 ms (+ 7 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
2765 ms: Scavenge 260.0 (288.0) -> 254.8 (294.0) MB, 10 ms (+ 5 ms in 122 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
2836 ms: Scavenge 266.6 (294.0) -> 261.4 (301.0) MB, 9 ms (+ 7 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
2909 ms: Scavenge 273.3 (301.0) -> 268.0 (308.0) MB, 9 ms (+ 10 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
Speed up marking after 1024 steps
Marking speed increased to 3
2988 ms: Scavenge 279.9 (308.0) -> 274.7 (315.0) MB, 10 ms (+ 14 ms in 119 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
3077 ms: Scavenge 286.5 (316.0) -> 281.3 (323.0) MB, 9 ms (+ 26 ms in 119 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
3166 ms: Scavenge 293.2 (323.0) -> 287.9 (330.0) MB, 9 ms (+ 30 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
3255 ms: Scavenge 299.8 (330.0) -> 294.6 (336.0) MB, 9 ms (+ 23 ms in 122 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
3344 ms: Scavenge 306.4 (336.0) -> 301.2 (344.0) MB, 9 ms (+ 31 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
3442 ms: Mark-sweep 313.1 (344.0) -> 306.3 (355.0) MB, 1 / 12 ms (+ 227 ms in 1677 steps since start of marking, biggest step 1.000000 ms) [Runtime::PerformGC] [promotion limit reached].
3523 ms: Scavenge 322.2 (355.0) -> 315.3 (359.0) MB, 14 ms [allocation failure].
3582 ms: Scavenge 327.1 (359.0) -> 321.9 (365.0) MB, 9 ms [allocation failure].
3641 ms: Scavenge 333.8 (365.0) -> 328.5 (372.0) MB, 1 / 9 ms [allocation failure].
3700 ms: Scavenge 340.4 (372.0) -> 335.2 (379.0) MB, 9 ms [allocation failure].
3756 ms: Scavenge 347.0 (379.0) -> 342.0 (386.0) MB, 9 ms [allocation failure].
3816 ms: Scavenge 353.9 (386.0) -> 348.7 (393.0) MB, 1 / 9 ms [allocation failure].
3874 ms: Scavenge 360.5 (393.0) -> 355.3 (399.0) MB, 9 ms [allocation failure].
3937 ms: Scavenge 367.1 (399.0) -> 361.9 (405.0) MB, 9 ms [allocation failure].
3996 ms: Scavenge 373.8 (405.0) -> 368.6 (413.0) MB, 9 ms [allocation failure].
4055 ms: Scavenge 380.4 (413.0) -> 375.2 (419.0) MB, 9 ms [allocation failure].
4113 ms: Scavenge 387.1 (419.0) -> 381.8 (426.0) MB, 9 ms [allocation failure].
4175 ms: Scavenge 393.7 (426.0) -> 388.5 (433.0) MB, 9 ms [allocation failure].
4235 ms: Scavenge 400.3 (433.0) -> 395.1 (439.0) MB, 9 ms [allocation failure].
4295 ms: Scavenge 406.9 (439.0) -> 401.7 (446.0) MB, 10 ms [allocation failure].
4354 ms: Scavenge 413.6 (446.0) -> 408.3 (453.0) MB, 1 / 10 ms [Runtime::PerformGC].
4412 ms: Scavenge 420.2 (453.0) -> 415.0 (459.0) MB, 9 ms [allocation failure].
4504 ms: Scavenge 426.8 (459.0) -> 421.6 (466.0) MB, 10 ms (+ 10 ms in 120 steps since last GC) [allocation failure].
4576 ms: Scavenge 433.5 (466.0) -> 428.2 (473.0) MB, 1 / 10 ms (+ 5 ms in 118 steps since last GC) [allocation failure].
4648 ms: Scavenge 440.1 (473.0) -> 434.9 (480.0) MB, 10 ms (+ 9 ms in 121 steps since last GC) [Runtime::PerformGC] [incremental marking delaying mark-sweep].
4721 ms: Scavenge 446.7 (480.0) -> 441.5 (487.0) MB, 10 ms (+ 7 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
4793 ms: Scavenge 453.4 (487.0) -> 448.1 (493.0) MB, 1 / 10 ms (+ 7 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
4865 ms: Scavenge 460.0 (493.0) -> 454.8 (501.0) MB, 9 ms (+ 12 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
4938 ms: Scavenge 466.6 (501.0) -> 461.4 (507.0) MB, 1 / 10 ms (+ 10 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5010 ms: Scavenge 473.3 (507.0) -> 468.0 (514.0) MB, 10 ms (+ 11 ms in 121 steps since last GC) [Runtime::PerformGC] [incremental marking delaying mark-sweep].
Speed up marking after 1024 steps
Marking speed increased to 3
5090 ms: Scavenge 479.9 (514.0) -> 474.7 (521.0) MB, 9 ms (+ 15 ms in 122 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5178 ms: Scavenge 486.5 (521.0) -> 481.3 (528.0) MB, 9 ms (+ 23 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5267 ms: Scavenge 493.2 (528.0) -> 487.9 (535.0) MB, 1 / 10 ms (+ 33 ms in 119 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5355 ms: Scavenge 499.8 (535.0) -> 494.6 (542.0) MB, 10 ms (+ 23 ms in 119 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5443 ms: Scavenge 506.4 (542.0) -> 501.2 (548.0) MB, 9 ms (+ 24 ms in 122 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5524 ms: Scavenge 513.1 (548.0) -> 507.8 (556.0) MB, 9 ms (+ 27 ms in 104 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5615 ms: Scavenge 519.6 (556.0) -> 514.4 (562.0) MB, 11 ms (+ 39 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5704 ms: Scavenge 526.3 (562.0) -> 521.0 (569.0) MB, 9 ms (+ 25 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
5792 ms: Scavenge 532.9 (569.0) -> 527.7 (576.0) MB, 9 ms (+ 35 ms in 122 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
Speed up marking after 1024 steps
Marking speed increased to 6
5902 ms: Mark-sweep 539.5 (576.0) -> 531.8 (587.0) MB, 1 / 12 ms (+ 359 ms in 2149 steps since start of marking, biggest step 1.000000 ms) [allocation failure] [promotion limit reached].
5989 ms: Scavenge 547.6 (587.0) -> 540.7 (591.0) MB, 20 ms [allocation failure].
6048 ms: Scavenge 552.6 (591.0) -> 547.4 (597.0) MB, 9 ms [allocation failure].
6108 ms: Scavenge 559.2 (597.0) -> 554.0 (605.0) MB, 9 ms [Runtime::PerformGC].
6167 ms: Scavenge 565.9 (605.0) -> 560.6 (611.0) MB, 10 ms [Runtime::PerformGC].
6226 ms: Scavenge 572.5 (611.0) -> 567.3 (617.0) MB, 9 ms [allocation failure].
6285 ms: Scavenge 579.1 (617.0) -> 573.9 (625.0) MB, 9 ms [allocation failure].
6345 ms: Scavenge 585.7 (625.0) -> 580.5 (631.0) MB, 10 ms [allocation failure].
6404 ms: Scavenge 592.4 (631.0) -> 587.1 (637.0) MB, 9 ms [allocation failure].
6463 ms: Scavenge 599.0 (637.0) -> 593.7 (645.0) MB, 9 ms [allocation failure].
6523 ms: Scavenge 605.6 (645.0) -> 600.4 (651.0) MB, 1 / 10 ms [allocation failure].
119561
EVENT LOOP BLOCKED FOR 5001 ms
done
6581 ms: Scavenge 612.2 (651.0) -> 607.0 (658.0) MB, 9 ms [allocation failure].
6640 ms: Scavenge 618.9 (658.0) -> 613.6 (665.0) MB, 9 ms [allocation failure].
6700 ms: Scavenge 625.5 (665.0) -> 620.2 (671.0) MB, 9 ms [allocation failure].
6760 ms: Scavenge 632.1 (671.0) -> 626.9 (678.0) MB, 9 ms [allocation failure].
6819 ms: Scavenge 638.7 (678.0) -> 633.5 (684.0) MB, 10 ms [Runtime::PerformGC].
6878 ms: Scavenge 645.3 (684.0) -> 640.1 (691.0) MB, 9 ms [allocation failure].
6937 ms: Scavenge 652.0 (691.0) -> 646.7 (698.0) MB, 9 ms [allocation failure].
6996 ms: Scavenge 658.6 (698.0) -> 653.3 (704.0) MB, 10 ms [Runtime::PerformGC].
7055 ms: Scavenge 665.2 (704.0) -> 660.0 (711.0) MB, 10 ms [allocation failure].
7114 ms: Scavenge 671.8 (711.0) -> 666.6 (718.0) MB, 10 ms [allocation failure].
7173 ms: Scavenge 678.4 (718.0) -> 673.2 (724.0) MB, 1 / 10 ms [allocation failure].
7233 ms: Scavenge 685.1 (724.0) -> 679.8 (732.0) MB, 9 ms [allocation failure].
7293 ms: Scavenge 691.7 (732.0) -> 686.4 (738.0) MB, 10 ms [allocation failure].
7351 ms: Scavenge 698.3 (738.0) -> 693.1 (744.0) MB, 9 ms [allocation failure].
7410 ms: Scavenge 704.9 (744.0) -> 699.7 (751.0) MB, 10 ms [Runtime::PerformGC].
7469 ms: Scavenge 711.6 (751.0) -> 706.3 (758.0) MB, 10 ms [allocation failure].
7527 ms: Scavenge 718.2 (758.0) -> 712.9 (765.0) MB, 9 ms [allocation failure].
7658 ms: Scavenge 724.8 (765.0) -> 719.6 (771.0) MB, 1 / 12 ms (+ 8 ms in 120 steps since last GC) [allocation failure].
7738 ms: Scavenge 731.4 (771.0) -> 726.2 (778.0) MB, 11 ms (+ 11 ms in 119 steps since last GC) [allocation failure].
7812 ms: Scavenge 738.1 (778.0) -> 732.8 (785.0) MB, 11 ms (+ 9 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
7884 ms: Scavenge 744.7 (785.0) -> 739.4 (792.0) MB, 1 / 10 ms (+ 6 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
7955 ms: Scavenge 751.3 (792.0) -> 746.1 (799.0) MB, 10 ms (+ 11 ms in 120 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
8032 ms: Scavenge 759.2 (800.3) -> 754.0 (807.3) MB, 10 ms (+ 12 ms in 122 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
8104 ms: Scavenge 765.9 (807.3) -> 760.6 (813.3) MB, 10 ms (+ 11 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
8177 ms: Scavenge 772.5 (813.3) -> 767.2 (820.3) MB, 1 / 10 ms (+ 10 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
Speed up marking after 1024 steps
Marking speed increased to 3
8258 ms: Scavenge 779.1 (821.3) -> 773.9 (828.3) MB, 10 ms (+ 18 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
8347 ms: Scavenge 785.7 (828.3) -> 780.5 (835.3) MB, 10 ms (+ 29 ms in 121 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
Interesting. Can you show your code, how you use the stream API? Are you passing a callback to the query() method? If yes, then streaming won't be enabled.
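In other words, something like the difference between these two (using the same query from your earlier debug output):

// Buffered: passing a callback makes the driver collect the entire result
// set in memory before your code sees any of it.
conn.query('SELECT * FROM users', function (err, rows) {
  // rows holds every row at once
});

// Streaming: omit the callback and listen for per-row events instead.
conn.query('SELECT * FROM users')
  .on('result', function (row) {
    // one row at a time
  })
  .on('end', function () {
    // done
  });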
Here's the stream API code I tried:
last = new Date()

setInterval( ->
  delta = new Date() - last
  if (delta > 2)
    console.error('EVENT LOOP BLOCKED FOR', delta, "ms")
  last = new Date()
, 1)

conn.connect()
query = conn.query "SELECT * FROM users"
query.on "end", ->
  delta = new Date() - last
  conn.end()
  throw "done"
and here's the program output:
EVENT LOOP BLOCKED FOR 11 ms
EVENT LOOP BLOCKED FOR 3 ms
EVENT LOOP BLOCKED FOR 16 ms
EVENT LOOP BLOCKED FOR 15 ms
EVENT LOOP BLOCKED FOR 35 ms
EVENT LOOP BLOCKED FOR 3728 ms
EVENT LOOP BLOCKED FOR 12 ms
EVENT LOOP BLOCKED FOR 25 ms
EVENT LOOP BLOCKED FOR 2889 ms
done
I'm having the same issue, detailed here - http://stackoverflow.com/questions/30886173/nodejs-mysql-query-blocking-event-loop
I've got a much smaller table - ~115K rows. I'd appreciate any advice, thanks.
Hi @avnersorek ! It would be really neat if someone could put together a flamegraph of what Node.js is spending its time doing during your query. However, 115k rows is still a lot of rows, and I feel like a lot of those blocks may be in the v8 garbage collector or similar while we are trying to construct a 115k-element array for you.
Hi @dougwilson ! Thanks for your help. Besides this being an application issue for me, I'd also love to take this opportunity to learn a little more about the inner workings of node, so I'd really appreciate any explanation regarding the matter.
I changed the query to SELECT SLEEP(5); and it really does not block at all, so I guess that implies the blocking happens when parsing the results.
I changed my code to this, in order to keep the query running and generate the flame graph (also changed to the streaming API):
function runQuery() {
  console.time('query');
  connection.query(query)
    .on('end', function() {
      console.timeEnd('query');
      runQuery();
    });
}

connection.connect(function(err) {
  runQuery();
});
:octocat: won't let me attach an SVG to this comment, so I've uploaded the flamegraph here: https://drive.google.com/file/d/0B1p0cQY4eBDhczlETUhJMUZkU0U/view?usp=sharing
Thanks for the help !!!
Thank you so much, @avnersorek ! So according to the flamegraph, most of the time is taken up by v8 setting properties on objects (i.e. within v8 itself, from us trying to create all those row objects). The only function of ours in the list that is taking any significant time is the typeCast function, so we can look into if/how that can be optimized, but it won't really help too much, according to the graph.
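For reference, the per-query typeCast hook looks roughly like this, if anyone wants to experiment with cheaper casting (this particular function just defers to the driver's default behavior):

connection.query({
  sql: 'SELECT * FROM users',
  // Called per column value; returning next() falls back to the driver's
  // normal conversion, so this override is a no-op placeholder.
  typeCast: function (field, next) {
    return next();
  }
});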
I appreciate that this is an old thread by now, but since it's still open, I guess this is still the case? I've been trying to figure out why I'm seeing blocks in my own Node server, such as the one below. Is it plausible that these are caused by what is being discussed here?
Thanks much.
ERROR: BURST:blocked - Blocked for 27.6ms
operation started here:
at Socket.connect (net.js:950:7)
at Object.connect (net.js:178:17)
at PoolConnection.connect (/home/user/myproj/node/node_modules/mysql/lib/Connection.js:79:13)
at Pool.getConnection (/home/user/myproj/node/node_modules/mysql/lib/Pool.js:48:16)
at Pool.query (/home/user/myproj/node/node_modules/mysql/lib/Pool.js:202:8)
at MySQLStore.query (/home/user/myproj/node/node_modules/express-mysql-session/index.js:432:33)
at MySQLStore.get (/home/user/myproj/node/node_modules/express-mysql-session/index.js:162:8)
at session (/home/user/myproj/node/node_modules/express-session/index.js:474:11)
at Layer.handle [as handle_request] (/home/user/myproj/node/node_modules/express/lib/router/layer.js:95:5)
at trim_prefix (/home/user/myproj/node/node_modules/express/lib/router/index.js:317:13)
at /home/user/myproj/node/node_modules/express/lib/router/index.js:284:7
at Function.process_params (/home/user/myproj/node/node_modules/express/lib/router/index.js:335:12)
at next (/home/user/myproj/node/node_modules/express/lib/router/index.js:275:10)
...
I'm no longer involved with this project, but decided to take a quick look since it seems interesting.
As far as I can tell, my theory from 2012 is probably correct and there seems to be a scheduling issue in the node core, as demonstrated by the code here:
https://gist.github.com/felixge/d16ee6b128af7256862bf83fe2f34d8d#file-blocked-by-loop-js
https://gist.github.com/felixge/d16ee6b128af7256862bf83fe2f34d8d#file-blocked-by-net-client-txt
Running a loop of 100 million iterations every 1s seems to block the event loop for ~60ms. That seems reasonable.
Running the same loop in a busy socket's 'data' callback, however, blocks the event loop for up to 1800ms. This seems unreasonable because the network callback gets scheduled 30x in a row while the timer has to wait.
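A rough approximation of what those gists do (not a copy of them; the port choice and iteration counts are arbitrary):

var net = require('net');

// A local server that keeps its client socket saturated with data.
var server = net.createServer(function (conn) {
  var chunk = Buffer.alloc(64 * 1024);
  (function write() {
    while (conn.write(chunk)) {}
    conn.once('drain', write);
  })();
});

server.listen(0, function () {
  // A 1ms timer that reports how late it fires.
  var last = Date.now();
  setInterval(function () {
    var delta = Date.now() - last;
    if (delta > 100) console.log('timer starved for', delta, 'ms');
    last = Date.now();
  }, 1);

  // A client whose 'data' callback burns CPU, standing in for packet parsing.
  var socket = net.connect(server.address().port);
  socket.on('data', function () {
    for (var i = 0, x = 0; i < 1e8; i++) x += i;
  });
});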
So if your application requires fair scheduling between network and timer events, I'd suggest you raise an issue with the node.js core to get their feedback on this.
It might also be possible to implement some sort of cooperative scheduling in node-mysql by yielding back to the event loop after a certain amount of time. But if that's what it takes to get reasonable scheduling in node, I'd suggest ditching it for Go 🙈.
Fascinating, thank you @felixge . I agree that this is crazy, but I wonder what other Node DB implementations are doing and if they are subject to the same gotcha.
If I'm following you, wrapping the socket's data callback in a setImmediate would probably "fix" the problem. I'm going to give it a try and will report back. I admit that this would not be ideal (for performance reasons), but I think that there should at least be an optional way to work around this problem...
Thanks again.
@logidelic yeah, that should help, i.e. you should be able to see better timer latencies in exchange for slightly lower overall throughput.
Unless you're talking about a different issue in this thread from the OP, I think the provided flame graphs are pretty straightforward about where the bottleneck is, so I don't think that is going to help. The blocking is still going to happen, even if it's moved with a setImmediate to occur at a different time...
@dougwilson are you sure?
I just modified my example to use socket.pause() and setImmediate and was able to reduce the timer starvation from ~1800ms at a time to ~80ms:
https://gist.github.com/felixge/d16ee6b128af7256862bf83fe2f34d8d#file-blocked-by-net-js
vs
https://gist.github.com/felixge/d16ee6b128af7256862bf83fe2f34d8d#file-blocked-by-net-improved-js
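The change boils down to something like this (parseChunk stands in for the CPU-heavy work done in the callback):

socket.on('data', function (chunk) {
  // Stop further 'data' events from being delivered during this burst...
  socket.pause();
  // ...and defer the heavy work so that queued timers get a chance to fire.
  setImmediate(function () {
    parseChunk(chunk); // placeholder for the expensive parsing
    socket.resume();
  });
});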
The flamegraph obviously shows why the callback is eating a lot of CPU time, but it doesn't show scheduling issues. By using setImmediate we're able to coerce the scheduler from its default behavior of executing queued I/O events up to the internal callback limit to instead schedule evenly between I/O and timer events.
See https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/ (via brmichel)
Nice! It would be very interesting to know if this impacts the wall-clock time of large query results, but fixing scheduling lock-ups is more important than raw throughput for sure. Hopefully it doesn't add much to the total wall-clock time spent parsing the query.
Yeah, that could be a concern.
One could combat this by having each connection instance set up a setInterval of e.g. 1ms which updates a this._lastTimer property. If this._lastTimer gets too far behind (e.g. 10ms), then the socket callback can detect it and call socket.pause() and setImmediate() to yield its scheduler time to other events that have been queued. This way one could implement a (perhaps configurable) tradeoff between scheduling latency and throughput.
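A very rough sketch of that idea (CooperativeConnection and _handleData are made-up names for illustration, not node-mysql internals):

function CooperativeConnection(socket, maxLagMs) {
  this._socket = socket;
  this._maxLagMs = maxLagMs || 10;
  this._lastTimer = Date.now();

  // Cheap heartbeat: if this stops advancing, the event loop is being starved.
  this._heartbeat = setInterval(function () {
    this._lastTimer = Date.now();
  }.bind(this), 1);

  socket.on('data', function (chunk) {
    if (Date.now() - this._lastTimer > this._maxLagMs) {
      // We've hogged the loop for too long: yield before parsing this chunk.
      socket.pause();
      setImmediate(function () {
        this._handleData(chunk);
        socket.resume();
      }.bind(this));
    } else {
      this._handleData(chunk);
    }
  }.bind(this));
}

// Placeholder for the actual protocol parsing; clearInterval(this._heartbeat)
// would belong in whatever teardown path the connection has.
CooperativeConnection.prototype._handleData = function (chunk) {};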
That being said, one should certainly question the wisdom of implementing an I/O scheduler inside of a database client library. It'd be much better to seek solutions for this in the node core ...
According to Node peeps, we shouldn't be blocking things in that callback. The suggestion is that the parsing should be moved to a worker thread.
My understanding is that worker threads are not made for parsing, either. This is because they run in a separate v8 isolate, so all the parsing work done there would not be available back on the main thread without stringifying the data and parsing it back in the main thread to turn it back into objects, which would block again. Please correct me if I am mistaken, though.
@logidelic I replied to the node thread. I think node has a scheduling problem and the suggested workaround seems unreasonable for many scenarios, including node-mysql. Let’s wait for the discussion to continue ...
Thanks for that @felixge . I have a feeling that they are going to stick to the "this is by design and not our problem" line, but we shall see.
I haven't looked at the internals of the Socket stuff in node, but I'm scared that the pause/resume stuff might be a hit on throughput. One alternative is to do something like:
var queue = [];

socket.on('data', data => {
  queue.push(data);
  if (queue.length == 1) setImmediate(processData);
});

function processData() {
  var data = queue.shift();
  if (queue.length) setImmediate(processData);
  // Parse/process data now...
}
This way we are processing each data event in sequential ticks (which is probably what most users would expect would happen)... Of course it may use a bit more memory for the queue.
@logidelic any sort of attempt to implement user-land scheduling will be a tradeoff between latency and throughput.
That being said, I suspect your proposal will have a higher overhead than mine (as every callback gets deferred, even if the event loop hasn't been blocked much), and it also seems to allow unbounded memory usage, which could cause significant issues for people dealing with very large query results.
When using the driver to connect to mysql over our network, the node event loop blocks until all of the results are returned. This does not occur when using the instance of mysql running on my local machine or with the JDBC mysql driver over the network so it seems like there’s some interaction between the driver and our network.
I've posted the test code I’m using below, but am not really sure about the next steps to debug the problem so any suggestions would be appreciated.