Closed: rahulreddy closed this issue 7 years ago.
Can you reproduce in a stubbed unit test?
I am unable to reproduce this. I have tried both mocking an error from a server with non-200 HTTP codes and ECONNRESET TCP errors.
Identified by @bennettbuchanan. This line could be a potential problem and would be tough to reproduce: https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L94
The theory is that if this.bootstrap.shift() happens, some other request might see this.bootstrap.length shrink (possibly to 0), thereby satisfying the condition
if (++counter >= this.bootstrap.length) {
prematurely. This would be hard to reproduce.
I have a theory, a little bit crazy, but why not:
If, for some reason, the size passed here https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L180 is less than expected, then sproxyd will get the stream and respond with a status code, and the log call here https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L165 ends the request log. But the stream has not been fully consumed, which then triggers an error. Since the stream is piped to the request, the request will emit an error and we end up here https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L157
Interesting theory. Any update @alexandre-merle, anyone?
Can you reproduce it by mocking?
I have tried mocking it with the above theory and a few of my own, but I was unable to reproduce it. The only observation I can make from the code is that it would happen only on a GET or a PUT, since the code path it originates from doesn't cover DELETEs.
I have reproduced the error; my theory is good :)
ps: that means that, at some point, we try to send more than the declared size to sproxyd or, even worse, we send an incorrect object size
ps2: No need to mock anything up, I used a real local sproxyd, just a custom client
zsh 139 [130] (git)-[rel/6.2.5]-% node test
{"name":"SproxydClient","method":"PUT","value":"[data]","args":{"bucketName":"test","owner":"user","namespace":"scality"},"counter":0,"time":1485078416164,"req_id":"test-request:8d4dd55e8943877d8b79","level":"info","message":"request","hostname":"atom-Latitude-E7450","pid":25589}
{"name":"SproxydClient","error":null,"time":1485078416179,"req_id":"test-request:8d4dd55e8943877d8b79","elapsed_ms":15.424145,"level":"info","message":"request received response","hostname":"atom-Latitude-E7450","pid":25589}
5DF9124C6E64CEC9A13D11C28F6BCD590B2DEE70
{"name":"SproxydClient","newKey":"5DF9124C6E64CEC9A13D11C28F6BCD590B2DEE70","time":1485078416179,"req_id":"test-request:8d4dd55e8943877d8b79","level":"debug","message":"stored to sproxyd","hostname":"atom-Latitude-E7450","pid":25589}
{"name":"SproxydClient","error":{"code":"ECONNRESET","errno":"ECONNRESET","syscall":"read"},"time":1485078417177,"req_id":"test-request:8d4dd55e8943877d8b79","level":"error","message":"PUT chunk to sproxyd","hostname":"atom-Latitude-E7450","pid":25589}
assert.js:85
throw new assert.AssertionError({
^
AssertionError: The "end()" logging method should not be called more than once.
at RequestLogger.errorEnd (/home/atom/scality/sproxydclient/node_modules/werelogs/lib/RequestLogger.js:392:16)
at /home/atom/scality/sproxydclient/lib/sproxyd.js:157:25
at ClientRequest.<anonymous> (/home/atom/scality/sproxydclient/lib/sproxyd.js:184:28)
at emitOne (events.js:82:20)
at ClientRequest.emit (events.js:169:7)
at Socket.socketErrorListener (_http_client.js:269:9)
at emitOne (events.js:77:13)
at Socket.emit (events.js:169:7)
at emitErrorNT (net.js:1269:8)
at nextTickCallbackWith2Args (node.js:458:9)
To be specific, this error happens when we try to write to the stream after sproxyd has already sent back a response.
Here is the client reproducing the bug:
'use strict';

const sproxyd = require('./lib/sproxyd');
const Transform = require('stream').Transform;

// Pass-through Transform stream used as the PUT body.
class stream extends Transform {
    _transform(buffer, encoding, callback) {
        callback(null, buffer);
    }
}

const client = new sproxyd({
    bootstrap: [
        'localhost:8181',
    ],
});

const s = new stream();
const buf = Buffer.alloc(2048, '0');

// Declare a size of 2048 bytes, but write 4096 in total (see below).
client.put(s, 2048, {
    bucketName: 'test',
    owner: 'user',
    namespace: 'scality',
}, 'test-request', (err, res) => {
    if (err) {
        console.log(err);
        process.exit(1);
    }
    console.log(res);
});

s.write(buf);
// The second write lands after sproxyd has already responded to the
// declared 2048 bytes, triggering the late error on the piped request.
setTimeout(() => s.write(buf), 1000);
Nice! So, if the client sends a PUT with an incorrect content-length, sproxyd ends up responding while the client is still writing, the request then errors, and both handlers fire, which leads us to this crash. Since we don't touch either the headers or the stream, it's probably safe to assume the client originating the request is at fault, and we should fix this issue so that response callbacks don't fire twice.
This whole issue raises a question for me: I had set an assert to prevent double-end-logging, considering it an important enough coding error. Do you think it's okay to keep this as-is, or should we change the strategy (error/fatal logging instead, or something along those lines)? I'm personally torn about this, as a crash is something we cannot miss, as opposed to simple logging (whatever the logging level is).
Note: it's also not the first time this assert has revealed such an issue.
@DavidPineauScality keeping the assert is good, I'm totally for it: a crash is a lot more visible than a simple log. Moreover, in this case, crashing is all we can do, since the client already has a 200 response; otherwise the log line would just disappear in the noise.
Fix commit merged and forward-ported.
The first request, before a failover occurs, calls log.end. If an error then occurs, the failover logic kicks in and, once it reaches the end of the bootstrap list, calls logger.errorEnd. Werelogs has an assertion that the end logger is only called once, so it throws an exception, crashing the process.