Closed: rahulreddy closed this issue 7 years ago.
Can you reproduce in a stubbed unit test?
I am unable to reproduce this. I have tried both mocking an error from a server with non-200 HTTP codes and ECONNRESET TCP errors.
Identified by @bennettbuchanan. This line could be a potential problem and would be tough to reproduce: https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L94
The theory is that if this.bootstrap.shift() happens, some other request might see this.bootstrap.length shrink (possibly to 0), thereby satisfying the condition
if (++counter >= this.bootstrap.length) {
prematurely. This would be hard to reproduce.
I have a theory, a little bit crazy, but why not:
If, for some reason, the size passed here https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L180 is less than expected, then sproxyd will get the stream and respond with a status code, and the log call here https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L165 ends the request log. But the stream has not been fully consumed, which then triggers an error. Since the stream is piped to the request, the request will emit an error and we end up here https://github.com/scality/sproxydclient/blob/GA6.2.12/lib/sproxyd.js#L157
Interesting theory. Any update @alexandre-merle, anyone?
Can you reproduce it by mocking?
I have tried mocking it with the above theory and a few of my own, but I was unable to reproduce it. The only observation I can make from the code is that it would happen only on a GET or a PUT, since the code path it originates from doesn't cover DELETEs.
I have reproduced the error; my theory is good :)
ps: that means that, at some point, we try to send more than the declared size to sproxyd or, even worse, we send an incorrect object size
ps2: No need to mock anything up, I used a real local sproxyd, just a custom client
zsh 139 [130] (git)-[rel/6.2.5]-% node test
{"name":"SproxydClient","method":"PUT","value":"[data]","args":{"bucketName":"test","owner":"user","namespace":"scality"},"counter":0,"time":1485078416164,"req_id":"test-request:8d4dd55e8943877d8b79","level":"info","message":"request","hostname":"atom-Latitude-E7450","pid":25589}
{"name":"SproxydClient","error":null,"time":1485078416179,"req_id":"test-request:8d4dd55e8943877d8b79","elapsed_ms":15.424145,"level":"info","message":"request received response","hostname":"atom-Latitude-E7450","pid":25589}
5DF9124C6E64CEC9A13D11C28F6BCD590B2DEE70
{"name":"SproxydClient","newKey":"5DF9124C6E64CEC9A13D11C28F6BCD590B2DEE70","time":1485078416179,"req_id":"test-request:8d4dd55e8943877d8b79","level":"debug","message":"stored to sproxyd","hostname":"atom-Latitude-E7450","pid":25589}
{"name":"SproxydClient","error":{"code":"ECONNRESET","errno":"ECONNRESET","syscall":"read"},"time":1485078417177,"req_id":"test-request:8d4dd55e8943877d8b79","level":"error","message":"PUT chunk to sproxyd","hostname":"atom-Latitude-E7450","pid":25589}
assert.js:85
throw new assert.AssertionError({
^
AssertionError: The "end()" logging method should not be called more than once.
at RequestLogger.errorEnd (/home/atom/scality/sproxydclient/node_modules/werelogs/lib/RequestLogger.js:392:16)
at /home/atom/scality/sproxydclient/lib/sproxyd.js:157:25
at ClientRequest.<anonymous> (/home/atom/scality/sproxydclient/lib/sproxyd.js:184:28)
at emitOne (events.js:82:20)
at ClientRequest.emit (events.js:169:7)
at Socket.socketErrorListener (_http_client.js:269:9)
at emitOne (events.js:77:13)
at Socket.emit (events.js:169:7)
at emitErrorNT (net.js:1269:8)
at nextTickCallbackWith2Args (node.js:458:9)
To be specific, this error happens when we try to write to the stream after sproxyd has already sent back a response.
Here is the client reproducing the bug:
'use strict';

const sproxyd = require('./lib/sproxyd');
const Transform = require('stream').Transform;

// Pass-through Transform stream used as the PUT body.
class stream extends Transform {
    _transform(buffer, encoding, callback) {
        callback(null, buffer);
    }
}

const client = new sproxyd({
    bootstrap: [
        'localhost:8181',
    ],
});

const s = new stream();
const buf = Buffer.alloc(2048, '0');

// Declare a size of 2048 bytes, but write 4096 in total (see below).
client.put(s, 2048, {
    bucketName: 'test',
    owner: 'user',
    namespace: 'scality',
}, 'test-request', (err, res) => {
    if (err) {
        console.log(err);
        process.exit(1);
    }
    console.log(res);
});

s.write(buf);
// The second write lands after sproxyd has already responded to the
// declared 2048 bytes, triggering the late error on the piped request.
setTimeout(() => s.write(buf), 1000);
Nice! So, if the client sends a PUT with an incorrect content-length, sproxyd ends up responding while the client is still writing, the request then errors, and both handlers fire, which leads us to this crash. Since we don't touch either the headers or the stream, it's probably safe to assume the client originating the request is at fault, and we should fix this issue so that response callbacks don't fire twice.
This whole issue raises a question for me: I had set an assert to prevent double-end-logging, considering it an important enough coding error. Do you think it's okay to keep this as-is, or should we change the strategy (error/fatal logging instead, or something along those lines)? I'm personally torn about this, as a crash is something we cannot miss, as opposed to simple logging (whatever the logging level is).
Note: it's also not the first time this assert has revealed such an issue.
@DavidPineauScality keeping the assert is good, I'm totally for it: a crash is a lot more visible than a simple log. Moreover, in this case, crashing is all we can do, since the client already has a 200 response; otherwise the log line would just disappear in the noise.
Fix commit merged and forward-ported.
The first request, before a failover occurs, calls log.end. If an error then occurs, the failover logic kicks in and, once it reaches the end of the bootstrap list, calls logger.errorEnd. Werelogs has an assertion that the end logger is only called once, so it throws an exception, crashing the process.