Each step duration looks good, only a few ms. But why does the final apiGateway.rest span report such a long duration:
INFO -> metrics traceStart: { id: '1e9f1a84-85eb-466c-9aae-223abf74acac',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 1,
startTime: 1530602761235,
remoteCall: false,
action: { name: 'apiGateway.rest' },
nodeID: 'rts-apiGateway-dev-88-0' }
INFO -> metrics traceStart: { id: '6ba3c7bc-a194-43c2-8f34-34380de6d60c',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 2,
startTime: 1530602761271,
remoteCall: true,
meta: {},
action: { name: 'auth.authorize' },
parent: '1e9f1a84-85eb-466c-9aae-223abf74acac',
nodeID: 'rts-auth-dev-88-0',
callerNodeID: 'rts-apiGateway-dev-88-0' }
INFO -> metrics traceFinish: { id: '6ba3c7bc-a194-43c2-8f34-34380de6d60c',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 2,
startTime: 1530602761271,
endTime: 1530602761273.025,
duration: 2.024912,
remoteCall: true,
fromCache: false,
meta: {},
action: { name: 'auth.authorize' },
parent: '1e9f1a84-85eb-466c-9aae-223abf74acac',
nodeID: 'rts-auth-dev-88-0',
callerNodeID: 'rts-apiGateway-dev-88-0' }
INFO -> metrics traceStart: { id: '6c079ee3-0f1a-45fc-8c9d-fc2f5bd23ce4',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 2,
startTime: 1530602761314,
remoteCall: true,
meta: {},
action: { name: 'demo.welcome' },
parent: '1e9f1a84-85eb-466c-9aae-223abf74acac',
nodeID: 'rts-demo-dev-88-0',
callerNodeID: 'rts-apiGateway-dev-88-0' }
INFO -> metrics traceStart: { id: '47682a4f-6c45-4c98-9130-9372dfecb748',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 3,
startTime: 1530602761314,
remoteCall: false,
meta: {},
action: { name: 'demo.hello' },
parent: '6c079ee3-0f1a-45fc-8c9d-fc2f5bd23ce4',
nodeID: 'rts-demo-dev-88-0' }
INFO -> metrics traceFinish: { id: '47682a4f-6c45-4c98-9130-9372dfecb748',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 3,
startTime: 1530602761314,
endTime: 1530602761315.0862,
duration: 1.086225,
remoteCall: false,
fromCache: false,
meta: {},
action: { name: 'demo.hello' },
parent: '6c079ee3-0f1a-45fc-8c9d-fc2f5bd23ce4',
nodeID: 'rts-demo-dev-88-0' }
INFO -> metrics traceFinish: { id: '6c079ee3-0f1a-45fc-8c9d-fc2f5bd23ce4',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 2,
startTime: 1530602761314,
endTime: 1530602761315.9263,
duration: 1.92624,
remoteCall: true,
fromCache: false,
meta: {},
action: { name: 'demo.welcome' },
parent: '1e9f1a84-85eb-466c-9aae-223abf74acac',
nodeID: 'rts-demo-dev-88-0',
callerNodeID: 'rts-apiGateway-dev-88-0' }
INFO -> metrics traceFinish: { id: '1e9f1a84-85eb-466c-9aae-223abf74acac',
requestID: '1e9f1a84-85eb-466c-9aae-223abf74acac',
level: 1,
startTime: 1530602761235,
endTime: 1530602761354.5188,
duration: 119.518744,
remoteCall: false,
fromCache: false,
action: { name: 'apiGateway.rest' },
nodeID: 'rts-apiGateway-dev-88-0' }
Which versions?
Btw, please use issue templates when you open a new issue.
I've just measured this demo with jMeter:
With metrics: false: ~10k req/sec
With metrics: true: ~8k req/sec
I think I found the problem. If I start the process which listens to those metrics events (metrics.trace.span.start/finish), then the whole system slows down to ~120ms per request (the event handler is very simple, I even tried an empty function). If I stop the listener process, the whole system goes back to the normal ~10ms per request.
Version: 0.12.8
And the listener is on the same node, or in a remote node?
A remote node; every service in this issue runs in a separate node process.
Ok, I will investigate it.
By the way, the transporter is NATS.
What I measured:
API gateway with metrics, without listener: 8k req/sec
API gateway with metrics, with a remote listener: 3k req/sec
Btw, it was just a simple "test.hello" action.
it was just a simple "test.hello" action. a remote listener: 3k req/sec
Yes, that's the problem. My scenario is apiGateway.rest -> auth.authorize -> demo.welcome -> demo.hello; the more remote actions, the slower it gets.
Maybe you should test its latency?
If you can, make a repro repo.
You just make a request like this:
client -> apiGateway(test.hello)
-> test.hello
-> ctx.call test1.hello1
-> ctx.call test2.hello2
Place test/test1/test2 in separate nodes, and have another node listen to the metrics events.
Yes, I can create a repo. But do you need it? You already measured that a remote node listening to metrics makes performance much worse.
It's reasonable that the metrics feature needs extra work, which increases the response time. Unfortunately, in your case it's a huge difference, because every action call generates 2 extra events that need to be transferred to another node, which causes overhead. The bad news is that I can't fix it currently, because it is not a bug, it's a design issue.
In version 0.14 I will rewrite the whole metrics & tracing logic to be more efficient and pluggable. Meanwhile, as a workaround, you can disable the built-in metrics feature and write a middleware which traces action calls and sends them to a central server (bypassing the Moleculer message bus).
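A rough sketch of that workaround (assumptions: the pre-0.13 function-style middleware signature handler => newHandler, and a hypothetical HTTP collector at metrics-collector.local:3000; this is not the built-in metrics code):
const http = require("http");

function reportSpan(span) {
    // fire-and-forget POST to the (hypothetical) central collector,
    // bypassing the broker's transporter and event bus
    const data = JSON.stringify(span);
    const req = http.request({
        host: "metrics-collector.local", // placeholder
        port: 3000,
        path: "/spans",
        method: "POST",
        headers: { "Content-Type": "application/json", "Content-Length": Buffer.byteLength(data) }
    });
    req.on("error", () => {}); // never let reporting break the action call
    req.end(data);
}

// Wrap every local action handler: time it and report when it resolves
const tracingMiddleware = handler => ctx => {
    const start = Date.now();
    return handler(ctx).then(res => {
        reportSpan({
            action: ctx.action.name,
            nodeID: ctx.broker.nodeID,
            requestID: ctx.requestID,
            duration: Date.now() - start
        });
        return res;
    });
};

// broker.use(tracingMiddleware); // register before broker.start()
Because the spans leave the process over a separate channel, a remote listener no longer costs an extra transported event per action call.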
Maybe it's not about the metrics. I dug into the source code; the metrics just call broker.emit. So I made a test for plain event emit, without listening to metrics.trace.span.start/finish.
In the listener node:
events: {
    // "metrics.trace.span.start": this.traceStart,
    // "metrics.trace.span.finish": this.traceFinish,
    "event.test.t1": this.t1,
    "event.test.t2": this.t2,
},
methods: {
    async t1(info) {
        this.logger.info(`${this.name} t1: `, util.inspect(info));
    },
    async t2(info) {
        this.logger.info(`${this.name} t2: `, util.inspect(info));
    }
}
in auth:
async authorize(ctx) {
this.broker.emit("event.test.t1", {
a: "abc",
b: 123,
c: true
})
console.log("call authorize ctx.params = ", ctx.params)
if (ctx.params.action.indexOf('$node') >= 0) { //disable call $node.*
return false
}
return true
}
in demo:
async welcome(ctx) {
this.broker.emit("event.test.t2", {
a: "abc",
b: 123,
c: true
})
// await ctx.call('demo.hello')
return this._sayWelcome(ctx.params.name);
}
The result is:
1. Not sending event.test.t1 or event.test.t2: 8.396ms
2. Sending event.test.t1: 51.806ms
3. Sending event.test.t1 and event.test.t2: 81.931ms
A simple event takes 30~40ms. That doesn't make sense; an event should not be much slower than an action.
I guess the problem is not about how the metrics are generated. It is about how to send events without impacting action performance.
Hmm, it's weird. Please create a dev example file with 3 brokers & services & NATS like here and I will try to investigate it.
OK, I will create a dev example file.
OK, finished. The result is:
call demo1.hello1 at: 1530627615760
call demo2.hello2 at: 1530627615761
call demo3.hello3 at: 1530627615762
call demo1.hello1 use time = 6
call demo1.hello1WithEvent at: 1530627615765
listen t1 at: 1530627615765
call demo2.hello2WithEvent at: 1530627615804
listen t2 at: 1530627615805
call demo3.hello3WithEvent at: 1530627615844
listen t3 at: 1530627615844
call demo1.hello1WithEvent use time = 120
test code is:
const { ServiceBroker } = require("moleculer");
let natsTransporter = {
type: "NATS",
options: {
url: "nats://127.0.0.1:4222",
}
}
// Create broker Client
const brokerClient = new ServiceBroker({
namespace: "streaming",
nodeID: "client",
transporter: natsTransporter,
logger: console,
logLevel: "info"
});
// Create broker #1
const broker1 = new ServiceBroker({
namespace: "streaming",
nodeID: "demo1",
transporter: natsTransporter,
logger: console,
logLevel: "info"
});
// Create broker #2
const broker2 = new ServiceBroker({
namespace: "streaming",
nodeID: "demo2",
transporter: natsTransporter,
logger: console,
logLevel: "info"
});
// Create broker #3
const broker3 = new ServiceBroker({
namespace: "streaming",
nodeID: "demo3",
transporter: natsTransporter,
logger: console,
logLevel: "info"
});
// Create brokerListen
const brokerListen = new ServiceBroker({
namespace: "streaming",
nodeID: "listen",
transporter: natsTransporter,
logger: console,
logLevel: "info"
});
broker1.createService({
name: "demo1",
actions: {
hello1(ctx) {
console.log("call demo1.hello1 at: ", Date.now())
return ctx.call('demo2.hello2')
},
hello1WithEvent(ctx) {
broker1.emit('t1', { a: 'abc', b: 123, c: true })
console.log("call demo1.hello1WithEvent at: ", Date.now())
return ctx.call('demo2.hello2WithEvent')
}
}
});
broker2.createService({
name: "demo2",
actions: {
hello2(ctx) {
console.log("call demo2.hello2 at: ", Date.now())
return ctx.call('demo3.hello3')
},
hello2WithEvent(ctx) {
broker2.emit('t2', { a: 'abc', b: 123, c: true })
console.log("call demo2.hello2WithEvent at: ", Date.now())
return ctx.call('demo3.hello3WithEvent')
}
}
});
broker3.createService({
name: "demo3",
actions: {
hello3(ctx) {
console.log("call demo3.hello3 at: ", Date.now())
return "hello from demo3"
},
hello3WithEvent(ctx) {
broker3.emit('t3', { a: 'abc', b: 123, c: true })
console.log("call demo3.hello3WithEvent at: ", Date.now())
return "hello from hello3WithEvent"
}
}
});
brokerListen.createService({
name: "listen",
events: {
t1: (ctx) => {
console.log("listen t1 at: ", Date.now())
},
t2: (ctx) => {
console.log("listen t2 at: ", Date.now())
},
t3: (ctx) => {
console.log("listen t3 at: ", Date.now())
},
}
});
brokerClient.Promise.all([brokerClient.start(), broker1.start(), broker2.start(), broker3.start(), brokerListen.start()])
.delay(2000)
.then(async () => {
let startTime = Date.now()
await brokerClient.call('demo1.hello1')
let endTime = Date.now()
console.log("call demo1.hello1 use time = ", endTime - startTime)
startTime = Date.now()
await brokerClient.call('demo1.hello1WithEvent')
endTime = Date.now()
console.log("call demo1.hello1WithEvent use time = ", endTime - startTime)
// startTime = Date.now()
// await brokerClient.call('demo1.hello1WithEvent')
// endTime = Date.now()
// console.log("call demo1.hello1WithEvent again use time = ", endTime - startTime)
});
Thanks, I will investigate it tonight.
It's even weirder. This is the result on my PC:
call demo1.hello1 at: 1530633313582
call demo2.hello2 at: 1530633313585
call demo3.hello3 at: 1530633313587
call demo1.hello1 use time = 17
call demo1.hello1WithEvent at: 1530633313596
listen t1 at: 1530633313599
call demo2.hello2WithEvent at: 1530633313601
listen t2 at: 1530633313603
call demo3.hello3WithEvent at: 1530633313604
listen t3 at: 1530633313606
call demo1.hello1WithEvent use time = 15
Node version, OS, architecture...etc?
I could reproduce it on Ubuntu
call demo1.hello1 at: 1530644013498
call demo2.hello2 at: 1530644013501
call demo3.hello3 at: 1530644013503
call demo1.hello1 use time = 17
call demo1.hello1WithEvent at: 1530644013513
listen t1 at: 1530644013514
call demo2.hello2WithEvent at: 1530644013560
listen t2 at: 1530644013561
call demo3.hello3WithEvent at: 1530644013604
listen t3 at: 1530644013604
call demo1.hello1WithEvent use time = 139
It's very mysterious!
With NATS: ~8 rps (~125ms)
With Redis: ~500 rps (~2ms)
With TCP: ~1700 rps (~0.5ms)
And it's only on Linux. On Windows it's ~1500-2000 rps with all transporters. Do you have any idea? Maybe it's a NATS client problem with how it sends messages in parallel?
Dev test file: https://github.com/moleculerjs/moleculer/blob/master/dev/emit-latency.js
Have you tried a different nats client library version?
I will, but currently I can't.
I tried with 0.8, 0.7...same result.
Same thing here.
call demo1.hello1 at: 1530693198252
call demo2.hello2 at: 1530693198252
call demo3.hello3 at: 1530693198253
call demo1.hello1 use time = 8
call demo1.hello1WithEvent at: 1530693198255
listen t1 at: 1530693198255
call demo2.hello2WithEvent at: 1530693198303
listen t2 at: 1530693198303
call demo3.hello3WithEvent at: 1530693198348
listen t3 at: 1530693198349
call demo1.hello1WithEvent use time = 138
Info about my system:
Kubuntu 18.04 64bit
Kernel: 4.15.0-24
CPU: Intel i7 6700
RAM: 16GB
NodeJS: 8.11.3
NATS Server: 1.1.0
Thanks @AndreMaz
After a long time digging and testing, I found the reason. I don't know whether it's a NATS or a Node.js bug. The Node.js docs say noDelay defaults to true, but it does not behave that way.
socket.setNoDelay([noDelay]) Disables the Nagle algorithm. By default TCP connections use the Nagle algorithm, they buffer data before sending it off. Setting true for noDelay will immediately fire off data each time socket.write() is called. noDelay defaults to true.
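As a rough, hypothetical illustration of why a disabled noDelay can cost tens of milliseconds per hop (this standalone script is not from the issue; the port, chunk sizes and round counts are arbitrary): a client that does two small writes before reading the reply can get stalled by Nagle's algorithm waiting for the peer's delayed ACK. Whether the stall actually shows up depends on the OS TCP stack:
const net = require("net");

const PORT = 41234;   // arbitrary local port for the demo
const MSG_SIZE = 10;  // two small chunks per "request"

// Server: waits until both small chunks arrived, then answers with one byte
const server = net.createServer(socket => {
    let received = 0;
    socket.on("data", chunk => {
        received += chunk.length;
        if (received >= 2 * MSG_SIZE) {
            received = 0;
            socket.write("!");
        }
    });
});

function runClient(noDelay, rounds, done) {
    const socket = net.createConnection(PORT, "127.0.0.1", () => {
        socket.setNoDelay(noDelay);
        const start = Date.now();
        let pending = rounds;
        const sendOne = () => {
            socket.write("a".repeat(MSG_SIZE));
            // small gap so the first chunk is already unacknowledged on the wire
            setTimeout(() => socket.write("b".repeat(MSG_SIZE)), 1);
        };
        socket.on("data", () => {
            if (--pending === 0) {
                console.log(`noDelay=${noDelay}: ${Date.now() - start} ms for ${rounds} round trips`);
                socket.end();
                done();
            } else {
                sendOne();
            }
        });
        sendOne();
    });
}

server.listen(PORT, () => {
    runClient(false, 20, () => {
        runClient(true, 20, () => server.close());
    });
});
With Nagle enabled the second write may be held back until the first one is acknowledged, and the peer's delayed ACK can take tens of milliseconds because it has nothing to send yet; setNoDelay(true) removes that wait, which would be consistent with the ~40ms per remote event seen above.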
Just modify this file at: node_modules/nats/lib/nats.js
// Create the stream
this.stream = net.createConnection(this.url.port, this.url.hostname);
this.stream.setNoDelay(true) // need to call it explicitly; somehow the documented 'true' default does not apply
After manually adding setNoDelay(true), it works fine on Ubuntu:
call demo1.hello1 at: 1530870151475
call demo2.hello2 at: 1530870151477
call demo3.hello3 at: 1530870151478
call demo1.hello1 use time = 8
call demo1.hello1WithEvent startTime = 1530870151480
call demo1.hello1WithEvent at: 1530870151481
listen e1 at: 1530870151482
call demo2.hello2WithEvent at: 1530870151483
listen e2 at: 1530870151484
call demo3.hello3WithEvent at: 1530870151484
listen e3 at: 1530870151485
call demo1.hello1WithEvent endTime = 1530870151486
call demo1.hello1WithEvent use time = 6
test environment:
moleculer: 0.12.8
nodejs: 8.11.1
node-nats: 1.0.0
gnatsd: 1.2.0
system: ubuntu 16.04 server
Wow, nice catch! That can explain the difference between OSs. So on Windows noDelay can be true, and on Linux it's not.
And it's the reason why there was no problem with the Redis transporter: https://github.com/luin/ioredis/blob/622975d9d454eca6a9c495d35f5638d68e2662f2/lib/redis.js#L311
Btw, I can add a workaround: always call setNoDelay in the transporter code when the client connects.
@icebob Maybe you should also listen to the connect and reconnect events from NATS, because on reconnect the underlying stream (socket) object will be a new one.
client.on("connect", () => {
    this.client = client;
    if (client.stream)
        client.stream.setNoDelay(true);
});
client.on("reconnect", () => {
    this.client = client;
    if (client.stream)
        client.stream.setNoDelay(true);
});
Yes, good idea!
npm already has a fix for this, you may want to just update and remove the patching.
@aricart Thank you for fixing this
@roytan883 yw!!
The issue scenario is:
apiGateway.rest -> auth.authorize -> demo.welcome -> demo.hello
Every step is very simple, so only the network cost should matter. If I enable metrics: true in each service process, a request takes about 120ms in total. But if I disable metrics, it only takes about 10ms. metrics internally uses event broadcast, right? Then it should not impact normal performance so badly.