Open rasalasantosh opened 7 years ago
Can you upload a Node.js script to reproduce the problem?
I don't have a Node.js script; I tried with a Java-based client program. The program has a while loop which creates a new client, connects to the server, and after connecting subscribes to a topic whose value is the same as the clientId.
anything that can reproduce the problem is good, otherwise it will be extremely hard to fix.
cc @GavinDmello
Yea. This is an issue. We have a lot of anonymous functions in persistence which are preventing optimizations, I think. Also, ResultsHolder from fastSeries always seems to be un-optimized.
@GavinDmello removing the anon functions should be feasible, do you want to send a PR? Most are forEach/reduce etc. that need to be changed into straight for loops.
ResultsHolder should be optimized if the rest is optimized. There might be a small issue somewhere in there too.
At any rate, we need a script to verify this problem is fixed.
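For context, the kind of change suggested above (replacing callback-style iteration with a plain loop so the JS engine can optimize the hot path and avoid closure allocation) might look like this. This is an illustrative sketch only; the function names are made up and are not the module's actual code.

```javascript
// Illustrative only: summing QoS levels of a subscription list.
// Before: an anonymous closure is allocated on every call.
function totalQosWithReduce (subs) {
  return subs.reduce(function (acc, sub) { return acc + sub.qos }, 0)
}

// After: a straight for loop, no closure allocation, easier for V8 to optimize.
function totalQosWithLoop (subs) {
  var total = 0
  for (var i = 0; i < subs.length; i++) {
    total += subs[i].qos
  }
  return total
}
```

Both return the same result; the difference only shows up in hot paths called thousands of times per second.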
@mcollina Yea. That would be cool. I'll make a PR. @rasalasantosh Could you share that Java client script? It would be helpful.
@mcollina, kindly share the client script that you used to test 900k connections, so that I can modify it according to my use case and test.
@rasalasantosh as said elsewhere, I did not do that test myself.
@rasalasantosh can you try version v2.6.1 and see if it improves things?
@mcollina, is it Redis v2.6.1? If so, I will try and let you know.
@rasalasantosh v2.6.1 of this module
@GavinDmello, I installed aedes using npm i aedes aedes-persistence-redis@2.6.1 --save and tried again.
It is still taking around 11 minutes for 25k connections, but without persistence it was taking less than a minute for 50k connections.
I also tried mosca with persistence; it was taking around 3 minutes for 25k connections.
@rasalasantosh what options are you connecting with? Are you also subscribing to a topic?
@mcollina, yes, after the connection is successful I am subscribing to a topic as well.
@rasalasantosh you should check v2.6.3, it has lua script caching, which should improve the throughput.
@mcollina, I have tested again, but I'm getting the same results, i.e. 11 min for 25k connections.
@rasalasantosh what options are you connecting with? Are you also subscribing to a topic? From our test, it takes just 30 seconds to connect 20k connections, so maybe something else is happening.
@mcollina, I am connecting with clean session = true, QoS = 1, keepalive = 60 sec. Also, after connecting we subscribe to a topic which is the same as the clientId.
Please also share the test script, so I can verify on my side with your script as well.
@rasalasantosh Could you try it with this ? https://gist.github.com/GavinDmello/dda612661df4db11ba84b41097470c95 It's not currently subscribing to a topic and clean is true.
@GavinDmello, Aedes takes 142 sec to connect 50k connections without subscriptions, but 1360 sec with subscriptions. I tested the same with mosca: it takes 190 sec without subscriptions, and 205 sec with subscriptions.
Please find below the script used to test subscriptions.
@rasalasantosh with Redis configured as persistence, is it correct?
I think createRetainedStream is where the problem is: https://github.com/mcollina/aedes-persistence-redis/blob/master/persistence.js#L98-L105. I'll try to send a PR to improve things.
@rasalasantosh check v2.6.4.
@mcollina with v2.6.4 I'm getting the same results.
@rasalasantosh Try removing all unwanted keys from redis. Also, is redis running on the same box? Maybe you're running out of memory.
@GavinDmello, I tried after flushing redis, but I'm getting the same results. Moreover, redis, Aedes and the client are running on separate instances.
Mosca is using a hashtable for the retained messages, while we are just using keys and SCAN here: https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L241-L267.
Maybe that's the approach we should take here as well. I thought using SCAN was smart.
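To make the contrast concrete, here is an in-memory sketch of the two access patterns being compared (the key names and layouts are illustrative stand-ins, not the modules' actual schemas): one hash holding all retained messages, field per topic (mosca-style HSET/HGET), versus one Redis key per message, which forces a SCAN over the keyspace to find matches.

```javascript
// Layout 1: one hash, field = topic (like `HSET retained <topic> <payload>`).
var retainedHash = new Map()
retainedHash.set('sensor/1', 'payload-1')
retainedHash.set('sensor/2', 'payload-2')

// Looking up a concrete topic is a single field read (HGET): O(1), one round trip.
function lookupInHash (topic) {
  return retainedHash.get(topic)
}

// Layout 2: one key per message (like `SET retained:<topic> <payload>`).
// Finding a match means walking the keyspace (SCAN), touching every key.
var retainedKeys = new Map([
  ['retained:sensor/1', 'payload-1'],
  ['retained:sensor/2', 'payload-2']
])
function lookupWithScan (topic) {
  for (var [key, value] of retainedKeys) { // stands in for SCAN iterations
    if (key === 'retained:' + topic) return value
  }
}
```

With wildcard subscriptions both layouts must iterate, but the hash keeps all retained messages in one structure (HSCAN-able) instead of spreading them across the whole keyspace.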
@mcollina i'm wondering if this is due to the lua script. We eventually run into issues in production once the subscription count hits a certain point. I missed this section in the redis docs when testing the lua changes: "...while the script is running no other client can execute commands."
from the redis docs:
Redis uses the same Lua interpreter to run all the commands. Also Redis guarantees that a script is executed in an atomic way: no other script or Redis command will be executed while a script is being executed. This semantic is similar to the one of MULTI / EXEC. From the point of view of all the other clients the effects of a script are either still not visible or already completed.
However this also means that executing slow scripts is not a good idea. It is not hard to create fast scripts, as the script overhead is very low, but if you are going to use slow scripts you should be aware that while the script is running no other client can execute commands.
@toddtreece that is probably the reason. So, do you think we should revert https://github.com/mcollina/aedes-persistence-redis/pull/3?
@mcollina yeah, i think so.
Would using a scan stream guarantee ordered delivery? I think we should try leveraging hashmaps.
@mcollina I think instead of moving the entire thing to scan stream, we should probably try moving createRetainedStream and check for improvements.
Yes, that's probably a good idea!
@rasalasantosh @toddtreece would you mind testing v2.6.6? It should be way more performant.
Very bad subscription performance confirmed! 👎 there are definitely some issues:
1. .add: https://github.com/mcollina/aedes/blob/master/lib/handlers/subscribe.js#L63 multiplies subscriptions!
2. addSubscriptions is always rpush-ing? Every time, on every resubscribe, the sub is appended to the list: https://github.com/mcollina/aedes-persistence-redis/blob/master/persistence.js#L137 I don't understand the rationale behind this list? mosca's way of using HSCAN was fully OK, wasn't it?
3. The third item is not a BUG, but is also a redundant data structure: (hset sub:client:topic clientId encoded) these keys are never used; they are only written on addSubscription and deleted on removeSubscription!!! EDIT: getClientList is using this! Is this really necessary?
There's no restoreSubscription after client connect; the same handleSubscribe is called both after connect (to restore subscriptions) and when a client subscribes.
So if a clean=false client subscribes after each of its connects, aedes will run handleSubscribe twice.
broker.emit('subscribe', packet.subscriptions, client) is also called twice!
However, there is no clear abstraction between addSubscriptions and storeSubscriptions inside aedes-persistence[-redis].
Another issue: handleSubscribe is calling doSubscribe (https://github.com/mcollina/aedes/blob/master/lib/handlers/subscribe.js#L25) in series for each client subscription!!
First, I don't think there's a need for an async series; second, and most important: this forces the underlying persistence implementation to store subscriptions one by one, whereas it could be much more performant if the redis persistence could store them all in one batch (as mosca does)!
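The series-vs-batch point can be sketched with a fake persistence layer that just counts round trips to the backend; the function names here are hypothetical, and in the real module a "batch" would mean a single pipelined Redis command instead of one command per subscription.

```javascript
// A fake backend that counts "round trips" (network calls to Redis).
var roundTrips = 0
function storeOne (sub) { roundTrips++ }   // e.g. one HSET per subscription
function storeBatch (subs) { roundTrips++ } // e.g. one pipelined write for all

var subs = ['topic-a', 'topic-b', 'topic-c']

// What doSubscribe effectively does today: store subscriptions 1-by-1 in series.
for (var i = 0; i < subs.length; i++) storeOne(subs[i])
var serialCost = roundTrips

// The batched alternative: one call for the whole SUBSCRIBE packet.
roundTrips = 0
storeBatch(subs)
var batchCost = roundTrips
```

With N subscriptions in one SUBSCRIBE packet, the serial path pays N round trips where the batched path pays one, so the gap widens exactly in the mass-subscribe benchmarks discussed above.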
I've been playing with and changing some code in aedes, aedes-persistence and even aedes-cache-persistence.
Aedes should change some API abstractions to overcome this. Are you OK with this @mcollina?
I'm ok with changes, as long as they are ported to the other persistences as well.
(sorry, I was on mobile)
@behrad How many subscriptions are you dealing with? I did not use any SCAN/HSCAN implementation of redis because, if count is not set to an optimum value, the performance isn't good. Also the redis server was doing a lot of heavy lifting in terms of CPU when many aedes instances were connected.
How many subscriptions are you dealing with?
Maybe a few million in production; however the current implementation of aedes persistence [redis] will raise issues around a few hundred thousand.
I did not use any SCAN/HSCAN implementation of redis because
please check mosca's redis implementation here: https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L137
We can't set an upper limit on the client subscription count.
Also the redis server was doing a lot of heavy lifting in terms of CPU when many aedes instances are connected
This may be an issue with the core architecture/design of multiple processes/instances sharing a single redis. I believe aedes-persistence-redis should support clustered redis internally (however, we can test it against a transparent proxy like twemproxy or codis).
Do you think your concern is the same as https://github.com/mcollina/mqemitter-child-process/issues/1 @GavinDmello ?
I think you can fetch all the keys in a list with a -1 arg. My concern is using SCAN in hot paths. The count of 25000 which is present in mosca may be good enough for a few million keys, but as the subscriptions increase the number will have to be bumped. Nutcracker can't be used with MULTI at the moment.
I think you can fetch all keys in a list with a -1 arg
You mean LRANGE-ing a million-sized list is faster than SCAN?
If yes, how much faster? Is it worth the redundancy (in large deployments redis memory is valuable)?
And why not a SET? Why a LIST? There should also be some dedup logic inside when using a list; currently it is duplicating subs on each handleSubscribe.
The count 25000 which is present in mosca may be good enough for a few million keys but as the subscriptions increase the number will have to be bumped .
That is the chunk buffer size, and my tests show it should be lowered, even to 5000 or lower.
Nutcracker can't be used with multi at the moment.
Regarding multi (multi is heavy for redis): what if we sacrifice a bit of consistency for clusterability/performance and remove multi, or make it optional? @mcollina
We can at least provide a configurable option for this, since subscriptions are idempotent and subsequent client subscriptions can fix any inconsistencies.
I'm 👍 on removing MULTI, I do not think it would cause any issue in practice.
and why not a SET?
A Set would've been a good option. I wasn't aware of the handleSubscribe issue.
that is the chunk buffer size, and my tests show that should be lowered even to 5000 or lower.
I'm okay with this if nutcracker support is added.
@behrad SCAN is not supported on Nutcracker as well.
https://github.com/twitter/twemproxy/blob/master/notes/redis.md
SSCAN, ZSCAN and HSCAN are supported though.
SCAN is not supported on Nutcracker as well.
Unfortunately yes, and it seems we SHOULD have a container for all subs/clients:
currently, this module is storing both, e.g. a client's subscription is stored both inside the client's hash and inside the topic hash (only the clientId would suffice in the latter).
SSCAN, ZSCAN, HSCAN is supported though
How do you think SSCAN will compare to SMEMBERS for a large (over a million entry) set @GavinDmello ?
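Roughly, SMEMBERS returns the entire set in one reply (one very large buffer for a million entries, built while Redis is blocked), while SSCAN pages through it with a cursor in smaller round trips. A simulation of the cursor-paging behaviour, with a generator standing in for SSCAN (illustrative only, no real Redis involved):

```javascript
// Simulate SSCAN-style cursor paging over a large set.
function* sscanPages (members, count) {
  for (var i = 0; i < members.length; i += count) {
    yield members.slice(i, i + count) // one "page" per round trip
  }
}

var members = []
for (var i = 0; i < 10000; i++) members.push('client-' + i)

var pages = 0
var seen = 0
for (var page of sscanPages(members, 1000)) {
  pages++
  seen += page.length
}
// SMEMBERS would be the equivalent of a single page containing all 10000
// entries; SSCAN trades that one big reply for 10 smaller ones.
```

Which wins depends on the COUNT hint and latency: SSCAN keeps Redis responsive between pages, while SMEMBERS minimizes round trips at the cost of one long blocking reply.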
When aedes-persistence-redis is used as the persistence with Aedes, client connections are very slow: for 20k client connections it was taking around 10 minutes. But without persistence configured, connections are very fast, around 50k connections in less than a minute.
Please find below the code used to run the aedes server.
```js
var redis = require('mqemitter-redis')
var aedesPersistenceRedis = require('aedes-persistence-redis')

var mq = redis({
  port: 6379,
  host: '172.31.38.96',
  db: 12
})

var persistence = aedesPersistenceRedis({
  port: 6379,
  host: '172.31.38.96'
})

var aedes = require('aedes')({
  mq: mq,
  persistence: persistence
})

var server = require('net').createServer(aedes.handle)
var port = 1883

server.listen(port, function () {
  console.log('server listening on port', port)
})
```
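For reference, a load-test client matching the setup described in the thread (clean session = true, QoS 1, keepalive 60 sec, each client subscribing to a topic equal to its own clientId) would look roughly like the sketch below. The mqtt package, broker address, and helper name are assumptions for illustration, not taken from the thread.

```javascript
// Build the per-client options used by the benchmark described above.
// (Hypothetical helper; clientId naming scheme is an assumption.)
function buildClient (i) {
  var clientId = 'client-' + i
  return {
    clientId: clientId,
    clean: true,       // clean session = true, as reported
    keepalive: 60,     // keepalive 60 sec, as reported
    topic: clientId,   // subscribe target equals the clientId, as reported
    qos: 1             // QoS 1, as reported
  }
}

// Usage against a real broker would be roughly (assumes the `mqtt` package):
// var mqtt = require('mqtt')
// var opts = buildClient(0)
// var c = mqtt.connect('mqtt://localhost:1883', opts)
// c.on('connect', function () {
//   c.subscribe(opts.topic, { qos: opts.qos })
// })
```

Looping buildClient over 25k indices reproduces the "one subscription per connection, topic = clientId" pattern that triggered the slowdown.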