moscajs / aedes-persistence-redis

Aedes persistence, backed by redis
MIT License

Using aedes-persistence-redis with Aedes results in slow client connections. #10

Open rasalasantosh opened 7 years ago

rasalasantosh commented 7 years ago

When aedes-persistence-redis is used as the persistence layer with Aedes, client connections are very slow: 20k client connections take around 10 minutes. Without persistence configured, connections are much faster, around 50k connections in less than a minute.

Please find below the code used to run the Aedes server.

```js
var redis = require('mqemitter-redis')
var aedesPersistenceRedis = require('aedes-persistence-redis')

var mq = redis({
  port: 6379,
  host: '172.31.38.96',
  db: 12
})

var persistence = aedesPersistenceRedis({
  port: 6379,
  host: '172.31.38.96'
})

var aedes = require('aedes')({
  mq: mq,
  persistence: persistence
})

var server = require('net').createServer(aedes.handle)
var port = 1883

server.listen(port, function () {
  console.log('server listening on port', port)
})
```

mcollina commented 7 years ago

Can you upload a Node.js script to reproduce the problem?

rasalasantosh commented 7 years ago

I don't have a Node.js script; I tried with a Java-based client program. The program has a while loop that creates a new client, connects it to the server, and, after connecting, subscribes to a topic whose value is the same as the clientId.
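A minimal Node.js equivalent of that loop, sketched here with the mqtt package (broker URL, client count, and topic scheme are placeholders for this setup), could look like:

```js
var mqtt = require('mqtt')

var total = 20000
var start = Date.now()

// Open clients one after another; each subscribes to a topic
// equal to its own clientId, mirroring the Java test program.
function connectNext (i) {
  if (i >= total) {
    console.log('connected %d clients in %d ms', total, Date.now() - start)
    return
  }
  var clientId = 'client-' + i
  var client = mqtt.connect('mqtt://localhost:1883', {
    clientId: clientId,
    clean: true,
    keepalive: 60
  })
  client.on('connect', function () {
    client.subscribe(clientId, { qos: 1 }, function () {
      connectNext(i + 1)
    })
  })
}

connectNext(0)
```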

mcollina commented 7 years ago

Anything that can reproduce the problem is good; otherwise it will be extremely hard to fix.

mcollina commented 7 years ago

cc @GavinDmello

GavinDmello commented 7 years ago

Yea, this is an issue. We have a lot of anonymous functions in the persistence layer, which I think is preventing optimizations. Also, ResultsHolder from fastseries always seems to be un-optimized.

mcollina commented 7 years ago

@GavinDmello removing the anonymous functions should be feasible; do you want to send a PR? They are mostly forEach/reduce calls that need to be changed into straight for loops.

ResultsHolder should be optimized if the rest is optimized. There might be a small issue somewhere in there too.

At any rate, we need a script to verify this problem is fixed.
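As an illustration of the refactor being described (hypothetical function names, not the module's actual code), the pattern is turning closure-based iteration into a straight for loop:

```js
// Before: allocates a new closure on every call, which the V8
// optimizer of that era handled poorly in hot paths.
function topicsBefore (subs) {
  var topics = []
  subs.forEach(function (sub) {
    topics.push(sub.topic)
  })
  return topics
}

// After: a plain indexed loop with a preallocated array.
function topicsAfter (subs) {
  var topics = new Array(subs.length)
  for (var i = 0; i < subs.length; i++) {
    topics[i] = subs[i].topic
  }
  return topics
}
```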

GavinDmello commented 7 years ago

@mcollina Yea, that would be cool. I'll make a PR. @rasalasantosh could you share that Java script? It would be helpful.

rasalasantosh commented 7 years ago

@mcollina, kindly share the client script that you used to test 900k connections, so that I can modify it for my use case and test.

mcollina commented 7 years ago

@rasalasantosh as said elsewhere, I did not do that test myself.

mcollina commented 7 years ago

@rasalasantosh can you try version v2.6.1 and see if it improves things?

rasalasantosh commented 7 years ago

@mcollina, is that Redis v2.6.1? If so, I will try it and let you know.

GavinDmello commented 7 years ago

@rasalasantosh v2.6.1 of this module

rasalasantosh commented 7 years ago

@GavinDmello, I installed it using npm i aedes aedes-persistence-redis@2.6.1 --save and tried again.

It is still taking around 11 minutes for 25k connections, but without persistence it takes less than a minute for 50k connections.

I also tried mosca with persistence; it takes around 3 minutes for 25k connections.

mcollina commented 7 years ago

@rasalasantosh what options are you connecting with? Are you also subscribing to a topic?

rasalasantosh commented 7 years ago

@mcollina, yes, after the connection succeeds I am subscribing to a topic as well.

mcollina commented 7 years ago

@rasalasantosh you should check v2.6.3; it has Lua script caching, which should improve the throughput.
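Lua script caching means loading each script into Redis once and invoking it by SHA afterwards (EVALSHA instead of resending the source on every call). A rough sketch of the pattern with ioredis; the command name and script body are stand-ins, not the module's actual scripts:

```js
var Redis = require('ioredis')
var redis = new Redis({ port: 6379, host: '127.0.0.1' })

// defineCommand caches the script server-side and uses EVALSHA on
// subsequent calls, falling back to EVAL only if the cache is flushed.
redis.defineCommand('getSubQos', {
  numberOfKeys: 1,
  lua: 'return redis.call("HGET", KEYS[1], ARGV[1])'
})

redis.getSubQos('client:client-1', 'some/topic', function (err, qos) {
  if (err) throw err
  console.log('stored qos:', qos)
})
```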

rasalasantosh commented 7 years ago

@mcollina, I have tested again, but I am getting the same results, i.e. 11 min for 25k connections.

mcollina commented 7 years ago

@rasalasantosh what options are you connecting with? Are you also subscribing to a topic? In our tests it takes just 30 seconds to open 20k connections, so maybe something else is happening.

rasalasantosh commented 7 years ago

@mcollina, I am connecting with clean session = true, QoS = 1, and keepalive = 60 sec. After connecting, we subscribe to a topic that is the same as the clientId.

Also, please share the test script so I can verify on my side with your script.

GavinDmello commented 7 years ago

@rasalasantosh could you try it with this? https://gist.github.com/GavinDmello/dda612661df4db11ba84b41097470c95 It's not currently subscribing to a topic, and clean is true.

rasalasantosh commented 7 years ago

@GavinDmello, Aedes takes 142 sec to connect 50k connections without subscriptions, but 1360 sec with subscriptions. Testing the same with mosca: 190 sec without subscriptions and 205 sec with subscriptions.

Please find below the script used to test subscriptions.

client-sub.txt

mcollina commented 7 years ago

@rasalasantosh that is with Redis configured as the persistence, correct?

I think createRetainedStream is where the problem is: https://github.com/mcollina/aedes-persistence-redis/blob/master/persistence.js#L98-L105. I'll try to send a PR to improve things.
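createRetainedStream matches retained messages by scanning the keyspace; a simplified sketch of that SCAN-stream approach with ioredis follows (the retained:* key pattern is illustrative and may not match the module's actual layout):

```js
var Redis = require('ioredis')
var redis = new Redis()

// Each 'data' event delivers one SCAN batch of matching keys.
// A low `count` means many round trips; a high one means longer
// pauses on the Redis side per SCAN call.
var stream = redis.scanStream({ match: 'retained:*', count: 100 })

stream.on('data', function (keys) {
  // The real persistence layer would fetch and deserialize each
  // key into a retained packet here.
  console.log('batch of %d retained keys', keys.length)
})
stream.on('end', function () {
  console.log('scan complete')
})
```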

mcollina commented 7 years ago

@rasalasantosh check v2.6.4.

rasalasantosh commented 7 years ago

@mcollina with v2.6.4 I am getting the same results.

GavinDmello commented 7 years ago

@rasalasantosh try removing all unwanted keys from Redis. Also, is Redis running on the same box? Maybe you're running out of memory.

rasalasantosh commented 7 years ago

@GavinDmello, I tried after flushing Redis, but I am getting the same results. Moreover, Redis, Aedes, and the client are running on separate instances.

mcollina commented 7 years ago

Mosca is using a hash table for the retained messages, while we are just scanning plain keys here: https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L241-L267.

Maybe that's the approach we should take here as well. I thought using SCAN was smart.
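The hash-based alternative keeps every retained message as a field of a single hash, so reading them all back is one HGETALL instead of a keyspace scan. A minimal sketch with illustrative key names:

```js
var Redis = require('ioredis')
var redis = new Redis()

// One hash holds all retained messages, keyed by topic.
function storeRetained (packet, cb) {
  redis.hset('retained', packet.topic, JSON.stringify(packet), cb)
}

// Reading every retained message is a single round trip.
function allRetained (cb) {
  redis.hgetall('retained', function (err, fields) {
    if (err) return cb(err)
    var packets = Object.keys(fields).map(function (topic) {
      return JSON.parse(fields[topic])
    })
    cb(null, packets)
  })
}
```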

toddtreece commented 7 years ago

@mcollina I'm wondering if this is due to the Lua script. We eventually run into issues in production once the subscription count hits a certain point. I missed this section in the Redis docs when testing the Lua changes: ...while the script is running no other client can execute commands.

from the redis docs:

Redis uses the same Lua interpreter to run all the commands. Also Redis guarantees that a script is executed in an atomic way: no other script or Redis command will be executed while a script is being executed. This semantic is similar to the one of MULTI / EXEC. From the point of view of all the other clients the effects of a script are either still not visible or already completed.

However this also means that executing slow scripts is not a good idea. It is not hard to create fast scripts, as the script overhead is very low, but if you are going to use slow scripts you should be aware that while the script is running no other client can execute commands.

mcollina commented 7 years ago

@toddtreece that is probably the reason. So, do you think we should revert https://github.com/mcollina/aedes-persistence-redis/pull/3?

toddtreece commented 7 years ago

@mcollina yeah, I think so.

GavinDmello commented 7 years ago

Would using a scan stream guarantee ordered delivery? I think we should try leveraging hashmaps.

GavinDmello commented 7 years ago

@mcollina I think instead of moving the entire thing to a scan stream, we should probably try moving just createRetainedStream and check for improvements.

mcollina commented 7 years ago

Yes, that's probably a good idea!

mcollina commented 7 years ago

@rasalasantosh @toddtreece would you mind testing v2.6.6? It should be much more performant.

behrad commented 7 years ago

Very bad subscription performance confirmed! 👎 There are definitely some issues:

  1. storeSubscriptions looks to be called for each sub, but it is using packet.subscriptions when calling .add: https://github.com/mcollina/aedes/blob/master/lib/handlers/subscribe.js#L63. This multiplies subscriptions!
  2. Why is addSubscriptions always rpushing? Every time, on every resubscribe, the sub is appended to the list: https://github.com/mcollina/aedes-persistence-redis/blob/master/persistence.js#L137. I don't understand the rationale behind this list; mosca's way of using HSCAN was fully OK, wasn't it? (See the set-based sketch after this list.)
  3. The third item is not a BUG, but a redundant data structure: why are we storing a reverse key for each sub (i.e. hset sub:client:topic clientId encoded)? These keys are never used; they are only written on addSubscription and deleted on removeSubscription!

EDIT: getClientList is using this! Is it really necessary?
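A set-based layout, as item 2 above hints, would deduplicate resubscribes for free, since SADD of an already-present member is a no-op. A rough sketch with illustrative key names:

```js
var Redis = require('ioredis')
var redis = new Redis()

// With a LIST, every resubscribe appends another copy of the sub.
// With a SET, re-adding an existing member changes nothing.
function addSubscription (clientId, topic, cb) {
  redis.sadd('client:' + clientId + ':topics', topic, cb)
}

// Reads come back already deduplicated.
function subscribedTopics (clientId, cb) {
  redis.smembers('client:' + clientId + ':topics', cb)
}
```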

behrad commented 7 years ago

There's no restoreSubscription after client connect; the same handleSubscribe is used both to restore subscriptions after a connect and when a client subscribes. So if a clean=false client subscribes after each of its connects, aedes will run handleSubscribe twice. broker.emit('subscribe', packet.subscriptions, client) is also called twice!

Moreover, there is no clear abstraction between addSubscriptions and storeSubscriptions inside aedes-persistence[-redis].

behrad commented 7 years ago

Another issue: handleSubscribe calls doSubscribe (https://github.com/mcollina/aedes/blob/master/lib/handlers/subscribe.js#L25) in series for each client subscription! First, I don't think there's a need for an async series; second, and most important, this forces the underlying persistence implementation to store subscriptions one by one, when it could be far more performant if the Redis persistence could store them all in one batch (as mosca does).
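A batched store along those lines could write all topics from one SUBSCRIBE packet in a single pipeline round trip. A hypothetical sketch with ioredis; the key layout is illustrative:

```js
var Redis = require('ioredis')
var redis = new Redis()

// Store every subscription from one SUBSCRIBE packet at once,
// instead of one round trip per topic.
function storeSubscriptionsBatch (clientId, subscriptions, cb) {
  var pipeline = redis.pipeline()
  for (var i = 0; i < subscriptions.length; i++) {
    var sub = subscriptions[i]
    pipeline.hset('client:' + clientId, sub.topic, sub.qos)
  }
  pipeline.exec(cb)
}
```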

behrad commented 7 years ago

I've been playing with and changing some code in aedes, aedes-persistence, and even aedes-cached-persistence. Aedes should change some API abstractions to overcome this. Are you OK with that @mcollina?

mcollina commented 7 years ago

I'm ok with changes, as long as they are ported to the other persistences as well.

mcollina commented 7 years ago

(sorry, I was on mobile)

GavinDmello commented 7 years ago

@behrad How many subscriptions are you dealing with? I did not use any SCAN/HSCAN implementation of Redis because, if count is not set to an optimum value, the performance isn't good. Also, the Redis server was doing a lot of heavy lifting in terms of CPU when many aedes instances were connected.

behrad commented 7 years ago

How many subscriptions are you dealing with?

Maybe a few million in production; however, the current implementation of aedes-persistence-redis will raise issues at around a few hundred thousand.

I did not use any SCAN/HSCAN implementation of redis because

please check mosca's redis implementation here: https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L137

we can't set an upper limit for client subscription count

Also the redis server was doing a lot of heavy lifting in terms of CPU when many aedes instances are connected

This may be an issue with the core architecture/design of a multi-process, multi-instance deployment around a single shared Redis. I believe aedes-persistence-redis should support clustered Redis internally (though we could also test it against a transparent proxy such as twemproxy or codis).

Do you think your concern is the same as https://github.com/mcollina/mqemitter-child-process/issues/1, @GavinDmello?

GavinDmello commented 7 years ago

I think you can fetch all the keys in a list with a -1 arg. My concern is using SCAN in hot paths. The count of 25000 present in mosca may be good enough for a few million keys, but as subscriptions increase that number will have to be bumped. Nutcracker can't be used with MULTI at the moment.

behrad commented 7 years ago

I think you can fetch all the keys in a list with a -1 arg

You mean LRANGEing a million-entry list is faster than SCAN? If so, how much faster? Is it worth the redundancy (in large deployments Redis memory is valuable)? And why not a SET? Why a LIST? There would also need to be some dedup logic when using a list; currently it duplicates subs on each handleSubscribe.

The count of 25000 present in mosca may be good enough for a few million keys, but as subscriptions increase that number will have to be bumped.

That is the chunk buffer size, and my tests show it should be lowered, even to 5000 or below.

Nutcracker can't be used with MULTI at the moment.

Regarding MULTI (MULTI is heavy for Redis): what if we sacrifice a bit of consistency for clusterability/performance and remove MULTI or make it optional? @mcollina we could at least provide a configurable option for this, since subscriptions are idempotent and subsequent client subscriptions can fix any inconsistencies.
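In ioredis terms, the proposal amounts to swapping the atomic MULTI/EXEC transaction for a plain pipeline, which batches the same commands without the transactional guarantee. A sketch of the two forms with illustrative keys:

```js
var Redis = require('ioredis')
var redis = new Redis()

// Transactional: queued commands execute atomically via MULTI/EXEC.
redis.multi()
  .hset('client:abc', 'a/b', 1)
  .hset('sub:abc:a/b', 'abc', 'encoded')
  .exec(function (err, results) {
    // all-or-nothing
  })

// Pipelined: same round-trip batching, no atomicity. A crash mid-batch
// can leave a partial write, which a later resubscribe would repair.
redis.pipeline()
  .hset('client:abc', 'a/b', 1)
  .hset('sub:abc:a/b', 'abc', 'encoded')
  .exec(function (err, results) {
    // best-effort
  })
```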

mcollina commented 7 years ago

I'm 👍 on removing MULTI, I do not think it would cause any issue in practice.

GavinDmello commented 7 years ago

and why not a SET?

A set would've been a good option. I wasn't aware of the handleSubscribe issue.

That is the chunk buffer size, and my tests show it should be lowered, even to 5000 or below.

I'm okay with this if nutcracker support is added.

GavinDmello commented 7 years ago

@behrad SCAN is not supported on Nutcracker either.

https://github.com/twitter/twemproxy/blob/master/notes/redis.md

SSCAN, ZSCAN, and HSCAN are supported, though.

behrad commented 7 years ago

SCAN is not supported on Nutcracker either.

Unfortunately yes, and it seems we SHOULD have a container for all subs/clients:

  1. A SET is preferred since it dedups topics.
  2. For storage/memory optimization I think we should store only one index, either subs or clients, and calculate/implement the needed abstract persistence interface methods on top of it.

Currently this module stores both: e.g. a client's subscription is stored both inside the client's hash and inside the topic hash (only the clientId would suffice in the latter).

SSCAN, ZSCAN, and HSCAN are supported, though

How do you think SSCAN will compare to SMEMBERS for a large (over a million entries) set, @GavinDmello?
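For reference, the trade-off behind that question: SMEMBERS returns the whole set in one blocking reply, while SSCAN pages through it cursor by cursor. A sketch with ioredis and an illustrative set name:

```js
var Redis = require('ioredis')
var redis = new Redis()

// SMEMBERS: one reply, O(set size). Fine for small sets; for a
// million-entry set it is one long server-side pause and one
// very large payload.
redis.smembers('subscribers:some/topic', function (err, members) {
  if (err) throw err
  console.log('total members:', members.length)
})

// SSCAN: cursor-based paging. Many small replies keep the server
// responsive between rounds, at the cost of more round trips.
var stream = redis.sscanStream('subscribers:some/topic', { count: 1000 })
stream.on('data', function (members) {
  console.log('page of %d members', members.length)
})
stream.on('end', function () {
  console.log('sscan complete')
})
```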