michalkvasnicak / aws-lambda-graphql

Use AWS Lambda + AWS API Gateway v2 for GraphQL subscriptions over WebSocket and AWS API Gateway v1 for HTTP
https://chat-example-app.netlify.com
MIT License
459 stars 92 forks

Question on scalability and data retrieval limits/cost of Dynamo #72

fridaystreet closed this issue 4 years ago

fridaystreet commented 4 years ago

Hi,

Thanks for putting this library together, it looks really great. I haven't had a chance to fully test it out yet, but I'm currently implementing it. This isn't an issue, more a question or a request for advice. Pretty much every Apollo subscription tutorial or example is so rudimentary that it's hard to see how it would apply to a real application, or how it would perform at scale.

I've been going through the source to see how things hang together, and thinking about how subscriptions should be implemented, in particular the amount of data that has to be read from Dynamo on every event. If event names are as simplistic as in all the examples around the web, say just a 'newMessage' event, is my understanding correct that with 20,000 users subscribed to this event, this library would essentially need to fetch 20,000 records from Dynamo every time, even if there was a withFilter of e.g. payload.topic === variables.topic?

Am I correct in thinking it would have to fetch all of the subscribers before processing the filter as the variables requested by the client are stored in the subscription record?
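A minimal sketch of the concern, assuming subscriptions are looked up by event name only and the withFilter-style check runs afterwards in memory. The record shape, field names and helper functions here are hypothetical and are not taken from the library:

```typescript
// Hypothetical shape of a stored subscription record (field names are assumptions).
interface SubscriptionRecord {
  connectionId: string;
  event: string;
  variables: { topic: string };
}

interface EventPayload {
  topic: string;
  text: string;
}

// Stand-in for a store lookup keyed only by event name:
// every subscriber to the event is returned, whatever its variables are.
function fetchSubscribers(all: SubscriptionRecord[], event: string): SubscriptionRecord[] {
  return all.filter((s) => s.event === event);
}

// withFilter-style predicate, applied only after the records have been read.
function matches(sub: SubscriptionRecord, payload: EventPayload): boolean {
  return payload.topic === sub.variables.topic;
}

// 20,000 subscribers to 'newMessage', of which 1 in 100 asked for topic 'cars'.
const subscriptions: SubscriptionRecord[] = Array.from({ length: 20000 }, (_, i) => ({
  connectionId: `conn-${i}`,
  event: "newMessage",
  variables: { topic: i % 100 === 0 ? "cars" : "other" },
}));

const payload: EventPayload = { topic: "cars", text: "hello" };
const fetched = fetchSubscribers(subscriptions, "newMessage");
const delivered = fetched.filter((s) => matches(s, payload));

console.log(`read ${fetched.length} records, delivered to ${delivered.length}`);
```

Under these assumptions the read cost scales with the total number of subscribers to the event, not with the number of subscribers the filter lets through.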

So what happens if that is thousands of records? I noticed the recently added limit on the number of records retrieved from Dynamo per request. Do you think this could start to cause issues at scale? Also, could processing a large number of subscriptions to an event in the same Lambda instance affect a realtime application? Maybe the responses could be batched and processed by Lambdas invoked in parallel.

So my question is: in this sort of scenario (i.e. serverless, external pubsub), in your experience and given the current functionality, should we be taking a different approach to the withFilter pattern and looking to bake some of the filtering elements into the event name, e.g. 'newMessage::team:some_team_topic:cars'? How have you gone about it in more production-like systems?

I noticed the event name is stored as the hash key. With Dynamo's ability to filter on the range key with operators like beginsWith, lte, gte and between, maybe some future support for specifying a range key in the event name passed from GraphQL could give an almost Redis-like feature set to this library ('newMessage::everything_passed_the_colons_is_range_key').

That would make it possible to subscribe to, e.g., all messages for all topics in a team by using
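The batching idea could look roughly like this. It is only a sketch: `chunk` and `processInParallel` are hypothetical helpers, and the `worker` callback stands in for whatever would actually invoke a Lambda or send the WebSocket messages for one batch:

```typescript
// Split a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hand every batch to a worker at the same time and wait for all of them.
// Returns the number of batches that were dispatched.
async function processInParallel<T>(
  items: T[],
  size: number,
  worker: (batch: T[]) => Promise<void>,
): Promise<number> {
  const batches = chunk(items, size);
  await Promise.all(batches.map((batch) => worker(batch)));
  return batches.length;
}

// Example: 5 connection ids in batches of 2 gives 3 batches.
const exampleBatches = chunk(["a", "b", "c", "d", "e"], 2);
console.log(exampleBatches.length);
```

Promise.all rejects as soon as one batch fails, so a real implementation would need to decide how failed batches are retried or reported.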

beginsWith('newMessage::team:some_team')
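As a rough sketch of that idea, the part of the event name before `::` could be used as the hash key and the part after it as a range key prefix. The table and attribute names (`Subscriptions`, `event`, `topicPath`) are assumptions made for illustration, and the object built below only mirrors the shape of a DynamoDB Query input; in DynamoDB's expression syntax the operator is spelled `begins_with`:

```typescript
// Subset of a DynamoDB Query input; only the fields this sketch needs.
interface QueryParams {
  TableName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

// 'newMessage::team:some_team' -> { event: 'newMessage', path: 'team:some_team' }
function splitEventName(name: string): { event: string; path: string } {
  const idx = name.indexOf("::");
  return idx === -1
    ? { event: name, path: "" }
    : { event: name.slice(0, idx), path: name.slice(idx + 2) };
}

// Build a query that selects stored records for one event whose
// range key starts with the given path prefix.
function buildSubscriberQuery(tableName: string, eventName: string): QueryParams {
  const { event, path } = splitEventName(eventName);
  const params: QueryParams = {
    TableName: tableName,
    KeyConditionExpression: "#event = :event",
    ExpressionAttributeNames: { "#event": "event" }, // hypothetical hash key attribute
    ExpressionAttributeValues: { ":event": event },
  };
  if (path !== "") {
    params.KeyConditionExpression += " AND begins_with(#path, :prefix)";
    params.ExpressionAttributeNames["#path"] = "topicPath"; // hypothetical range key attribute
    params.ExpressionAttributeValues[":prefix"] = path;
  }
  return params;
}

const query = buildSubscriberQuery("Subscriptions", "newMessage::team:some_team");
console.log(query.KeyConditionExpression);
```

Note that a query like this returns stored records whose range key starts with the prefix, so it narrows reads when the stored key is the longer, more specific path.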

Any advice much appreciated.

Cheers Paul

AlpacaGoesCrazy commented 4 years ago

I agree that DynamoDB might not be the best fit for this job, especially if you take into account that a DynamoDB stream is triggered not only on INSERT operations but on any other operation as well (e.g. if you want to clean up the events table), which results in unnecessary invocations. However, this library is modular and nothing stops you from providing your own implementations of ConnectionManager, EventStore, etc., which can use Redis, SQS or something else.
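To illustrate the point about stream records: each DynamoDB stream record carries an `eventName` of INSERT, MODIFY or REMOVE, so a handler can at least skip everything that is not a newly written event. The types below are a trimmed-down assumption of the stream record shape, and `insertedImages` is a hypothetical helper, not part of this library:

```typescript
// Trimmed-down view of a DynamoDB stream record; only the fields used here.
interface StreamRecord {
  eventName?: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb?: { NewImage?: Record<string, unknown> };
}

// Keep only the new images of records that were inserted.
// MODIFY and REMOVE records (e.g. from table clean-up) are ignored.
function insertedImages(records: StreamRecord[]): Record<string, unknown>[] {
  const images: Record<string, unknown>[] = [];
  for (const record of records) {
    if (record.eventName === "INSERT" && record.dynamodb?.NewImage) {
      images.push(record.dynamodb.NewImage);
    }
  }
  return images;
}

const sampleRecords: StreamRecord[] = [
  { eventName: "INSERT", dynamodb: { NewImage: { id: "1" } } },
  { eventName: "REMOVE", dynamodb: {} },
  { eventName: "MODIFY", dynamodb: { NewImage: { id: "2" } } },
];
console.log(insertedImages(sampleRecords).length);
```

This only skips the work inside the handler; the Lambda is still invoked for the other record types.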

michalkvasnicak commented 4 years ago

@fridaystreet yes, as @AlpacaGoesCrazy mentioned, this library has a very simple DynamoDB implementation (which is not well suited for real production use in large apps), but you can easily create your own ConnectionManager, EventStore etc. by implementing interfaces:

By implementing these interfaces you should be able to build a more optimised DynamoDB store that uses beginsWith, or maybe Kinesis streams or something else :)

If you have such an implementation you can create a PR that adds it to this library, create a subpackage in this repository, or just publish your own package and I will add a link to the README (if you want to make it open for everyone else).

I'd appreciate any help with adding more performant data sources to this library so that it could be more production ready and easier for developers to set up.

fridaystreet commented 4 years ago

@michalkvasnicak Thanks for the response. I was actually still using the previous version. I upgraded yesterday and it's taken me a good day to get my head around the changes, but I can see now that the new version and implementation is really awesome. Well done on the new design and architecture, it makes a lot more sense.

I'll have a look at those implementations and will certainly contribute PRs back with anything I come up with. I've done a fair bit of work with Dynamo and Kinesis for event pipelines, so yeah, maybe there is something in that to look at. While Redis and others are possibly better solutions, I'm really trying to keep things as serverless as possible.

Dynamo can definitely do the job. I've previously built a realtime IoT analysis engine purely in Kinesis and Dynamo and it was extremely performant. Once you work out the right keys and indexes, and as long as it's a fairly fixed scenario you're trying to solve, it's a powerful tool with no servers or shards etc. to manage. This particular use case I think fits right in that niche, so I'll have a crack at looking at how the data could be structured to get the best performance/cost.

Cheers

michalkvasnicak commented 4 years ago

@fridaystreet thank you very much for your kind words :) I'd really appreciate any information, examples or solutions to improve this library.

IslamWahid commented 4 years ago

Hello guys, I think I am experiencing some issues related to this topic. I went with the base implementation from the example and used Dynamo, but after testing on staging I started to notice a big increase in the dynmoStreamHandler Lambda duration; it even started to exceed my increased timeout (60 sec). As a quick fix I'm clearing the Subscriptions table. This fixes the problem, but I still need to find a proper solution. I see you mentioned implementing my own ISubscriptionManager, but I wanted to know whether anyone has used this in production and what your recommendations are.