michalkvasnicak / aws-lambda-graphql

Use AWS Lambda + AWS API Gateway v2 for GraphQL subscriptions over WebSocket and AWS API Gateway v1 for HTTP
https://chat-example-app.netlify.com
MIT License

[Discussion] Different event store implementations #58

Open AlpacaGoesCrazy opened 4 years ago

AlpacaGoesCrazy commented 4 years ago

Concerns about the DynamoDB event store implementation, such as the event publisher Lambda being triggered for every event written to the events table, made me think about different approaches.

I came up with two options: the first is SNS+SQS and the second is ElastiCache with Redis Pub/Sub. For both of these options there is a problem of how to communicate published events back to the publisher Lambda. For SQS there is an option to set a Lambda trigger, but it seems that you cannot set it dynamically for newly created queues. For Redis Pub/Sub it seems there is no option but to have a continuously running server which listens for Redis publish events.

My question is if you know how to solve this problem or maybe there are more possible event store implementations to consider.

michalkvasnicak commented 4 years ago

@AlpacaGoesCrazy I think that @guerrerocarlos mentioned Kinesis as an event store a few months ago.

With DynamoDB streams the problem is low traffic, meaning there is only 1 event in a batch. DynamoDB streams can be batched, but then you'd need, for example, 100 or more events per second.

Another way is to replace DynamoDB with SQS, which is batched too (without SNS): https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
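
For illustration, a rough sketch of what an SQS-backed event store could look like (hypothetical and not part of this package; it assumes the event store contract is just an async publish(event) method mirroring the DynamoDB one, and the queue URL is a placeholder):

// hypothetical SQS-backed event store (not part of this package);
// assumes the event store only needs an async publish(event) method,
// mirroring the DynamoDB event store
const AWS = require('aws-sdk');

const sqs = new AWS.SQS();

class SQSEventStore {
  constructor({ queueUrl }) {
    this.queueUrl = queueUrl; // URL of the events queue you created
  }

  async publish(event) {
    // event is the { event, payload } object produced by pubSub.publish
    await sqs
      .sendMessage({
        QueueUrl: this.queueUrl,
        MessageBody: JSON.stringify(event),
      })
      .promise();
  }
}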

I think this is a limitation of going serverless: you are somewhat limited in how you can approach event processing. For example, you can move event processing to an EC2 instance; you just need to feed the event processor with a simulated event from the event source.

// example of an SQS long-polling EC2 server
const AWS = require('aws-sdk');

const sqs = new AWS.SQS();

// event store should be able to run on lambda to register connections
// and subscriptions
const processEvents = createMyEventProcessor({
  connectionManager, // can be dynamodb connection manager
  schema,
  subscriptionManager, // can be dynamodb subscription manager
});

async function pollEvents() {
  while (true) {
    const { Messages } = await sqs
      .receiveMessage({
        QueueUrl: process.env.EVENTS_QUEUE_URL, // your sqs events queue
        MaxNumberOfMessages: 10,
        WaitTimeSeconds: 20, // long polling
      })
      .promise();

    if (!Messages || Messages.length === 0) {
      continue;
    }

    // depends on events that event processor expects
    // let's say the event processor accepts an array of events
    await processEvents(Messages.map((message) => JSON.parse(message.Body)));

    // delete processed messages so they are not redelivered
    // after the visibility timeout expires
    await sqs
      .deleteMessageBatch({
        QueueUrl: process.env.EVENTS_QUEUE_URL,
        Entries: Messages.map(({ MessageId, ReceiptHandle }) => ({
          Id: MessageId,
          ReceiptHandle,
        })),
      })
      .promise();
  }
}

pollEvents();
AlpacaGoesCrazy commented 4 years ago

Waiting for 100 events is not suitable for a real-time application; you cannot even estimate message delivery latency. As for using an EC2 instance to process events, this might be contrary to the main idea of this package: using GraphQL subscriptions on Lambdas with no running instances.

michalkvasnicak commented 4 years ago

I don't have any idea how to solve that using an event store, because we're limited to the event sources that AWS Lambda actually supports.

SQS and DynamoDB streams are batched by default, so Lambda is already working with batches, but the number of events in a batch depends on how many events were written to DynamoDB before AWS Lambda polls for changes (from the documentation: "Lambda polls shards in your DynamoDB stream for records at a base rate of 4 times per second. When records are available, Lambda invokes your function and waits for the result. If processing succeeds, Lambda resumes polling until it receives more records.").
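
For illustration, the batch size and batching window of the stream-to-Lambda event source mapping can be tuned through the AWS SDK; the stream ARN and function name below are placeholders (a generic AWS sketch, not something this package configures for you):

// hypothetical tuning of the DynamoDB stream -> lambda event source mapping;
// the stream ARN and function name are placeholders
const AWS = require('aws-sdk');

const lambda = new AWS.Lambda();

lambda
  .createEventSourceMapping({
    EventSourceArn: 'arn:aws:dynamodb:...:table/Events/stream/...', // your events table stream
    FunctionName: 'event-processor', // your publisher lambda
    StartingPosition: 'LATEST',
    BatchSize: 100, // up to 100 records per invocation
    MaximumBatchingWindowInSeconds: 1, // wait up to 1 second to fill the batch
  })
  .promise()
  .then((mapping) => console.log('created mapping', mapping.UUID));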

You can set up Lambda for bigger throughput using partitions and shards. But I'm no AWS expert, so maybe there are other patterns that could be incorporated that are outside of this package's scope.

michalkvasnicak commented 4 years ago

For example, you can invoke your Lambda manually using the API, so that's another way to implement an event source that is not directly supported by AWS.
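
A minimal sketch of such a manual invocation, assuming a hypothetical event-processor function name and payload shape:

// hypothetical manual invocation of the event-processing lambda;
// the function name and payload shape are made up
const AWS = require('aws-sdk');

const lambda = new AWS.Lambda();

async function forwardEvents(events) {
  await lambda
    .invoke({
      FunctionName: 'event-processor', // your publisher lambda
      InvocationType: 'Event', // asynchronous, fire-and-forget
      Payload: JSON.stringify({ events }),
    })
    .promise();
}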

eduard-malakhov commented 4 years ago

@AlpacaGoesCrazy, I've been thinking about the same issue lately and I picked up on your phrase

For both of these options there is a problem of how to communicate published events back to the publisher Lambda.

Why do we need published events to be communicated back to the publisher? Can you elaborate on this?

AlpacaGoesCrazy commented 4 years ago

@eduard-malakhov The architectural idea of this project is to communicate subscription messages from one source to multiple subscribers in a fan-out fashion. The publisher Lambda in the DynamoDB implementation is responsible for listening to incoming messages (in the DynamoDB Events table) via DynamoDB streams and then sending them to each subscriber. Thus it is important to communicate published events from the event source (the mutation where we call pubSub.publish) to the publisher.
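
For reference, a simplified sketch of the publish side, based loosely on the package README (the event name and payload are made up):

// simplified sketch of the publish side; event name and payload are made up
const { DynamoDBEventStore, PubSub } = require('aws-lambda-graphql');

const eventStore = new DynamoDBEventStore(); // writes published events to the Events table
const pubSub = new PubSub({ eventStore });

const resolvers = {
  Mutation: {
    sendMessage: async (root, { text }) => {
      // this write is what the publisher lambda picks up via the DynamoDB stream
      await pubSub.publish('NEW_MESSAGE', { text });
      return text;
    },
  },
};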

eduard-malakhov commented 4 years ago

@AlpacaGoesCrazy, I see, but why is this not possible with SNS/SQS? The approach I had in mind was to publish events/messages to SNS/SQS from the mutation resolver and attach a Lambda to that topic/queue to parse and fan out these events/messages appropriately to subscribers. Basically, to replace the creation of event items in DDB with publishing to a topic/queue, and to replace the subscription to a DDB event stream with a subscription to a topic/queue. Am I missing any caveats?
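
A rough sketch of that consumer side, with an SQS-triggered Lambda doing the fan-out (heavily simplified: getSubscribers and the endpoint variable are made up, and the real event processor in this package also executes the GraphQL subscription for each subscriber rather than forwarding raw payloads):

// hypothetical SQS-triggered fan-out lambda (not part of this package)
const AWS = require('aws-sdk');

const gateway = new AWS.ApiGatewayManagementApi({
  endpoint: process.env.WS_API_ENDPOINT, // your API Gateway v2 management endpoint
});

// made-up helper: look up connection ids subscribed to an event name,
// e.g. by querying the subscriptions table in DynamoDB
async function getSubscribers(eventName) {
  return []; // placeholder
}

exports.handler = async (sqsEvent) => {
  for (const record of sqsEvent.Records) {
    const { event, payload } = JSON.parse(record.body);
    const subscribers = await getSubscribers(event);

    await Promise.all(
      subscribers.map(({ connectionId }) =>
        gateway
          .postToConnection({
            ConnectionId: connectionId,
            // in reality the subscription document would be executed here
            // to produce the payload each client expects
            Data: JSON.stringify({ event, payload }),
          })
          .promise(),
      ),
    );
  }
};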

AlpacaGoesCrazy commented 4 years ago

@eduard-malakhov I think my original idea was to dynamically create one SQS queue per subscriber to achieve a message delivery guarantee; I am not sure if that will be possible with just one SQS queue for all topics/clients.

eduard-malakhov commented 4 years ago

@AlpacaGoesCrazy, now I see your point, thanks for sharing. I haven't thought this through entirely, but on the surface I don't see any limitation preventing a single topic/queue from working, apart from throughput maybe. As a side note, we've been implementing a chat app based on Kafka recently, and we faced the same design decision: whether to allocate a topic per client or send all messages via a single topic.

Per our analysis, at least in our use case, a topic per client was a waste of resources because there were not enough messages delivered per client to justify the resources required to maintain a dedicated topic inside Kafka. On the other hand, a single topic was a single point of failure and could get overloaded under intensive usage. In the end, we settled on assigning multiple clients per topic and scaling the number of topics to adjust to changing demand.