imapi / spark-sqs-receiver

Spark SQS Amazon queue receiver

Reliable SQS Receiver #1

Open cizmazia opened 9 years ago

cizmazia commented 9 years ago

I would like a Spark Streaming SQS receiver that deletes SQS messages only after they have been successfully stored on S3.

Is it possible to configure this receiver with checkpointing and write-ahead logs to achieve this?

Details: http://stackoverflow.com/questions/30809975/reliable-sqs-receiver-for-spark-streaming

Thanks!

imapi commented 9 years ago

Hi Michal!

I actually started working on this feature a while ago, but it is not ready yet. I will continue with it when I have some free time (hopefully this week).

cizmazia commented 9 years ago

Thanks for your response. With write-ahead logs and checkpointing enabled, the store(multiple-records) call blocks until the given records have been written to the write-ahead logs. However, your code currently uses store(single-record).
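To illustrate the ordering described above (store a whole batch first, delete from the queue only afterwards), here is a minimal sketch in plain Python. It is not the receiver's actual code and does not use Spark or the AWS SDK: `FakeQueue`, `receive_once`, and `store_batch` are hypothetical names, and `store_batch` stands in for the blocking store(multiple-records) call.

```python
from typing import Callable, Dict, List, Tuple


class FakeQueue:
    """In-memory stand-in for an SQS queue (hypothetical, for illustration only)."""

    def __init__(self, bodies: List[str]) -> None:
        # Map receipt handle -> message body.
        self.messages: Dict[str, str] = {f"rh-{i}": b for i, b in enumerate(bodies)}

    def receive(self, max_messages: int = 10) -> List[Tuple[str, str]]:
        # Return up to max_messages (receipt_handle, body) pairs without removing them.
        return list(self.messages.items())[:max_messages]

    def delete_batch(self, receipt_handles: List[str]) -> None:
        # Remove the given messages; unknown handles are ignored.
        for handle in receipt_handles:
            self.messages.pop(handle, None)


def receive_once(queue: FakeQueue, store_batch: Callable[[List[str]], None]) -> int:
    """Receive one batch, store it, and delete it only if storing succeeded."""
    batch = queue.receive()
    if not batch:
        return 0
    bodies = [body for _, body in batch]
    # Stand-in for the blocking store(multiple-records) call: if it raises,
    # nothing is deleted and the messages stay in the queue for redelivery.
    store_batch(bodies)
    queue.delete_batch([handle for handle, _ in batch])
    return len(batch)
```

If `store_batch` raises, the delete step is never reached, so the messages remain in the queue and can be received again. This gives at-least-once delivery, so downstream processing should tolerate duplicates.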

b3nbk1m70 commented 8 years ago

Has there been any progress on this? It would be nice to have.

cizmazia commented 8 years ago

For my use case, deleting SQS messages after processing was less complex than struggling with Spark checkpointing and graceful shutdown.

Sazpaimon commented 7 years ago

@cizmazia Could you elaborate on what your workflow looks like? I tried modifying the receiver to not delete messages immediately, but since the receiver only sends the message body, I can't actually delete the message in a foreachRDD, and I run into serialization hell if I try to have the receiver send the entire Message object.
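One possible way around the problem described above, sketched in plain Python rather than Spark code: have the receiver emit a small record of two strings (the body and the SQS receipt handle) instead of the whole Message object, then delete by receipt handle once processing succeeds. `Envelope`, `process_and_delete`, `process`, and `delete` are hypothetical names for illustration; this is an assumption about a workable design, not the workflow cizmazia actually used.

```python
from typing import Callable, Iterable, NamedTuple


class Envelope(NamedTuple):
    """Plain-string record emitted by the receiver instead of the SDK Message object."""
    body: str
    receipt_handle: str


def process_and_delete(
    partition: Iterable[Envelope],
    process: Callable[[str], None],
    delete: Callable[[str], None],
) -> int:
    """Process each record, then delete it from the queue by receipt handle."""
    count = 0
    for envelope in partition:
        process(envelope.body)           # e.g. write the body to S3
        delete(envelope.receipt_handle)  # reached only if process() did not raise
        count += 1
    return count
```

In a Spark job, a function like this would presumably run per partition inside foreachRDD, with `delete` backed by an SQS client created on the executor rather than serialized from the driver. A record is deleted only after its own processing step returns, so a failure leaves the remaining messages in the queue to reappear after their visibility timeout.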