antoinetran closed this issue 6 years ago
@antoinetran Thanks for the interest. Please see the answers below:
Generally speaking, Kafka is not supposed to be used as a blob store; its underlying data structure is not a good fit for that. If all of your messages are large messages, that indicates a blob store is probably a better choice, so the suggestion to use reference-based messaging is generally true.
LiKafkaConsumer keeps track of the "safe offset" for each partition. In your example, if offsets are committed after M1Sn is consumed but before M2Sn is consumed, the safe offset will be M2S1, so the consumer will resume consumption at M2S1 the next time it starts.
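To make the safe-offset idea concrete, here is a minimal sketch (not LiKafkaConsumer's actual implementation; all names are hypothetical): the safe offset of a partition is the offset of the earliest segment belonging to a message that has not yet been fully assembled, so committing it can never skip part of an in-flight large message.

```python
class SafeOffsetTracker:
    """Illustrative safe-offset bookkeeping for one partition."""

    def __init__(self):
        self.incomplete = {}   # message_id -> offset of its first seen segment
        self.next_offset = 0   # offset of the next record to fetch

    def on_segment(self, offset, message_id, is_last):
        # Remember where each in-flight large message started.
        if message_id not in self.incomplete:
            self.incomplete[message_id] = offset
        self.next_offset = offset + 1
        if is_last:
            # Message fully assembled; it no longer holds back the safe offset.
            del self.incomplete[message_id]

    def safe_offset(self):
        # Earliest first-segment offset among incomplete messages,
        # or simply the next offset to fetch if nothing is pending.
        return min(self.incomplete.values(), default=self.next_offset)


# Replaying the scenario M1S1 -- M2S1 -- M1Sn at offsets 0, 1, 2:
tracker = SafeOffsetTracker()
tracker.on_segment(0, "M1", is_last=False)  # M1S1
tracker.on_segment(1, "M2", is_last=False)  # M2S1
tracker.on_segment(2, "M1", is_last=True)   # M1Sn -> M1 complete
print(tracker.safe_offset())                # offset of M2S1, i.e. 1
```

Committing offset 1 here means a restart re-delivers M2 from its very first segment, even though M1 (whose last segment sits at a later offset) was already consumed.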
You can check the following slides for more details. https://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297
Hi,
First, I would like to thank you for open-sourcing the project. We have the same needs and we implemented a similar (yet less advanced) solution. I have some questions that might be added to the README:
1. Could you explain why reference-based messaging is recommended in such a case?
2. How do you handle this scenario: there is one partition and two producers writing to the same topic in parallel. The large messages M1 and M2 will have their segments interleaved. Say the start of M1 (called M1S1) comes before M2, and the end of M1 (M1Sn) falls between the start of M2 (M2S1) and its end (M2Sn): M1S1 ---- M2S1 ---- M1Sn ---- M2Sn. Will acknowledging M1 result in acknowledging all offsets between M1S1 and M1Sn, including some of M2's segments? Say the consumer crashes after consuming M1 but before M2. Can Kafka restart at the beginning of M2 instead of the end of M1?