hansetag / iceberg-catalog

A Rust implementation of the Iceberg REST Catalog specification.
Apache License 2.0

Support PubSub for Kafka #271

Open c-thiel opened 4 weeks ago

Grongrilla commented 4 weeks ago

I would like to give this one a shot

Grongrilla commented 4 weeks ago

Turns out there is something to discuss right from the get-go.

I kind of expected that there would be one clear choice for a kafka rust lib, but at least at first glance, there is not.

There seem to be two more or less mature implementations available:

kafka-rust

Seems to be a pure Rust implementation. A first look at the examples shows that it seems to be pretty easy to use. There are a few things to mention, though

rust-rdkafka

... is actually "just" a safe interface to librdkafka

what next?

I am not sure what the best choice is here. If introducing a C lib is not an option, kafka-rust seems to be the only choice. If it is OK to schlepp around a C lib, rust-rdkafka is also async and seems to be "endorsed" by the CloudEvents SDK.

Or maybe I am overlooking "that other kafka rust lib" that has fewer downsides than kafka-rust or rust-rdkafka :smile:

twuebi commented 3 weeks ago

I went looking and found this, it's rather new but looks promising?

https://www.reddit.com/r/rust/comments/1ehpjgh/rust_native_kafka_protocol_and_client/

https://github.com/CallistoLabsNYC/samsa

twuebi commented 3 weeks ago

In terms of maturity & user-base it probably makes sense to stick to rdkafka for now. Eventually we should switch over to a rust-native implementation to get rid of the C dependency.

Grongrilla commented 3 weeks ago

@twuebi

samsa indeed looks promising, but from your second comment I gather: rdkafka it is, for now.

Three questions:

I'd probably vote

twuebi commented 3 weeks ago

I'd say let's give rdkafka a try then. We should probably depend on cloudevents sdk's packaged rdkafka: from a cursory read, serialization of cloudevents to kafka seems to be a bit more involved than what we do for nats. Compare cloudevents-sdk-0.7.0/src/binding/nats/serializer.rs:19 with cloudevents-sdk-0.7.0/src/binding/rdkafka/kafka_producer_record.rs:24.

We've gone for depending on async-nats directly since cloudevents didn't package async-nats IIRC.

twuebi commented 3 weeks ago

Existing publishers can be found in crates/iceberg-catalog/src/service/event_publisher.rs:166..

Grongrilla commented 3 weeks ago

@c-thiel @twuebi

I just realized that the latest release of cloudevents sdk depends on rdkafka ^0.29. The current rdkafka release is 0.36.2.

0.29 is almost 2 years old. It depends on librdkafka 1.9, which is also almost 2 years old. Current version of librdkafka is 2.5.

The main branch of cloudevents sdk is already on ^0.36

Tbh, I am not sure what would be a good way to solve this 🙈
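
For context on why the two cannot simply be combined: Cargo treats a caret requirement on a 0.x version as compatible only within the same minor version, so ^0.29 admits 0.29.z but not 0.36.2. A minimal sketch of that rule follows; the function name `caret_matches` is made up for illustration, it only handles `^major.minor` requirements, and it ignores pre-release versions.

```rust
/// Hypothetical helper (not part of Cargo or any crate): checks whether a
/// version (major, minor, patch) satisfies a caret requirement written as
/// `^major.minor`. Pre-release tags are ignored in this sketch.
pub fn caret_matches(req: (u64, u64), version: (u64, u64, u64)) -> bool {
    let (req_major, req_minor) = req;
    let (major, minor, _patch) = version;
    if req_major > 0 {
        // ^1.2 means >=1.2.0, <2.0.0
        major == req_major && minor >= req_minor
    } else {
        // ^0.29 means >=0.29.0, <0.30.0: the minor version is the breaking boundary
        major == 0 && minor == req_minor
    }
}
```

With this rule, `caret_matches((0, 29), (0, 36, 2))` is false, while `caret_matches((0, 36), (0, 36, 2))` is true, which matches the situation between the released SDK and its main branch.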

twuebi commented 3 weeks ago

Hm, unfortunate, I'd say either ask CloudEvents-sdk for a release or vendor their serialization code for rdkafka
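
As a rough illustration of what vendored serialization would need to produce: in the binary content mode of the CloudEvents Kafka protocol binding, context attributes become record headers prefixed with `ce_`, `datacontenttype` maps to the `content-type` header, and the event data becomes the record value. The sketch below uses only the standard library. `MinimalEvent`, `KafkaRecordParts`, and `to_binary_mode` are hypothetical names for illustration, not the cloudevents-sdk or rdkafka types, and only the required attributes are covered.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal stand-in for a CloudEvent (not the cloudevents-sdk type).
#[derive(Debug, Clone)]
pub struct MinimalEvent {
    pub id: String,
    pub source: String,
    pub ty: String,
    pub datacontenttype: Option<String>,
    pub data: Vec<u8>,
}

/// Hypothetical client-independent record shape: headers plus payload.
#[derive(Debug, PartialEq)]
pub struct KafkaRecordParts {
    pub headers: BTreeMap<String, Vec<u8>>,
    pub payload: Vec<u8>,
}

/// Maps an event to Kafka headers and payload in binary content mode.
pub fn to_binary_mode(event: &MinimalEvent) -> KafkaRecordParts {
    let mut headers = BTreeMap::new();
    // Required context attributes, each prefixed with `ce_`.
    headers.insert("ce_specversion".to_string(), b"1.0".to_vec());
    headers.insert("ce_id".to_string(), event.id.clone().into_bytes());
    headers.insert("ce_source".to_string(), event.source.clone().into_bytes());
    headers.insert("ce_type".to_string(), event.ty.clone().into_bytes());
    // datacontenttype is carried in the plain `content-type` header.
    if let Some(content_type) = &event.datacontenttype {
        headers.insert("content-type".to_string(), content_type.clone().into_bytes());
    }
    KafkaRecordParts {
        headers,
        payload: event.data.clone(),
    }
}
```

A real implementation would then copy these headers and the payload into whatever record type the chosen Kafka client expects. Keeping the mapping client-independent like this is one possible design that could ease a later switch away from rdkafka.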

Grongrilla commented 2 weeks ago

If vendoring is an option, I will do that. That way I can continue (well, start...) working, and it should also make things easier if or when cloud events sdk releases a new version.

Regarding asking cloud events sdk for a release: maybe something you could or should do @twuebi? I'd maybe feel a bit uncomfortable since this is not my codebase 😅

twuebi commented 2 weeks ago

then let's start with vendoring