karafka / rdkafka-ruby

Modern and performant Kafka client library for Ruby based on librdkafka
https://karafka.io

Allow the OAuth refresh code to work with Ruby 2.7? #478

Closed cobbr2 closed 1 month ago

cobbr2 commented 1 month ago

This is tied to https://github.com/bruce-szalwinski-he/aws-msk-iam-sasl-signer-ruby/issues/21 . My team has a Ruby 2.7 application that is difficult to port up to Ruby 3... but we are moving our Kafka clusters to MSK Serverless, and so need to use aws-msk-iam-sasl-signer-ruby.

We therefore wonder whether the 0.16 code absolutely requires Ruby 3, or if it would be terribly difficult to create a Ruby 2.7 implementation. We can apply some resources, but we'd like to know if there is something that would make this backport infeasible or very hard; we need to compare that effort with the effort necessary to upgrade the application (which is scheduled for replacement, but not until 2025).

Thanks!

mensfeld commented 1 month ago

Hey,

This is beyond my OSS support. Let me quote my support docs:

Consideration for Backporting Features or Fixes: We understand that certain features or fixes can significantly enhance the functionality and stability of your current projects. As a Karafka Pro user, you can request a backport of a particular feature or fix to support your specific needs. Please note that the possibility of backporting will be assessed on a case-by-case basis, taking into account factors like the feasibility of the backport and its impact on system stability.

I do not know how big an effort it would be to bring this to rdkafka; however, please keep in mind that it would also have to be brought to Karafka and WaterDrop. You mentioned you are using Karafka, and this integration is not a single-layer one. A few things were required for all the components to work together. On top of that, the event model of Karafka 2.2.6 is not sufficient for several cases where the OAuth token has to be refreshed. Those changes were shipped in 2.2.12:

[Enhancement] Rewrite the polling engine to update statistics and error callbacks despite longer non LRJ processing or long max_wait_time setups. This change provides stability to the statistics and background error emitting making them time-reliable.

Without this and a few more changes between 2.2.12 and 2.4.0, this will not work as expected.

Given this, even if it were brought to rdkafka, I am not willing to bring it to Karafka 2.2 or 2.3, because that would basically mean backporting extremely critical event-handling operations, which would require weeks of my work to ensure stability and reliability. This is beyond even what I am willing to do for money.

mensfeld commented 1 month ago

Just so you know, being able to run a single OAuth refresh is not the same as being able to sustain operations across the many edge cases of running processing AND handling the refresh events in parallel. The decision to retire support for Ruby 2.7 follows my announced schedule: https://karafka.io/docs/Versions-Lifecycle-and-EOL/#ruby-versions-support
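To illustrate the coordination problem hinted at above, here is a minimal, purely illustrative sketch (not Karafka's or rdkafka's actual implementation; the `TokenCache` class and its API are hypothetical): many polling threads may need a valid token at once, and a naive refresh would fire concurrently from each of them, so the refresh has to be serialized.

```ruby
require "monitor"

# Hypothetical token cache: many polling threads read the token,
# but only one refresh runs at a time, guarded by a monitor.
class TokenCache
  include MonitorMixin

  def initialize(ttl:, &refresher)
    super() # initialize the monitor
    @ttl = ttl
    @refresher = refresher
    @token = nil
    @expires_at = Time.at(0)
  end

  # Returns a valid token, refreshing under the lock so that
  # concurrent callers never trigger duplicate refreshes.
  def token
    synchronize do
      if Time.now >= @expires_at
        @token = @refresher.call
        @expires_at = Time.now + @ttl
      end
      @token
    end
  end
end

calls = 0
cache = TokenCache.new(ttl: 60) { calls += 1; "token-#{calls}" }
threads = 5.times.map { Thread.new { cache.token } }
values = threads.map(&:value)
# All five threads observe the same token and exactly one refresh ran.
```

Real deployments also have to handle refresh failures, clock skew, and delivering the new token to librdkafka while polling continues, which is where the event-model changes mentioned above come in.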

Ruby 3.0 is next and will be dropped at the end of September. In order to move the ecosystem forward, I have to drop legacy Ruby versions that lack features I need, for example fiber-level storage and other low-level capabilities.
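For context, fiber-level storage (`Fiber[]` / `Fiber[]=`) is one of the features mentioned above that legacy Rubies lack; it arrived in Ruby 3.2. A short sketch of what it provides: unlike `Thread.current[:key]`, values set via `Fiber[]` are inherited by child fibers, which suits per-request context in fiber-based schedulers.

```ruby
# Requires Ruby 3.2+. Fiber storage is scoped per fiber and,
# unlike Thread.current[], is inherited by fibers created later.
Fiber[:request_id] = "abc-123"

child = Fiber.new { Fiber[:request_id] }
puts child.resume # inherits the parent's storage => "abc-123"
```

On Ruby 2.7 there is no equivalent built-in mechanism, so libraries relying on it would need a hand-rolled substitute.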

I would not recommend trying to bring OAuth to Ruby 2.7, given that it involves more than just rdkafka. It took @bruce-szalwinski-he and me a fairly long time to make it work reliably, and tbh I do not think it is worth the effort.

cobbr2 commented 1 month ago

Thank you for your well-reasoned and thorough (though disappointing) reply. And yes, we're also using Waterdrop (also an ancient version).

cobbr2 commented 1 month ago

Oh, and I'll add that we would certainly be willing to contribute a developer of our own, but I'm as aware of Brooks's law as you are; it's not clear that would speed the process up.

mensfeld commented 1 month ago

@cobbr2 sorry to disappoint you.

add that we would certainly be willing to add a coder of our own,

If you aim to use Karafka for a long time, there are already better ways to support this ecosystem. While I will not backport this, supporting me ensures this ecosystem moves forward and receives many new things.

As I mentioned, this touches the fundamental synchronization layer of Karafka when post-init OAuth token refreshes are needed. On top of that, many critical components needed smaller or bigger changes to make it work in the most recent (at the time) Karafka version. This took me (who knows Karafka and rdkafka pretty well) and Bruce (who knows OAuth pretty well) at least 6-8 weeks, if I recall correctly, plus a solid round of testing.

Getting a dev to "just do it" will not differ much from burning money. I will not incorporate such changes into older versions of Karafka ecosystem components, because I spent a lot of time writing specs and making sure things work in a stable fashion. Even if I were given PRs with this feature, it would still require me to backport many integration specs and other component tests. This is not something I am willing to do. I maintain more than 1,000 integration specs on top of unit tests, and I run many end-to-end integrations, including against MSK and Confluent, to make sure things work. Such a backport would require me to spend at least 60-80 hours to be 100% sure it works. That is a lot of time I would rather use to improve the ecosystem in other ways.

cobbr2 commented 1 month ago

Understandable. Thank you.