lexmag / statix

Fast and reliable Elixir client for StatsD-compatible servers
https://hexdocs.pm/statix
ISC License
275 stars 82 forks source link

Support elixir 1.15 and OTP 26 #72

Open juanazam opened 1 year ago

juanazam commented 1 year ago

Closes #69

Update to Elixir 1.15.2 and OTP 26.0.2 The main issue here is that Port.comand is erroring out because of a behavior change on OTP that's not going to be updated.

Related issue on Statix: https://github.com/lexmag/statix/issues/69 Related issue on top: https://github.com/erlang/otp/issues/7130

In summary, Statix was using Port.command to avoid a performance issue with using :gen_udp.send. This approach no longer works because of OTP updates happening at the networking layer.

In order to avoid this, this PR uses gen_udp.send (which has been optimized), but with this change, a lot of the existing code is no longer useful.

Port.command expects a packet containing the IP and port of the destination to which the packet will be sent. This is no longer required with the new approach, but we need to keep the address and port on the Conn module to be able to do the :gen_udp.send calls to send Statds servers.

This patch has been tested in production, see: https://github.com/knocklabs/statix/pull/1

Billzabob commented 1 year ago

Any updates here? How do we get this merged? Anxiously awaiting this since it's the last thing stopping use from going to OTP 26.

juanazam commented 1 year ago

@Billzabob I'm not sure to be 100% honest.

@lexmag Sorry to ping you directly (I know you haven't been active on this project for a bit), but is there any chance we can get this reviewed, merged, and released?

ckoch-cars commented 1 year ago

@lexmag Bump

aselder commented 1 year ago

If @lexmag has abandoned this project, does anybody want to take it over, either by getting the repo transferred or creating a canonical fork?

andyleclair commented 1 year ago

Any updates on this @lexmag? This is a blocker for us moving to OTP 26 as well

dhaspden commented 1 year ago

This is blocking us. How can we get this published or migrated to somebody who will publish?

Thanks

carrascoacd commented 12 months ago

Since this is blocking many of us, the company I'm working on, Cabify, could try to take the temporal ownership of this repo under this fork: https://github.com/cabify/statix. We've been running this solution in production for some weeks with success.

If you are ok with this, please, react to this message and open the same PR in the fork and we will be glad to merge it and publish the library under a temporal name cabify_statix.

We want to make explicit that, in case the owner is back, we will just get rid of the fork and sync it with this repo. So if you agree, let's continue and adopt OTP 26! If you are not ok with this proposal, just let us know also 😁

juanazam commented 12 months ago

@carrascoacd Thanks for testing the fix and taking ownership of the package!

carrascoacd commented 12 months ago

Before continuing, please @whatyouhide since you appear as the maintainer of the library. Could you share your thoughts?

juanazam commented 11 months ago

Update here, I have reached Andrea and he has not been involved with the project for a while, he suggested reaching for @lexmag, now I'm trying to get a hold of him.

lexmag commented 11 months ago

Hi folks, I'm sorry to have kept you waiting on this PR.

@juanazam thank you again for reaching out.

First of all, I must say that Statix does not indeed get a high priority among all activities I have to address, but I still do want it to be fixed and improved, thus, during next two weeks I'm going to merge this PR, look into other issues, and level the project up in general.

That's said, if anyone is willing to push the work beyond what I'm able to dedicate this time, nothing stops you from doing that considering the license is extremely permissive.

As for this PR, I first need to read up on what specifically has changed in OPT 26. From a quick glance, the PR unfortunately cannot be merged in its current shape as it drastically affects the performance of OTP pre-26. We need to have a conditional compilation to preserve the previous approach for older OTP versions.

juanazam commented 11 months ago

Hi @lexmag,

Thanks for taking a look at this. Regarding older versions of OTP, I didn't want to spend a lot of time on it without knowing what's the plan on maintenance for them. I saw some checks for OTP 18 compatibility, which was released in June 2015. Maybe it's work considering doing a major release and remove support for older OTP versions to keep the code simpler (this is just an idea).

If you know what's the oldest version you would want to support and have ideas on how to support it, I can help with updating this PR.

imsoulfly commented 10 months ago

Hey there, kindly asking what the ETA on a release of this might be? 😃

juanazam commented 9 months ago

Hey @lexmag do you need any help with this?

mtrudel commented 9 months ago

From a quick glance, the PR unfortunately cannot be merged in its current shape as it drastically affects the performance of OTP pre-26. We need to have a conditional compilation to preserve the previous approach for older OTP versions.

Does anyone know specifically what @lexmag is referring to here? I don't immediately see anything in this change that sets up alarm bells performance wise...

juanazam commented 8 months ago

@mtrudel I can give some context there. It seems that using :gen_udp.send instead of Port.command was less performant on older versions of OTP.

From the related OTP thread https://github.com/erlang/otp/issues/7130#issuecomment-1513005012

In epgsql direct port call was implemented (the idea was borrowed from RabbitMQ) because epgsql tends to accumulate a large message queue (due to active=true by default). And pre-OTP26 gen_tcp:send had a performance issues when the process calling it has a long message queue.

mtrudel commented 8 months ago

@mtrudel I can give some context there. It seems that using :gen_udp.send instead of Port.command was less performant on older versions of OTP.

Thanks for the context @juanazam!

Unless I'm reading this wrong, the way statix currently uses Port.command nets to the exact same thing that RabbitMQ's hack seeks to avoid. To wit:

With this in hand, let's look at how statix does this today:

https://github.com/lexmag/statix/blob/cc3f93b73a24c585e5a8a55c987c8b845da40270/lib/statix/conn.ex#L50-L62

Again, unless I'm misunderstanding this, statix's implementation here amounts to an 'adhoc implementation of half of gen_tcp.send' (with apologies to Robert Virdig). There's no point in trying to worry about preserving what's here for older OTPs because it's been slow this whole time.

What am I missing here?

juanazam commented 8 months ago

@mtrudel Everything you mention here makes sense! I tried digging up discussions on old issues but couldn't find anything meaningful. The Port.command(sock, packet) call was added as part of the initial commit, so it has always been there.

I remember seeing a discussion about this somewhere, but I can't find it (this was months ago so my memory is failing me), sorry!

carrascoacd commented 8 months ago

If the performance problems only apply to older versions, we could add a conditional compilation as suggested, ie OTP < 26. It should work. It should unblock this while we find how to improve it.

juanazam commented 8 months ago

Decided to implement backwards compat as @carrascoacd mentioned to make this even less of a concern. It's not the cleanest implementation but IMO this path will be eventually removed, so no need to keep it too fancy.

@mtrudel while implementing it, I confirmed one thing! I had to keep this

https://github.com/lexmag/statix/blob/cc3f93b73a24c585e5a8a55c987c8b845da40270/lib/statix/conn.ex#L57-L61

Because I was getting this error while running the tests:

Unexpectedly received message {:inet_reply, #Port<0.13>, :ok} (which matched _any)

I think that back then I read that Port.command immediately replied to the caller process with this type of message, but I can't seem to find the thread were I read that :(.

Anyway I hope this makes it easier for @lexmag to get it merged

juanazam commented 8 months ago

One more finding on why this snippet was needed:

https://github.com/lexmag/statix/blob/cc3f93b73a24c585e5a8a55c987c8b845da40270/lib/statix/conn.ex#L57-L61

It seems it was happening because of an internal detail on the prim_inet implementation that was removed at the same time as the braking change was added.

https://github.com/erlang/otp/commit/f17f8024a41e8b6fd2a8cb5be765d6e714e3e8f9#diff-86e7e31a3a23b92ea754ad259d8ffd6eb347deb522218fae1803878bf9d15e8bL572

There is a flow where the process running Port.command receives a reply, and that's why the error happens and the receive block is needed.

mtrudel commented 8 months ago

There is a flow where the process running Port.command receives a reply, and that's why the error happens and the receive block is needed.

Yes, but my point stands: this implementation as it exists today is just an adhoc copy of what :gen_udp.send does; we're not actually saving any effort here. The whole point of the RabbitMQ hack is to avoid a receive; if we put it back then we're no better off than we are just calling :gen_udp.send and doing it the right way.

carrascoacd commented 7 months ago

I think we are ready to merge, right? cc/ @lexmag

andyleclair commented 2 months ago

Hey all, any movement here?

nicolkill commented 1 month ago

same here, some fix?

im using docker

FROM hexpm/elixir:1.16.3-erlang-26.2.5.2-ubuntu-jammy-20240808
JasterV commented 2 weeks ago

Why is this still in progress????