whatyouhide / xandra

Fast, simple, and robust Cassandra/ScyllaDB driver for Elixir.
ISC License
406 stars 54 forks source link

Fix IPv6 support in `Xandra.Cluster` #328

Closed jacquelineIO closed 1 year ago

jacquelineIO commented 1 year ago
  1. Fix node validation errors
  2. use rpc_address when available

Xandra will now use tuples for IP addresses instead of strings, which are valid Erlang types. And the tuples are able to run through node validations.

History of why we started working on these changes:

Started this work on 0.16 of Xandra with this PR. Even with these changes we still encountered problems with deploying and Karl opened up an issue with Xandra.

New changes were released for Xandra and reapplying the old PR changes to v0.17 of Xandra.

We are using this lib and the host structs are returning ipv6 addresses for our clusters. The validate_node function is failing on ipv6 addresses

"The pool supervisor will run the validations again on the ipv6 values returned in https://github.com/lexhide/xandra/blob/v0.16.0/lib/xandra/cluster.ex#L793 and in the logs we see that our hosts are returned with both ipv4 and ipv6 addresses. The ipv6 addresses are being used and fail to validate." - @britto

invalid value for list element at position 0: invalid node: \"xxxx:xxxx:xxxx:xxxx:0:xxx:xxxx:xxxx:xxxxx\"", value: ["xxxx:xxxx:xxxx:xxxx:0:xxx:xxxx:xxxx:xxxxx"]}, [{NimbleOptions, :validate!, 2, [file: 'lib/nimble_options.ex', line: 343]}
     (xandra 0.16.0) lib/xandra/cluster.ex:750: Xandra.Cluster.start_pool/2
     (xandra 0.16.0) lib/xandra/cluster.ex:798: anonymous fn/4 in Xandra.Cluster.maybe_start_pools/1
     (elixir 1.12.3) lib/enum.ex:4286: Enumerable.List.reduce/3
     (elixir 1.12.3) lib/enum.ex:2431: Enum.reduce_while/3
     (xandra 0.16.0) lib/xandra/cluster.ex:726: Xandra.Cluster.handle_info/2
     (stdlib 3.17.2) gen_server.erl:695: :gen_server.try_dispatch/4
     (stdlib 3.17.2) gen_server.erl:771: :gen_server.handle_msg/6

These changes are tagged v0.17.x-community.1.

relistan commented 1 year ago

We don't just need to validate the IPv6 addresses. As shown here, we need to actually be able to use the RPC addresses because the addresses shown in peer are not routable.

relistan commented 1 year ago

@whatyouhide those test failures seem to have occurred with ScyllaDB. I am suspicious about that. Is there a way to get it to re-run these without committing?

whatyouhide commented 1 year ago

@relistan yep I can re-run the CI suite any time, if you can't (top-right on GH Actions) because you don't have commit access to the repo then just ping me and I'll do it for ya πŸ™ƒ

relistan commented 1 year ago

I can't trigger a rebuild. But it looks like @jacquelineIO needs to run mix format so that will trigger a build when she pushes.

jacquelineIO commented 1 year ago

So these test runs feel sporadic, I was running the formatter and it was down to 1 build failure (not counting the linter) now it's up to 4. And previously I had none before taking out the option to use rpc_address. So I'm not sure what to do here.

whatyouhide commented 1 year ago

We have some flaky tests (check the issue tracker), so I would suggest to not worry about tests too much for now, let's get the changes in and when we're ready I can take a look at the failures and merge even if they're there, no worries πŸ™ƒ

mattias01 commented 1 year ago

The change to use rpc_address instead of peer is a fix we need to be able to use a newer Xandra version, which forces the use of auto discovery. I've created a fork where I change just that, but I saw this PR so I will not create a PR for it myself. Thanks for your work.

whatyouhide commented 1 year ago

Thanks @jacquelineIO! πŸ’Ÿ