redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Can't host redpanda on fly.io - IPV6 only network #5842

Open jeffdeville opened 2 years ago

jeffdeville commented 2 years ago

Version & Environment

Redpanda version (use rpk version):
Server: docker.redpanda.com/vectorized/redpanda:v22.1.6
Client: v22.1.5 (rev 042089c50e0c5d148a2d49f5dcf1bcdfa419be3a)
Cloud: https://fly.io/
Client OS: macOS Monterey

I've created a fly.toml to run redpanda that looks like this:

app = "redpanda-1"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[build]
  image = "docker.redpanda.com/vectorized/redpanda:v22.1.6"

[mounts]
  destination = "/var/lib/redpanda/data"
  source = "redpanda_poc"

[processes]
  redpanda = "redpanda start --overprovisioned --smp 1 --memory 1G --reserve-memory 0 --node-id 0 --check=false --kafka-addr FLY://[::1]:9092 --advertise-kafka-addr FLY://redpanda-1.internal:9092"

I then created a persistent volume with: fly volumes create redpanda_poc --size 1

And then deployed: fly deploy --app redpanda-1

What went wrong?

rpk topic list --brokers "redpanda-1.internal:9092" -vvv
[DEBUG] opening connection to broker; addr: redpanda-1.internal:9092, broker: seed 0
[WARN] unable to open connection to broker; addr: redpanda-1.internal:9092, broker: seed 0, err: dial tcp [fdaa:0:4939:a7b:ab2:1:4e05:2]:9092: connect: connection refused
unable to request metadata: unable to dial: dial tcp [fdaa:0:4939:a7b:ab2:1:4e05:2]:9092: connect: connection refused

What should have happened instead?

Should have seen a list of topics.

How to reproduce the issue?

If this is an IPv6 issue, then presumably it could be reproduced with Docker, but Docker only supports IPv6 on Linux, and I'm on a Mac. So the only way I know to reproduce it is by creating an account and deploying to fly.io:

  1. Set up their fly cli https://fly.io/docs/getting-started/installing-flyctl/
  2. Login / Register https://fly.io/docs/getting-started/log-in-to-fly/
  3. fly volumes create redpanda_poc --size 1
  4. Create a fly.toml file with the content shown above
  5. fly deploy --app redpanda-1
  6. Set up wireguard connection in fly https://fly.io/docs/reference/private-networking/#install-your-wireguard-app
  7. Set up wireguard locally: https://fly.io/docs/reference/private-networking/#importing-your-tunnel
  8. Connect Wireguard to your tunnel
  9. rpk topic list --brokers "redpanda-1.internal:9092" -vvv

Additional information

Note that if you connect directly to the machine, it will work:

  1. fly ssh console -a redpanda-1
  2. rpk topic list --brokers "[::1]:9092" (this command does work)
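
The difference between the two cases above (loopback works, the 6PN address is refused) can be checked without rpk by a plain TCP probe. This is only a minimal sketch: `can_connect` is a hypothetical helper name, and the host names you pass in are whatever addresses you want to compare.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name (A and AAAA records)
        # and tries each resulting address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False
```

Run inside the VM, `can_connect("::1", 9092)` succeeding while `can_connect("redpanda-1.internal", 9092)` fails would suggest that the listener is bound to loopback only rather than that the network is broken.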

I'm engaging the Fly team to try and help me diagnose the issue as well https://community.fly.io/t/redpanda-kafka-clone-cant-connect/6221

Please attach any relevant logs, backtraces, or metric charts.

JIRA Link: CORE-991

djnalluri commented 2 years ago

I believe ::1 is the loopback address for IPv6, equivalent to 127.0.0.1 in IPv4. :: should be the equivalent of 0.0.0.0.
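
As a quick sanity check of that distinction, Python's standard `ipaddress` module can classify a bind address. The function name below is illustrative only, not part of any Redpanda or Fly tooling.

```python
import ipaddress


def describe_bind_address(host: str) -> str:
    """Classify an IP literal as loopback, wildcard, or a specific address."""
    addr = ipaddress.ip_address(host)
    if addr.is_loopback:
        # ::1 or 127.0.0.0/8: reachable only from the same machine.
        return "loopback"
    if addr.is_unspecified:
        # :: or 0.0.0.0: listen on every interface.
        return "wildcard"
    return "specific"
```

With this, `describe_bind_address("::1")` gives "loopback" and `describe_bind_address("::")` gives "wildcard", which matches the IPv4 pair 127.0.0.1 and 0.0.0.0.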

rupurt commented 2 years ago

I have been able to get a single node broker working on Fly.io by using the socket address [::] and the advertised address of application-name-broker-1.internal. I was also able to connect redpanda console to this node.

However, I have not been able to get any other brokers to join the cluster. It fails on the join request. After debugging for several days I haven't been able to resolve it :{ It seems like the RPC listener is not working for IPv6 given that redpanda console works fine.

2022-10-22T22:04:50.600 app[d9475034] sea [info] INFO 2022-10-22 22:04:50,599 [shard 0] cluster - members_manager.cc:388 - Sending join request to {host: application-name-broker-1.internal, port: 33145}
2022-10-22T22:04:55.595 app[d9475034] sea [info] WARN 2022-10-22 22:04:55,595 [shard 0] cluster - config_manager.cc:135 - Exception during bootstrap: seastar::timed_out_error (timedout)

tobiaslins commented 1 year ago

@rupurt Have you solved this somehow?

rupurt commented 1 year ago

@tobiaslins No, unfortunately. I was in contact with Redpanda support, but they couldn't get to the bottom of it, and I gave up.

I'm still fairly confident that the strongest lead is the RPC listener not being bound to IPv6. rpk may also have a bug where it doesn't resolve IPv6 addresses, which makes the root cause harder to debug.

SSHing into broker-1 and trying to list topics from broker-2 doesn't work:

➜  atlas-core-redpanda git:(main) ✗ fly ssh console -a tokenalysis-development-core-redpanda-broker-1
Update available 0.0.417 -> v0.0.418.
Run "fly version update" to upgrade.
Connecting to fdaa:0:72cf:a7b:2c60:2:754e:2... complete
# rpk topic list --brokers tokenalysis-development-core-redpanda-broker-2.internal:9092
unable to request metadata: unable to dial: dial tcp [fdaa:0:72cf:a7b:2c60:2:75a3:2]:9092: connect: connection refused

SSHing into broker-2 and listing topics from broker-1 does work:

➜  atlas-core-redpanda git:(main) ✗ fly ssh console -a tokenalysis-development-core-redpanda-broker-2
Update available 0.0.417 -> v0.0.418.
Run "fly version update" to upgrade.
Connecting to fdaa:0:72cf:a7b:2c60:2:75a3:2... complete
# rpk topic list
unable to request metadata: unable to dial: dial tcp 0.0.0.0:9092: connect: connection refused
# rpk topic list --brokers tokenalysis-development-core-redpanda-broker-1.internal:9092
NAME  PARTITIONS  REPLICAS

execc commented 1 year ago

Any progress with this? I've run into the same issue :(

vovayartsev commented 9 months ago

According to the Fly.io docs, fly-local-6pn is an alias for the app's IPv6 (6PN) address:

For a service to be accessible via its 6PN address, it needs to bind to/listen on fly-local-6pn. For example, if you have a service running on port 8080, you need to bind it to fly-local-6pn:8080 for it to be accessible at “[6PN_Address:8080]”.

So I was able to run Redpanda on Fly.io using the fly.toml below:

# fly.toml app configuration file generated for secondhand on 2024-02-04T23:32:33+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = 'YOURAPP'
primary_region = 'lhr'

[build]
  image = "docker.redpanda.com/redpandadata/redpanda:latest"

[processes] 
  panda = "redpanda start --smp '1' --memory 512M --reserve-memory 0M --kafka-addr FLY://fly-local-6pn:9092 --advertise-kafka-addr FLY://YOURAPP.internal:9092 --pandaproxy-addr FLY://fly-local-6pn:8082 --advertise-pandaproxy-addr FLY://YOURAPP.internal:8082"

[env]
  REDPANDA_BROKERS = "YOURAPP.internal:9092"

[[vm]]
  cpu_kind = 'shared'
  cpus = 1
  memory_mb = 1024

[mounts]
  source="redpanda_data"
  destination="/var/lib/redpanda/data"

Replace YOURAPP with the actual name of your Fly.io application.

Please keep in mind that this is a single-node, dev-only configuration, and it's probably dangerous to run real production workloads on it.

sundbry commented 2 months ago

I believe the failure is due to AAAA DNS lookups on the IPv6 advertised address. When I use a host name that resolves via an AAAA record for my advertise addresses, no nodes are able to join the cluster, as you reported. When I use a static IPv6 address for the advertise address, it works fine and forms a healthy cluster with a Raft quorum.
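
A rough sketch of that workaround: resolve the AAAA record yourself and pass the literal address instead of the host name. Both function names are hypothetical, and the bracketed `NAME://[addr]:port` form is assumed from the `--kafka-addr FLY://[::1]:9092` syntax used earlier in this thread; I have not confirmed it against the Redpanda docs for every advertise flag.

```python
import ipaddress
import socket
from typing import List


def resolve_aaaa(hostname: str) -> List[str]:
    """Return the IPv6 addresses a host name resolves to (AAAA lookup)."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET6, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def advertise_addr(listener: str, ipv6: str, port: int) -> str:
    """Format a listener address with a literal IPv6 host.

    Raises ValueError if `ipv6` is not an IPv6 literal (e.g. a host name).
    """
    addr = ipaddress.IPv6Address(ipv6)
    return f"{listener}://[{addr.compressed}]:{port}"
```

For example, `advertise_addr("FLY", resolve_aaaa("YOURAPP.internal")[0], 9092)` would produce a value to try for `--advertise-kafka-addr`. Note that an address resolved at start-up can go stale if the VM is recreated with a different 6PN address.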