matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

v1.15 upgrade broken ipv6 replication config (ipv6 literals) #7695

Open grinapo opened 4 years ago

grinapo commented 4 years ago

Description

A workers config that worked on v1.13.x broke after the v1.15.0 upgrade. The server dies in mysterious ways.

2020-06-14 23:49:11,767 - synapse.http.client - 283 - INFO - replication-POSITION-27- Sending request GET http://::1:None/_synapse/replication/get_repl_stream_updates/federation/XSBWfTbBUR?from_token=0&upto_token=2
2020-06-14 23:49:11,767 - synapse.http.client - 330 - INFO - replication-POSITION-27- Error sending request to  GET http://::1:None/_synapse/replication/get_repl_stream_updates/federation/XSBWfTbBUR?from_token=0&upto_token=2: URLParseError expected integer for port, not ':1:None'
2020-06-14 23:49:11,768 - synapse.metrics.background_process_metrics - 215 - ERROR - replication-POSITION-27- Background process 'replication-POSITION' threw an exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/hyperlink/_url.py", line 993, in from_text
    port = int(port)
ValueError: invalid literal for int() with base 10: ':1:None'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/synapse/metrics/background_process_metrics.py", line 213, in run
    return (yield result)
hyperlink._url.URLParseError: expected integer for port, not ':1:None'

So far it has been working as:

worker_app: synapse.app.synchrotron
worker_replication_host: '::1'
worker_replication_port: 9092

Apart from the fact that it may now require worker_replication_http_port (not sure though), the …_host setting is completely broken. I tried various valid combinations without success, such as

worker_replication_host: '[::1]'

but then it stupidly tries to interpret it as a hostname. I did not find a working solution, so I had to downgrade all connections to IPv4.
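The URL in the log above (`http://::1:None/...`) is what plain string interpolation produces when the host is a bare IPv6 literal and the HTTP port is unset. The following is a minimal sketch of that failure mode using only the Python standard library. It is not Synapse's actual code: `naive_url` and `port_parses` are hypothetical helpers, the path is shortened, and the parser here is `urllib.parse` rather than hyperlink, so the exact error text differs from the traceback.

```python
from urllib.parse import urlsplit


def naive_url(host, port, path):
    # Plain interpolation: no brackets around IPv6 literals, and an
    # unset port (None) is rendered as the literal text "None".
    return "http://%s:%s%s" % (host, port, path)


def port_parses(url):
    """Return True if the standard library can read the port of `url`."""
    try:
        urlsplit(url).port
    except ValueError:
        return False
    return True


# Hypothetical values mirroring the report: host '::1', HTTP port not set.
broken = naive_url("::1", None, "/_synapse/replication")
print(broken)               # http://::1:None/_synapse/replication
print(port_parses(broken))  # False
```

Everything after the first colon of the authority is taken as the port, which is why the parser complains about `':1:None'` rather than about the host.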

Steps to reproduce

Version information

clokep commented 4 years ago

Thanks for the bug report! I was able to reproduce this in sytest by running in worker mode with a command like:

docker run --rm -it -e POSTGRES=true -e WORKERS=true -v /Users/clokep/matrix/synapse\:/src:ro -v /Users/clokep/matrix/sytest/logs\:/logs -v /Users/clokep/matrix/sytest\:/sytest:ro matrixdotorg/sytest-synapse:py35 tests/10apidoc/35room-typing.pl

After modifying sytests to bind to ::1:

```diff
diff --git a/lib/SyTest/Homeserver/Synapse.pm b/lib/SyTest/Homeserver/Synapse.pm
--- a/lib/SyTest/Homeserver/Synapse.pm
+++ b/lib/SyTest/Homeserver/Synapse.pm
@@ -719,7 +719,7 @@ sub wrap_synapse_command
 "worker_app" => "synapse.app.pusher",
 "worker_pid_file" => "$hsdir/pusher.pid",
 "worker_log_config" => $self->configure_logger("pusher"),
-"worker_replication_host" => "$bind_host",
+"worker_replication_host" => "::1",
 "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
 "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
 "worker_listeners" => [
@@ -745,7 +745,7 @@ sub wrap_synapse_command
 "worker_app" => "synapse.app.appservice",
 "worker_pid_file" => "$hsdir/appservice.pid",
 "worker_log_config" => $self->configure_logger("appservice"),
-"worker_replication_host" => "$bind_host",
+"worker_replication_host" => "::1",
 "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
 "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
 "worker_listeners" => [
@@ -771,7 +771,7 @@ sub wrap_synapse_command
 "worker_app" => "synapse.app.federation_sender",
 "worker_pid_file" => "$hsdir/federation_sender.pid",
 "worker_log_config" => $self->configure_logger("federation_sender"),
-"worker_replication_host" => "$bind_host",
+"worker_replication_host" => "::1",
 "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
 "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
 "worker_listeners" => [
@@ -797,7 +797,7 @@ sub wrap_synapse_command
 "worker_app" => "synapse.app.synchrotron",
 "worker_pid_file" => "$hsdir/synchrotron.pid",
 "worker_log_config" => $self->configure_logger("synchrotron"),
-"worker_replication_host" => "$bind_host",
+"worker_replication_host" => "::1",
 "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
 "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
 "worker_listeners" => [
@@ -831,7 +831,7 @@ sub wrap_synapse_command
 "worker_app" => "synapse.app.federation_reader",
 "worker_pid_file" => "$hsdir/federation_reader.pid",
 "worker_log_config" => $self->configure_logger("federation_reader"),
-"worker_replication_host" => "$bind_host",
+"worker_replication_host" => "::1",
 "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
 "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
 "worker_listeners" => [
@@ -865,7 +865,7 @@ sub wrap_synapse_command
 "worker_app" => "synapse.app.media_repository",
 "worker_pid_file" => "$hsdir/media_repository.pid",
 "worker_log_config" => $self->configure_logger("media_repository"),
-"worker_replication_host" => "$bind_host",
+"worker_replication_host" => "::1",
 "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
 "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
 "worker_listeners" => [
@@ -933,7 +933,7 @@ sub wrap_synapse_command
 "worker_app" => "synapse.app.user_dir",
 "worker_pid_file" => "$hsdir/user_dir.pid",
 "worker_log_config" => $self->configure_logger("user_dir"),
-"worker_replication_host" => "$bind_host",
+"worker_replication_host" => "::1",
 "worker_replication_port" => $self->{ports}{synapse_replication_tcp},
 "worker_replication_http_port" => $self->{ports}{synapse_unsecure},
 "worker_listeners" => [
```

However, when I tried to bisect what broke this, I was no longer able to reproduce it...

clokep commented 4 years ago

Looks like this was broken in v1.14.0 by #7517. I was able to reproduce it with the following:

docker run --rm -it -e POSTGRES=true -e WORKERS=true -v /Users/clokep/matrix/synapse\:/src:ro -v /Users/clokep/matrix/sytest/logs\:/logs -v /Users/clokep/matrix/sytest\:/sytest:ro matrixdotorg/sytest-synapse:py35 tests/10apidoc/12device_management.pl

clokep commented 4 years ago

From #synapse-dev:matrix.org:

I guess this has actually always been broken. It's just that their setup likely didn't use replication HTTP pokes before we changed the replication protocol so that we request missing updates via HTTP rather than in-band on the TCP connection. https://github.com/matrix-org/synapse/blob/master/synapse/replication/http/_base.py#L191 is the offending line.

clokep commented 4 years ago

I think this is python-hyper/hyperlink#68, which is also the cause of #4092 it seems?

erikjohnston commented 4 years ago

Maybe, though I'll note that we don't correctly construct the URL in the first place, as we don't enclose IPv6 literals in [..]. Having done that, we may still run into the linked issue.
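A minimal sketch of the bracketing step described above, assuming the host arrives as a plain string from the worker config. `bracket_ipv6` and `build_replication_url` are hypothetical names, not Synapse functions, and port 8008 is an arbitrary example value. The sketch only shows that the resulting URL is well-formed for the standard library parser; as noted, hyperlink may still reject it.

```python
import ipaddress
from urllib.parse import urlsplit


def bracket_ipv6(host: str) -> str:
    """Wrap a bare IPv6 literal in brackets; leave everything else alone."""
    if host.startswith("[") and host.endswith("]"):
        return host  # already bracketed
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal, so treat it as a hostname
    return "[%s]" % host if addr.version == 6 else host


def build_replication_url(host: str, port: int, path: str) -> str:
    # %d rejects a missing port (None) early instead of emitting ":None".
    return "http://%s:%d%s" % (bracket_ipv6(host), port, path)


url = build_replication_url("::1", 8008, "/_synapse/replication")
parts = urlsplit(url)
print(url, parts.hostname, parts.port)
```

Hostnames such as `ip6-localhost` and IPv4 literals pass through unchanged, so the helper can be applied unconditionally to the configured host.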

clokep commented 4 years ago

@erikjohnston I attempted putting it directly in the config (as @grinapo suggested in the description) and then ran into the hyperlink issue.

I also tried making the URL that we construct a byte string instead (as that goes through a slightly different code path). I'm not convinced that really made any difference though.

clokep commented 4 years ago

I think #4478 is a duplicate of this, although that has a workaround:

A workaround is using ip6-localhost in the URL.

AluisioASG commented 3 years ago

I see a similar issue to #4478 when configuring an appservice with an IPv6 literal address in the url registration field. Using ip6-localhost instead works.

synapse.appservice.api: [as-sender-494a742717f3d068f2e73680da4c2366d8d489df6ce56a050e85fed8c599d097-0] push_bulk to http://[::1]:54554/transactions/1 threw exception Codepoint U+003A at position 1 of '::1' not allowed