kamax-matrix / mxisd

Federated Matrix Identity Server
GNU Affero General Public License v3.0

Failure to post 3PID bind to synapse #51

Closed (maxidorius closed this issue 6 years ago)

maxidorius commented 6 years ago

When an SRV record points to a hostname other than the configured server_name, publishing a 3PID mapping after sending a 3PID room invitation fails with:

Posting onBind event to https://matrix.domain.tld:8448/_matrix/federation/v1/3pid/onbind
Answer body: {"errcode":"M_UNKNOWN","error":"Third party certificate could not be checked"}

synapse version: 0.26
synapse install: .deb

kepi commented 6 years ago

Here is more info:

server_name is set to domain.tld in synapse, the same as matrix.domain in mxisd.

I also have this in the mxisd settings:

dns.overwrite.homeserver.client:
  - name: 'domain.tld'
    value: 'http://localhost:8008'
  - name: 'matrix.domain.tld'
    value: 'http://localhost:8008'

Log messages:

INFO [onPool-worker-1]  i.k.m.invitation.InvitationManager : Found mapping for pending invite @user:domain.tld:!aBcdEfgHijkLmnoPqRsTuv:domain.tld:email:gmailuser@gmail.com
INFO [onPool-worker-1]  i.k.m.invitation.InvitationManager : Discovering HS for domain domain.tld
INFO [onPool-worker-1]  i.k.m.invitation.InvitationManager : Lookup name: _matrix._tcp.domain.tld
INFO [onPool-worker-1]  i.k.m.invitation.InvitationManager : Found SRV record: _matrix._tcp.domain.tld.        300        IN        SRV        10 0 8448 matrix.domain.tld.
INFO [      Thread-88]  i.k.m.invitation.InvitationManager : Posting onBind event to https://matrix.domain.tld:8448/_matrix/federation/v1/3pid/onbind
INFO [      Thread-88]  i.k.m.invitation.InvitationManager : Answer code: 502
WARN [      Thread-88]  i.k.m.invitation.InvitationManager : Answer body: {"errcode":"M_UNKNOWN","error":"Third party certificate could not be checked"}
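The discovery steps visible in this log can be sketched as follows. This is a minimal Python illustration, not mxisd's own (Java) code: it builds the `_matrix._tcp` lookup name and turns the SRV answer into the onBind URL. The helper names (`srv_lookup_name`, `parse_srv_line`, `onbind_url`) are hypothetical, and the SRV answer is passed in as text rather than queried from DNS.

```python
# Hypothetical sketch of the HS discovery shown in the log above; not mxisd code.
from typing import NamedTuple

ONBIND_PATH = "/_matrix/federation/v1/3pid/onbind"


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_lookup_name(domain: str) -> str:
    """Name queried for a Matrix homeserver, e.g. _matrix._tcp.domain.tld."""
    return f"_matrix._tcp.{domain}"


def parse_srv_line(line: str) -> SrvRecord:
    """Parse '<name> <ttl> IN SRV <priority> <weight> <port> <target>'."""
    fields = line.split()
    i = fields.index("SRV")
    priority, weight, port = (int(f) for f in fields[i + 1:i + 4])
    target = fields[i + 4].rstrip(".")  # drop the trailing root dot
    return SrvRecord(priority, weight, port, target)


def onbind_url(record: SrvRecord) -> str:
    # The request goes to the SRV target and port, not to server_name itself.
    return f"https://{record.target}:{record.port}{ONBIND_PATH}"
```

With the record from the log, `onbind_url(parse_srv_line(...))` gives `https://matrix.domain.tld:8448/_matrix/federation/v1/3pid/onbind`, the same URL mxisd posted to.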

maxidorius commented 6 years ago

Couldn't replicate locally.

I think it is related to the fact that synapse cannot validate mxisd's signing key, because trying to reach the public URL of mxisd (https://domain.tld/_matrix/identity/...) ends up on the wrong host, or an invalid certificate is returned. Checking synapse's code doesn't give me a definitive answer.

I would look at how your network and/or reverse proxy are set up, and ensure that reaching https://domain.tld/_matrix/identity/status from the HS box doesn't fail in any way, e.g. with a curl command. You should get back the following JSON if successful:

{"status":{"health":"OK"}}
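The same check can be scripted. This is a minimal sketch using only the Python standard library; `STATUS_URL` is the URL from this thread, so substitute your own and run it on the homeserver box. Because `urllib.request.urlopen` verifies TLS certificates by default, a wrong host or an invalid certificate surfaces as an exception rather than as a response body. The function names are illustrative.

```python
# Illustrative status check; STATUS_URL is the example URL from this thread.
import json
import urllib.request

STATUS_URL = "https://domain.tld/_matrix/identity/status"


def is_healthy(body: str) -> bool:
    """True if body is the expected {"status":{"health":"OK"}} JSON."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON at all, e.g. an HTML error page
        return False
    status = data.get("status") if isinstance(data, dict) else None
    return isinstance(status, dict) and status.get("health") == "OK"


def check_status(url: str = STATUS_URL, timeout: float = 10.0) -> bool:
    # The default HTTPS context checks the certificate chain and hostname,
    # so a wrong host or bad certificate raises URLError here.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return is_healthy(resp.read().decode("utf-8"))
```

`check_status()` should return `True` when the path is reachable; `is_healthy` is kept separate so it can also be applied to a body captured with curl.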
maxidorius commented 6 years ago

@kepi any update?

kepi commented 6 years ago

Sorry, I didn't find time until now.

$ curl https://matrix.domain.tld/_matrix/identity/status
{"status":{"health":"OK"}}

This is working as expected. The problem seems to be that the onBind event is posted to https://matrix.domain.tld:8448/ instead of http://localhost:8008, as instructed in the DNS overwrite? I'm more than a little lost in how these interconnect, but there is definitely a wrong certificate on https://matrix.domain.tld:8448/, because it is using a self-signed cert as recommended.

As the HTTPS proxy we are using HAProxy; the relevant part of the config:

frontend web-ssl
  bind :::443 ...

  ...

  use_backend bk-mxisd if { path -m reg ^/_matrix/client/r0/user_directory }
  use_backend bk-mxisd if { path -m reg ^/_matrix/identity }
  default_backend bk-synapse

backend bk-synapse
  server s1 127.0.0.1:8008 check
  mode http
backend bk-mxisd
  server s1 127.0.0.1:8090 check
  mode http
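To see which process ends up answering a given path, the routing above can be modelled as a first-match list. This is a simplified Python model, not HAProxy itself: it only covers the three rules shown (the elided `...` parts of the frontend are ignored), and the backend addresses are copied from the config.

```python
# Simplified model of the HAProxy frontend rules above (not HAProxy itself).
import re

# (path regex, backend), checked in order; the first match wins.
RULES = [
    (re.compile(r"^/_matrix/client/r0/user_directory"), "bk-mxisd"),
    (re.compile(r"^/_matrix/identity"), "bk-mxisd"),
]
DEFAULT_BACKEND = "bk-synapse"
BACKENDS = {"bk-synapse": "127.0.0.1:8008", "bk-mxisd": "127.0.0.1:8090"}


def route(path: str) -> str:
    """Backend that would serve a request for `path` arriving on :443."""
    for pattern, backend in RULES:
        if pattern.search(path):
            return backend
    return DEFAULT_BACKEND
```

Under these rules, `/_matrix/identity/status` on port 443 is served by mxisd on 127.0.0.1:8090, while every other path, federation included, falls through to synapse on 127.0.0.1:8008.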

I'm attaching the full configurations we are using now, in case I missed something:

mxisd.yaml:

matrix.domain: 'domain.tld'
logging.level.io.kamax.mxisd.backend.ldap: 'DEBUG'
key.path: '/var/lib/mxisd/signing.key'
storage.provider.sqlite.database: '/var/lib/mxisd/mxisd.db'

ldap:
  enabled: true
  connection:
    host: 'ldap.domain.tld'
    bindDn: 'cn=matrix-identity,ou=webapps,dc=domain,dc=tld'
    bindPassword: 'somepassword'
    baseDn: 'ou=org,dc=domain,dc=tld'
    tls: true
    port: 636

  attribute:
    uid:
      type: 'uid'
      value: 'uid'
    name: 'uid'

dns.overwrite.homeserver.client:
  - name: 'domain.tld'
    value: 'http://localhost:8008'
  - name: 'matrix.domain.tld'
    value: 'http://localhost:8008'

lookup:
  recursive:
    allowedCidr:
      - 'ipv4.of.server/32'
      - 'ipv6.of.server/128'
      - '127.0.0.0/8'
      - '::1/128'

homeserver.yaml:

tls_certificate_path: "/etc/matrix-synapse/homeserver.tls.crt"
tls_private_key_path: "/etc/matrix-synapse/homeserver.tls.key"
tls_dh_params_path: "/etc/matrix-synapse/homeserver.tls.dh"
no_tls: False
pid_file: "/var/run/matrix-synapse.pid"
web_client: False
public_baseurl: https://domain.tld/
soft_file_limit: 0
listeners:
  - port: 8448
    bind_address: ''
    type: http
    tls: true
    x_forwarded: false
    resources:
      -
        names:
          - client     # The client-server APIs, both v1 and v2
          - webclient  # The bundled webclient.
        compress: true
      - names: [federation]  # Federation APIs
        compress: false
  - port: 8008
    tls: false
    bind_address: ''
    type: http
    x_forwarded: true
    resources:
      - names:
          - client     # The client-server APIs, both v1 and v2
          - webclient  # The bundled webclient.
        compress: false
      - names: [federation]
        compress: false
database:
  name: psycopg2
  args:
    ...
event_cache_size: "10K"
log_config: "/etc/matrix-synapse/log.yaml"
rc_messages_per_second: 0.2
rc_message_burst_count: 10.0
federation_rc_window_size: 1000
federation_rc_sleep_limit: 10
federation_rc_sleep_delay: 500
federation_rc_reject_limit: 50
federation_rc_concurrent: 3
media_store_path: "/var/lib/matrix-synapse/media"
max_upload_size: "10M"
max_image_pixels: "32M"
dynamic_thumbnails: false
thumbnail_sizes:
  - width: 32
    height: 32
    method: crop
  - width: 96
    height: 96
    method: crop
  - width: 320
    height: 240
    method: scale
  - width: 640
    height: 480
    method: scale
  - width: 800
    height: 600
    method: scale
url_preview_enabled: True
url_preview_ip_range_blacklist:
- '127.0.0.0/8'
- '10.0.0.0/8'
- '172.16.0.0/12'
- '192.168.0.0/16'
url_preview_url_blacklist:
  - username: '*'
  - netloc: 'google.com'
  - netloc: '*.google.com'
  - scheme: 'http'
  - netloc: 'www.acme.com'
    path: '/foo'
  - netloc: '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
max_spider_size: "10M"
recaptcha_public_key: "YOUR_PUBLIC_KEY"
recaptcha_private_key: "YOUR_PRIVATE_KEY"
enable_registration_captcha: False
recaptcha_siteverify_api: "https://www.google.com/recaptcha/api/siteverify"
turn_uris:
  - "turn:turn.domain.tld:3478?transport=udp"
  - "turn:turn.domain.tld:5349?transport=tcp"
turn_shared_secret: "supersecret"
turn_user_lifetime: "1d"
enable_registration: False
registration_shared_secret: anothersupersecret
user_creation_max_duration: 1209600000
bcrypt_rounds: 12
allow_guest_access: False
trusted_third_party_id_servers:
  - matrix.domain.tld
enable_metrics: False
room_invite_state_types:
    - "m.room.join_rules"
    - "m.room.canonical_alias"
    - "m.room.avatar"
    - "m.room.name"
app_service_config_files: []
expire_access_token: False
signing_key_path: "/etc/matrix-synapse/homeserver.signing.key"
old_signing_keys: {}
key_refresh_interval: "1d" # 1 Day.
perspectives:
  servers:
    "matrix.org":
      verify_keys:
        "ed25519:auto":
          key: "verifiykey"
password_config:
   enabled: true
password_providers:
  - module: "rest_auth_provider.RestAuthProvider"
    config:
      endpoint: "http://127.0.0.1:8090"

maxidorius commented 6 years ago

Please ignore my previous reply (which I have now deleted); I totally missed the start of your reply with the DNS overwrite.

Change your configuration

dns.overwrite.homeserver.client:
  - name: 'domain.tld'
    value: 'http://localhost:8008'
  - name: 'matrix.domain.tld'
    value: 'http://localhost:8008'

to

dns.overwrite.homeserver.federation:
  - name: 'domain.tld'
    value: 'https://localhost:8448'
  - name: 'matrix.domain.tld'
    value: 'https://localhost:8448'

The DNS overwrite that must be used is a federation one, not a client one; this might have been a typo on my part from the start. If so, sorry about that. I clearly need to improve the documentation further!
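Why the section name matters can be sketched as a lookup order: for federation requests such as onBind, only the federation overwrites are consulted, then the `_matrix._tcp` SRV record, then the default federation port 8448. This is a hypothetical Python illustration of that order, not mxisd's code; `federation_base_url` and the injected `srv_lookup` callable are made-up names, and the overwrite values are the ones from this thread.

```python
# Hypothetical resolution order for federation calls; not mxisd's actual code.
from typing import Callable, Dict, Optional, Tuple

ONBIND_PATH = "/_matrix/federation/v1/3pid/onbind"

# In-memory copy of dns.overwrite.homeserver.client / .federation.
OVERWRITES: Dict[str, Dict[str, str]] = {
    "client": {"domain.tld": "http://localhost:8008",
               "matrix.domain.tld": "http://localhost:8008"},
    "federation": {"domain.tld": "https://localhost:8448",
                   "matrix.domain.tld": "https://localhost:8448"},
}

SrvLookup = Callable[[str], Optional[Tuple[str, int]]]


def federation_base_url(domain: str, srv_lookup: SrvLookup,
                        overwrites: Dict[str, Dict[str, str]] = OVERWRITES) -> str:
    # 1. Only the federation section is consulted; client entries are ignored.
    override = overwrites.get("federation", {}).get(domain)
    if override:
        return override.rstrip("/")
    # 2. Otherwise follow the _matrix._tcp SRV record, given as (target, port).
    srv = srv_lookup(domain)
    if srv is not None:
        target, port = srv
        return f"https://{target}:{port}"
    # 3. No SRV record: default federation port on the domain itself.
    return f"https://{domain}:8448"
```

With only the `client` section present, the function falls back to the SRV target (`https://matrix.domain.tld:8448`), matching the original failure; with the `federation` section added, it returns `https://localhost:8448`.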

kepi commented 6 years ago

First, to be sure: should I really change dns.overwrite.homeserver.client to dns.overwrite.homeserver.federation, or add a second section for federation? I had to use the homeserver client overwrite before, because without it I wasn't able to authenticate.

For now, I just added it as a second section. There is definitely progress: onBind is now sent to the correct address, but synapse is now crashing on it. I'm not sure whether it is still something in mxisd or on synapse's side now.

mxisd log:

Feb 09 16:32:16 hattie mxisd[30713]: .117  INFO [onPool-worker-1]  i.k.m.invitation.InvitationManager : Discovering HS for domain domain.tld
Feb 09 16:32:16 hattie mxisd[30713]: .117  INFO [onPool-worker-1]  i.k.m.invitation.InvitationManager : Found DNS overwrite for domain.tld to https://localhost:8448
Feb 09 16:32:16 hattie mxisd[30713]: .119  INFO [      Thread-18]  i.k.m.invitation.InvitationManager : Posting onBind event to https://localhost:8448/_matrix/federation/v1/3pid/onbind
Feb 09 16:32:16 hattie mxisd[30713]: .125  INFO [      Thread-18]  i.k.m.invitation.InvitationManager : Answer code: 500
Feb 09 16:32:16 hattie mxisd[30713]: .126  WARN [      Thread-18]  i.k.m.invitation.InvitationManager : Answer body: {"errcode":"M_UNKNOWN","error":"Internal server error"}

synapse log:

2018-02-09 16:32:16,123 - synapse.access.https.8448 - 59 - INFO - POST-1022558- 127.0.0.1 - 8448 - Received request: POST /_matrix/federation/v1/3pid/onbind
2018-02-09 16:32:16,124 - synapse.http.server - 145 - ERROR - POST-1022558- Failed handle request synapse.http.server._async_render on <synapse.federation.transport.server.TransportLayerServer object at 0x7fe2e74bfcd0>: <SynapseRequest at 0x7fe29b8c7758 method=POST uri=/_matrix/federation/v1/3pid/onbind clientproto=HTTP/1.1 site=8448>: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/synapse/federation/transport/server.py", line 192, in new_func
    origin, content, request.args, *args, **kwargs
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1445, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/lib/python2.7/dist-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/synapse/federation/transport/server.py", line 495, in on_POST
    raise last_exception
exceptions.RuntimeError: Failed to send to any server.
2018-02-09 16:32:16,125 - synapse.access.https.8448 - 91 - INFO - POST-1022558- 127.0.0.1 - 8448 - {None} Processed request: 2ms (0ms, 0ms) (0ms/0) 55B 500 "POST /_matrix/federation/v1/3pid/onbind HTTP/1.1" "Apache-HttpClient/4.5.3 (Java/1.8.0_151)"

maxidorius commented 6 years ago

If you already have the client override, you should leave it in and have the federation override as well.

As for your error, it seems the invite is for a room that everyone has left, so synapse is unable to process it. That's an edge case that synapse doesn't seem to handle well, and mxisd doesn't handle it smartly either.

I'll improve the error handling and get back to you with a version that can at least skip such errors, so that the other invitations can still be processed.
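That kind of error handling can be sketched as per-invite isolation: each pending invite is published on its own, and a failure (such as the 500 above for a room nobody is in anymore) is logged and collected instead of stopping the rest. This is a minimal Python sketch; `PendingInvite`, `publish_all` and the `post_onbind` callable are hypothetical names, not mxisd's API.

```python
# Hypothetical per-invite error isolation; names are illustrative only.
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List

log = logging.getLogger("invitations")


@dataclass
class PendingInvite:  # assumed shape of a stored 3PID invite
    room_id: str
    medium: str
    address: str


def publish_all(invites: Iterable[PendingInvite],
                post_onbind: Callable[[PendingInvite], int]) -> List[PendingInvite]:
    """Post onBind for every invite; return the ones that failed."""
    failed = []
    for invite in invites:
        try:
            status = post_onbind(invite)  # returns the HTTP status code
            if status != 200:
                raise RuntimeError(f"HS answered {status}")
        except Exception as exc:  # log it and move on to the next invite
            log.warning("onBind failed for %s in %s: %s",
                        invite.address, invite.room_id, exc)
            failed.append(invite)
    return failed
```

Returning the failed invites, rather than retrying them in the same loop, leaves the decision about them (retry later or cancel) to a separate step.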

mxisd also lacks a management interface to cancel such invitations, short of deleting the row in the database. Something else that needs to be worked on...

TL;DR: this seems like an edge case and the handling needs improving. I'll get back to you ASAP with another build.

kepi commented 6 years ago

You are right, I totally forgot that I had left that room since then :) Is there any way to delete this invite so it doesn't spam the logs for eternity?

Thanks!

maxidorius commented 6 years ago

The short-term solution is to delete that row from the database. You want to do so in the invite_3pid table (or a similarly named one), for the room ID starting with !aBcdEfgHijkLmnoPqRsTuv (taken from your logs; you might have changed that value too, so check the mxisd log for it).
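A minimal Python sketch of that cleanup, assuming the SQLite file configured by `storage.provider.sqlite.database` (`/var/lib/mxisd/mxisd.db` above). The table name `invite_3pid` and the column name `roomId` are assumptions, since the exact schema is not given here: use `list_tables` (or `.schema` in the `sqlite3` shell) to find the real names first, and back up the file before deleting anything.

```python
# Cleanup sketch; TABLE and ROOM_COLUMN are hypothetical, check the real schema.
import sqlite3
from typing import List

DB_PATH = "/var/lib/mxisd/mxisd.db"  # storage.provider.sqlite.database
TABLE = "invite_3pid"                 # assumption: "or similar name"
ROOM_COLUMN = "roomId"                # assumption: column holding the room ID


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """List table names so the real invite table can be identified."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def delete_invites_for_room(conn: sqlite3.Connection, room_prefix: str,
                            table: str = TABLE, column: str = ROOM_COLUMN) -> int:
    """Delete rows whose room ID starts with room_prefix; return the count."""
    # Identifiers cannot be bound parameters, so they are quoted; the prefix
    # is bound and compared with substr() so no LIKE wildcards are involved.
    sql = f'DELETE FROM "{table}" WHERE substr("{column}", 1, ?) = ?'
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(sql, (len(room_prefix), room_prefix))
    return cur.rowcount
```

For example, open `sqlite3.connect(DB_PATH)`, check `list_tables(conn)`, then call `delete_invites_for_room(conn, "!aBcdEfgHijkLmnoPqRsTuv")`, which returns the number of deleted rows. Restart mxisd afterwards.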

maxidorius commented 6 years ago

@kepi Did you manage to fix it in the database directly?

kepi commented 6 years ago

Yes, sorry I didn't give you feedback. I had to restart mxisd after deleting the row: at least, it looked like the invite was still appearing in the logs, but after the restart there was no more spam in the log.

maxidorius commented 6 years ago

OK, great, thank you for the feedback. I'll close this issue then.