kamax-matrix / mxisd

Federated Matrix Identity Server
GNU Affero General Public License v3.0

SRV record resolution seems to be failing despite container dns being functional #74

Closed lazypower closed 6 years ago

lazypower commented 6 years ago

MXISD seems to be doing the right things during the registration dance; however, it hits a step where it's attempting to resolve the identity service of the matrix.org-hosted sydent servers.

2018-04-13 04:50:35.375  INFO [nio-8090-exec-6]   i.k.mxisd.session.SessionMananger : Creating remote 3PID session for io.kamax.matrix.ThreePid@4a5b1906 with local session [1523594994866] to {}
2018-04-13 04:50:35.375  INFO [nio-8090-exec-6]   i.k.mxisd.session.SessionMananger : Remote 3PID is allowed by policy
2018-04-13 04:50:35.384  INFO [nio-8090-exec-6]    i.k.m.matrix.IdentityServerUtils : Discovery Identity Server for matrix.org
2018-04-13 04:50:35.384  INFO [nio-8090-exec-6]    i.k.m.matrix.IdentityServerUtils : Performing SRV lookup
2018-04-13 04:50:35.384  INFO [nio-8090-exec-6]    i.k.m.matrix.IdentityServerUtils : Lookup name: _matrix-identity._tcp.matrix.org
2018-04-13 04:50:35.619  INFO [nio-8090-exec-7]  i.k.m.c.i.v1.SessionRestController : Requested: http://chat.linuxlab.sh/_matrix/identity/api/v1/3pid/getValidated3pid
2018-04-13 04:50:36.236  INFO [nio-8090-exec-6]    i.k.m.matrix.IdentityServerUtils : No SRV record for _matrix-identity._tcp.matrix.org
2018-04-13 04:50:36.241 ERROR [nio-8090-exec-6]     i.k.m.c.DefaultExceptionHandler : Reference #1523595036239 - https://matrix.org could not be resolved to an Identity server
2018-04-13 04:50:36.242  INFO [nio-8090-exec-6]     i.k.m.c.DefaultExceptionHandler : Request GET http://chat.linuxlab.sh/_matrix/identity/remote/api/v1/validate/requestToken - Error M_UNKNOWN: An internal server error occurred. If this error persists, please contact support with reference #1523595036239
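
For readers following the log lines above, here is a minimal sketch of the discovery sequence they suggest: build the `_matrix-identity._tcp.<domain>` lookup name, try an SRV lookup, and fall back to `https://<domain>` when no record exists. This is an illustration inferred from the log output, not mxisd's actual code. The function names and the `resolve_srv` hook are hypothetical, and the resolver is injected so the sketch runs without network access.

```python
from typing import Callable, List, Tuple

# (priority, weight, port, target) as carried in an SRV record (RFC 2782).
SrvRecord = Tuple[int, int, int, str]


def srv_lookup_name(domain: str) -> str:
    """Build the SRV owner name seen in the 'Lookup name: ...' log line."""
    return f"_matrix-identity._tcp.{domain}"


def discover_identity_server(
    domain: str,
    resolve_srv: Callable[[str], List[SrvRecord]],
) -> str:
    """Return a candidate base URL for the identity server of `domain`.

    `resolve_srv` is a hypothetical hook: it takes a lookup name and returns
    the SRV records found, or an empty list when there are none.
    """
    records = resolve_srv(srv_lookup_name(domain))
    if not records:
        # No SRV record: fall back to the domain itself over HTTPS.
        return f"https://{domain}"
    # Lowest priority value wins; weight is ignored in this sketch.
    _priority, _weight, port, target = min(records, key=lambda r: r[0])
    return f"https://{target.rstrip('.')}:{port}"


# Stub resolver that mimics the "No SRV record" case from the logs.
print(discover_identity_server("matrix.org", lambda name: []))
# prints: https://matrix.org
```

In the logs, the error is raised after this fallback step, when the candidate `https://matrix.org` is rejected as an identity server; the sketch stops before that validation.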

I've been unable to track down why this is the case. mxisd is running in Kubernetes via the official container. DNS resolution appears to be working as best I can tell: I've remoted into the container, installed drill via apk, and ran a quick DNS resolution request, and it yields only the SOA record...

bash-4.3# drill _matrix-identity._tcp.matrix.org
;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 60285
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0 
;; QUESTION SECTION:
;; _matrix-identity._tcp.matrix.org.    IN  A

;; ANSWER SECTION:

;; AUTHORITY SECTION:
matrix.org. 3298    IN  SOA derek.ns.cloudflare.com. dns.cloudflare.com. 2027510442 10000 2400 604800 3600

;; ADDITIONAL SECTION:

;; Query time: 3 msec
;; SERVER: 10.96.0.10
;; WHEN: Fri Apr 13 04:55:38 2018
;; MSG SIZE  rcvd: 113

So I'm unsure where the disconnect is coming from. Has the address for this changed, and have I got a stale configuration lingering somewhere? Is there a routine I can set that will instead just store the local address so my users don't see the error page while I investigate a proper fix for the SRV resolution?

I also see through manytools that the SRV record indicated does not actually resolve in any of the major DNS resolvers. https://manytools.org/network/query-dns-records-online/

[Screenshot attached: selection_012]
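
One detail in the drill output above: the QUESTION SECTION shows type `A`, which is drill's default when no record type is given, so that query did not ask for SRV records specifically. The NXDOMAIN rcode still means the name does not exist for any record type, so the conclusion is the same. The following is a minimal sketch for pulling the query type and rcode out of captured drill output; the function name is hypothetical and the regular expressions assume the output layout shown in this issue.

```python
import re
from typing import Optional, Tuple


def question_and_rcode(drill_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (query type, rcode) from captured drill output."""
    rcode = re.search(r"rcode:\s*(\w+)", drill_output)
    # The question line looks like: ";; <name>.  IN  <TYPE>"
    question = re.search(
        r"QUESTION SECTION:\s*\n;;\s*\S+\s+IN\s+(\w+)", drill_output
    )
    return (
        question.group(1) if question else None,
        rcode.group(1) if rcode else None,
    )


# Abbreviated copy of the output posted above.
sample = """\
;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 60285
;; QUESTION SECTION:
;; _matrix-identity._tcp.matrix.org.    IN  A
"""
print(question_and_rcode(sample))
# prints: ('A', 'NXDOMAIN')
```

To query the SRV type explicitly, drill accepts the record type after the name, for example `drill _matrix-identity._tcp.matrix.org SRV`.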

maxidorius commented 6 years ago

You've done nothing wrong. This is a regression caused by a change made a little while ago to make federation opt-in, with the side effect of no longer dealing properly with matrix.org.

I've pushed a potential fix with image kamax/mxisd:1.0.0-1-g78a25c2. Could you give it a try and let me know if it fixes the issue for you?

lazypower commented 6 years ago

I'll take a look directly after work this evening. Thank you for the fast response time on this.

lazypower commented 6 years ago

Great news! That had the fix in it. It tested cleanly and I received the hash validation leg email from the matrix prime servers.

maxidorius commented 6 years ago

Awesome, thank you for taking the time to test! The fix will make it into the next maintenance release.