jijo-paulose / node-xmpp-bosh

Automatically exported from code.google.com/p/node-xmpp-bosh

SRV reply checks #12

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Currently, if the DNS server sends a malformed reply to an SRV request, NXB
still processes that reply, which breaks the connection (e.g. the issue with
jappix.com from the previous bug report, which is now fixed).

A good example: some DNS servers answer an SRV request with a CNAME when the
SRV entry does not exist! NXB makes no distinction between a good reply and a
bad one.

I think validating the answer with a regex would be a good idea; there are
probably some examples of this on the Internet.

Original issue reported on code.google.com by vanaryon on 5 Jun 2011 at 7:01
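The regex check suggested above could look roughly like the following minimal sketch. It assumes each record has the { name, port, priority, weight } shape that node's dns.resolveSrv() reports; the helper names (isValidHostname, isValidSrvRecord, filterSrvRecords) are hypothetical and not part of NXB. A regex validates each label of the target hostname, and plain range checks cover the numeric fields.

```javascript
// Minimal sketch (hypothetical helpers, not NXB's actual code): sanity-check
// SRV records before using them to open a connection.
// Assumes records look like { name, port, priority, weight }.

// One DNS label: 1-63 chars of letters/digits/hyphens, no leading or trailing hyphen.
const LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;

function isValidHostname(name) {
  if (typeof name !== "string") return false;
  // Accept a trailing root dot ("jappix.com."), but reject a bare "."
  const host = name.endsWith(".") ? name.slice(0, -1) : name;
  if (host.length === 0 || host.length > 253) return false;
  return host.split(".").every((label) => LABEL.test(label));
}

// SRV priority, weight and port are all 16-bit unsigned integers.
function isUint16(n) {
  return Number.isInteger(n) && n >= 0 && n <= 65535;
}

function isValidSrvRecord(rec) {
  return (
    rec !== null &&
    typeof rec === "object" &&
    isValidHostname(rec.name) &&
    isUint16(rec.priority) &&
    isUint16(rec.weight) &&
    isUint16(rec.port) &&
    rec.port !== 0 // design choice: port 0 is not usable for a client connection
  );
}

// Keep only well-formed records; an empty result can be treated as "no SRV entry".
function filterSrvRecords(records) {
  return Array.isArray(records) ? records.filter(isValidSrvRecord) : [];
}
```

With this filter, a record whose target is "." (which RFC 2782 uses to mean the service is not available) or a CNAME-like object without a port is dropped, so a bad reply ends up in the same code path as a missing SRV entry instead of breaking the connection.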

GoogleCodeExporter commented 9 years ago
So, I tried connecting to user1@a.jappix.com but the connection failed (as 
expected). How should I reproduce this? I would like to be able to reproduce 
the "hang" case that you are seeing.

Original comment by dhruvb...@gmail.com on 5 Jun 2011 at 7:21

GoogleCodeExporter commented 9 years ago
This is what we discussed over XMPP about 2 days ago. You can reproduce it by
making a DNS server reply that an SRV entry exists, but without using the
correct SRV syntax (in my case it was a CNAME reply).

So I think validating the SRV reply syntax may fix this issue.

Original comment by vanaryon on 5 Jun 2011 at 9:08

GoogleCodeExporter commented 9 years ago
Do you have a domain on which this behaviour can be reproduced? I am using the 
node.js DNS resolver, so unless I see what is exactly happening, it would be 
hard to fix it. http://nodejs.org/docs/v0.4.7/api/dns.html#dns.resolveSrv

Original comment by dhruvb...@gmail.com on 5 Jun 2011 at 9:22
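Since the failure is hard to pin down without seeing exactly what the resolver returns, a small wrapper that builds the same query name as the dig command used later in this thread and logs the raw result may help. This is a minimal sketch assuming node's callback-style dns.resolveSrv(hostname, callback); the resolver is passed in as a parameter so the logic can run without the network, and lookupXmppClientSrv, xmppClientSrvName and describeSrvResult are hypothetical names.

```javascript
// Minimal sketch (hypothetical helpers): build the _xmpp-client._tcp query name
// and report whatever the resolver hands back, for debugging bad replies.
// Assumes a node-style resolver: resolveSrv(name, (err, records) => ...).

function xmppClientSrvName(domain) {
  // Strip a trailing root dot so "example.com." and "example.com" match.
  return "_xmpp-client._tcp." + domain.replace(/\.$/, "");
}

function lookupXmppClientSrv(resolveSrv, domain, callback) {
  const name = xmppClientSrvName(domain);
  resolveSrv(name, (err, records) => {
    // Pass the query name along so the caller can log it with the result.
    if (err) callback(err, null, name);
    else callback(null, records, name);
  });
}

// One log line per lookup: the error code, or the raw records as JSON.
function describeSrvResult(name, err, records) {
  if (err) return `${name} -> error ${err.code || err.message}`;
  return `${name} -> ${JSON.stringify(records)}`;
}
```

Against the real resolver, this could be called as lookupXmppClientSrv((n, cb) => dns.resolveSrv(n, cb), "stats.jappix.com", (err, records, name) => console.log(describeSrvResult(name, err, records))), with dns = require("dns").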

GoogleCodeExporter commented 9 years ago
I think if you try connecting to, let's say, stats.jappix.com using XMPP, it
will fail. This is not an XMPP domain, but it triggers the SRV bug because no
SRV entry is configured for "stats".

So the bug might be the same here ;)

Original comment by vanaryon on 5 Jun 2011 at 9:55

GoogleCodeExporter commented 9 years ago
Just checking with the basic.js test:

$> node basic.js --username="test@stats.jappix.com" --password=xx

This works fine and does not hang. What behaviour do you see when you run it?

This is the dig output I see. What do you see?

$> dig -t SRV _xmpp-client._tcp.stats.jappix.com

; <<>> DiG 9.7.3 <<>> -t SRV _xmpp-client._tcp.stats.jappix.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 1154
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;_xmpp-client._tcp.stats.jappix.com. IN SRV

;; AUTHORITY SECTION:
jappix.com.     1656    IN  SOA a.dns.gandi.net. hostmaster.gandi.net. 1307095670 10800 3600 604800 10800

;; Query time: 109 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Jun  5 15:29:14 2011
;; MSG SIZE  rcvd: 114

Original comment by dhruvb...@gmail.com on 5 Jun 2011 at 10:00
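The NXDOMAIN status in the dig output above is the plain "no SRV entry" case, which should be handled differently from a genuine resolver failure. Node reports an NXDOMAIN answer with the error code "ENOTFOUND" (dns.NOTFOUND), an answer with no records of the requested type as "ENODATA" (dns.NODATA), and a timeout as "ETIMEOUT" (dns.TIMEOUT). The following minimal sketch maps those codes to outcomes; classifySrvError and the outcome strings are hypothetical, not NXB's actual behaviour.

```javascript
// Minimal sketch (hypothetical helper): turn a dns.resolveSrv() error into an
// outcome the connection logic can act on.
// "ENOTFOUND" = NXDOMAIN, "ENODATA" = no SRV records: both mean "no SRV entry".
const NO_SRV_CODES = new Set(["ENOTFOUND", "ENODATA"]);

function classifySrvError(err) {
  if (!err) return "ok";
  if (NO_SRV_CODES.has(err.code)) return "no-srv-record";
  if (err.code === "ETIMEOUT") return "timeout";
  return "failure"; // e.g. "ESERVFAIL", "ECONNREFUSED"
}
```

Keeping "no-srv-record" separate from "failure" means a domain like stats.jappix.com, which simply has no SRV entry, is reported as such rather than as a broken connection.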

GoogleCodeExporter commented 9 years ago
Mhh, that's pretty strange. You are right: using dig now, I don't get any bad
reply for domains that have no SRV entry.

2 days ago, I removed the *.jappix.com record. I think it was the cause of the
bad SRV reply.

Anyway, it was replying with "CNAME jappix.com.". But if the SRV resolution
depends on node.js, we'd better drop this bug for NXB and report it to node.js,
asking that the node.js SRV module validate the reply (e.g. with a regex
check).

Original comment by vanaryon on 5 Jun 2011 at 10:07

GoogleCodeExporter commented 9 years ago
Yep. Makes sense.

Original comment by dhruvb...@gmail.com on 5 Jun 2011 at 10:14

GoogleCodeExporter commented 9 years ago
Either way, it would be nice if you could set up such a bad DNS entry on some
domain (or subdomain) so that it can be reported properly, showing the exact
failure case.

Original comment by dhruvb...@gmail.com on 5 Jun 2011 at 10:15

GoogleCodeExporter commented 9 years ago
Mhh, it may break the DNS file, because Gandi's DNS servers are a bit strange
about that.

I am looking through my command-line logs to see if I can find anything left ;)

Original comment by vanaryon on 5 Jun 2011 at 10:19

GoogleCodeExporter commented 9 years ago
I found our chatlogs, where the reply appears:
http://codingteam.net/public/muclogs/jappix@conference.codingteam.net/2011-06-02.html#21:39:22

Original comment by vanaryon on 5 Jun 2011 at 10:25

GoogleCodeExporter commented 9 years ago
Did you have a * record configured at that time? If so, what did it point to
(jappix.com)?

Original comment by dhruvb...@gmail.com on 5 Jun 2011 at 10:30

GoogleCodeExporter commented 9 years ago
It was "* 86400 IN CNAME jappix.com.", and I believe it was the cause of the
issue (Gandi's DNS servers run BIND9, and this bug is very strange for BIND9!).

Original comment by vanaryon on 5 Jun 2011 at 10:33

GoogleCodeExporter commented 9 years ago
Which makes me wonder if it's a BIND bug more than a node bug!!

Original comment by dhruvb...@gmail.com on 5 Jun 2011 at 10:52

GoogleCodeExporter commented 9 years ago
I think node and BIND share the blame on that point:
- BIND, because it returns a bad answer;
- node, because it does not filter the answer and detect that it is wrong.

Original comment by vanaryon on 5 Jun 2011 at 10:57

GoogleCodeExporter commented 9 years ago
Actually, the node.js DNS resolver has a timeout of 3 minutes. Since there were
no DEBUG logs, it wasn't apparent that the SRV record resolution was timing out
after 3 minutes. I've added those logs. This should ensure that the client gets
a termination notification within 3 minutes (though that seems like a really
long time; I have no idea how to reduce it, since the node.js DNS module doesn't
provide any way of specifying the timeout).

Original comment by dhruvb...@gmail.com on 7 Jun 2011 at 2:27
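Since the DNS module offered no way to shorten its timeout, one option is to stop waiting at the application level. The following is a minimal sketch using Promise.race, assuming a promise-returning resolver (for example dns.promises.resolveSrv in later Node.js versions); withTimeout and resolveSrvWithTimeout are hypothetical names, and the "ETIMEOUT" code is chosen only to mirror dns.TIMEOUT. Note that it does not cancel the underlying query; it only stops waiting for it.

```javascript
// Minimal sketch (hypothetical helpers): give up on an SRV lookup after `ms`
// milliseconds instead of waiting for the resolver's own timeout.

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`SRV lookup timed out after ${ms} ms`);
      err.code = "ETIMEOUT"; // mirrors dns.TIMEOUT so callers can branch on it
      reject(err);
    }, ms);
  });
  // Whichever settles first wins; clear the timer either way so it
  // does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

function resolveSrvWithTimeout(resolveSrv, name, ms) {
  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  return withTimeout(Promise.resolve().then(() => resolveSrv(name)), ms);
}
```

Usage could be resolveSrvWithTimeout((n) => dns.promises.resolveSrv(n), "_xmpp-client._tcp.example.com", 10000). Newer Node.js releases also accept a timeout option in new dns.Resolver({ timeout }), which limits the query itself; whether it is available depends on the Node.js version in use.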