Closed GoogleCodeExporter closed 9 years ago
Some more info:
The problem seems related to chained ("recursive") CNAME records:
_____ output from dig -t a support.microsoft.com ___________________
support.microsoft.com. 583 IN CNAME mso-geo.microsoft.akadns.net.
mso-geo.microsoft.akadns.net. 300 IN CNAME support.microsoft.akadns.net.
support.microsoft.akadns.net. 547 IN A 157.56.132.33
If a CNAME points to another CNAME, exaproxy hangs and returns a "website not
found" page.
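To illustrate what following such a chain involves, here is a minimal sketch that walks the CNAME records of a single answer section until it reaches an A record. It is not exaproxy's code: the function name resolve_chain, the (owner, type, value) tuple layout and the max_hops limit are assumptions made for the example.

```python
def resolve_chain(name, answers, max_hops=8):
    """Follow CNAME records in one answer section until an A record is found.

    answers is a list of (owner, rtype, value) tuples (hypothetical layout).
    Returns the list of addresses, or [] if the chain is broken or loops.
    """
    cnames = {}
    addresses = {}
    for owner, rtype, value in answers:
        owner = owner.rstrip('.').lower()
        if rtype == 'CNAME':
            cnames[owner] = value.rstrip('.').lower()
        elif rtype == 'A':
            addresses.setdefault(owner, []).append(value)

    current = name.rstrip('.').lower()
    for _ in range(max_hops):
        if current in addresses:
            return addresses[current]
        if current not in cnames:
            return []  # chain ends without an address
        current = cnames[current]
    return []  # too many hops: treat as a loop


answers = [
    ('support.microsoft.com.', 'CNAME', 'mso-geo.microsoft.akadns.net.'),
    ('mso-geo.microsoft.akadns.net.', 'CNAME', 'support.microsoft.akadns.net.'),
    ('support.microsoft.akadns.net.', 'A', '157.56.132.33'),
]
print(resolve_chain('support.microsoft.com', answers))
```

The hop limit is there only so that a CNAME loop ends in a resolution failure instead of a hang.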
Original comment by mell...@gmail.com
on 20 Jun 2012 at 9:02
it may have been fixed by:
http://code.google.com/p/exaproxy/source/detail?r=8dbf7ef7476a8ae0b9417e811d2056657df67d4c
Original comment by thomas.mangin
on 25 Jun 2012 at 7:23
Sorry to tell you, but the version I am using already has that patch.
I investigated the problem some more.
I partially solved it by changing the following line
-----line 67 in lib/exaproxy/dns/definition.py -------------------------
ok = complete is True and None not in (identifier, queries, responses,
authorities, additionals)
to
ok = complete is True and None not in (identifier, queries, responses)
------------------------------------------------------------------------
I made this change because I noticed that sometimes queries and responses are
populated but authorities and additionals are not, and with a line such as:
self.responses = (responses or []) if ok else []
queries and responses are then emptied.
This way some sites that used to hang, e.g. support.microsoft.com, now work, and
sometimes even www.microsoft.com does, but at other times the latter still hangs.
I also noticed that createResponse in lib/exaproxy/dns/codec.py gets called for
every CNAME or A line returned by the first "query", even though that first
query/response already gives us all the hosts and CNAMEs needed to reach the
final address to contact.
In the end, I think there is a problem (maybe a race condition, considering
that it does not always happen) in the DNS code as a whole, and that it arises
when the DNS response is populated with "some" A and/or CNAME records.
With "plain" DNS there is no problem at all; the following site works every
time:
www.dell.com. 179 IN CNAME www1.ins.dell.com.
www1.ins.dell.com. 2 IN A 143.166.224.244
I hope this is clear. I will investigate some more and let you know.
Bye mello.
Original comment by mell...@gmail.com
on 25 Jun 2012 at 9:25
Thank you very much for your time. There is no need for you to look into it
further; we will fix the issue later today or tomorrow, as soon as urgent
workloads are out of the way.
Original comment by thomas.mangin
on 25 Jun 2012 at 10:53
I discovered that with this DNS answer:
dig -t A www.microsoft.com @62.101.93.101
; <<>> DiG 9.8.1-P1 <<>> -t A www.microsoft.com @62.101.93.101
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59287
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 10, ADDITIONAL: 9
;; QUESTION SECTION:
;www.microsoft.com. IN A
;; ANSWER SECTION:
www.microsoft.com. 1186 IN CNAME toggle.www.ms.akadns.net.
toggle.www.ms.akadns.net. 164 IN CNAME g.www.ms.akadns.net.
g.www.ms.akadns.net. 201 IN CNAME lb1.www.ms.akadns.net.
lb1.www.ms.akadns.net. 246 IN A 64.4.11.20
;; AUTHORITY SECTION:
akadns.net. 62058 IN NS ns6-129.akadns.net.
akadns.net. 62058 IN NS ns14-130.akadns.org.
akadns.net. 62058 IN NS ns6-131.akadns.org.
akadns.net. 62058 IN NS ns8-200.akadns.net.
akadns.net. 62058 IN NS ns2-129.akadns.net.
akadns.net. 62058 IN NS ns3-129.akadns.net.
akadns.net. 62058 IN NS ns20-131.akadns.org.
akadns.net. 62058 IN NS ns3-131.akadns.org.
akadns.net. 62058 IN NS ns1-129.akadns.net.
akadns.net. 62058 IN NS ns2-131.akadns.org.
;; ADDITIONAL SECTION:
ns1-129.akadns.net. 55448 IN A 193.108.88.129
ns2-129.akadns.net. 55448 IN A 2.22.230.129
ns2-131.akadns.org. 51280 IN A 2.22.230.131
ns3-129.akadns.net. 55448 IN A 23.61.199.129
ns3-131.akadns.org. 51280 IN A 23.61.199.131
ns6-129.akadns.net. 55448 IN A 95.100.168.129
ns6-131.akadns.org. 51281 IN A 95.100.168.131
ns8-200.akadns.net. 55448 IN A 96.17.144.200
ns14-130.akadns.org. 51280 IN A 96.7.251.130
;; Query time: 4 msec
;; SERVER: 62.101.93.101#53(62.101.93.101)
;; WHEN: Fri Jun 29 00:10:49 2012
;; MSG SIZE rcvd: 499
------------------------------------------------
if I print the contents of the variable decisions after line 144 of
lib/exaproxy/reactor/reactor.py
I get this result:
[('1', 'download', '23.61.199.131\x0080\x000\x00GET HTTP://www.microsoft.com
HTTP/1.1\r\nUser-Agent: curl/7.26.0\r\nHost: www.microsoft.com\r\nAccept:
*/*\r\n\r\n')]
--------------------------------------
So exaproxy is trying to contact the IP address of one of the name servers
listed in the ADDITIONAL section (23.61.199.131 is ns3-131.akadns.org) and not
the real final IP of www.microsoft.com [the A record in the ANSWER section];
obviously the result is a timeout/hang.
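A small debugging check along these lines can confirm where the chosen address came from. It is only a sketch: the names decision_ip and classify are invented here, the request text is shortened, and the only thing assumed about the decision payload is what the printed value shows, namely that its first NUL-separated field is the IP address.

```python
def decision_ip(decision):
    """Return the first NUL-separated field of the decision payload."""
    _client_id, _command, payload = decision
    return payload.split('\0', 1)[0]


def classify(ip, answers, additionals):
    """Say which section of the DNS response an A-record address came from."""
    if any(rtype == 'A' and value == ip for _, rtype, value in answers):
        return 'answer'
    if any(rtype == 'A' and value == ip for _, rtype, value in additionals):
        return 'additional'
    return 'unknown'


# Records copied from the dig output above (only the relevant ones).
answer_section = [
    ('www.microsoft.com.', 'CNAME', 'toggle.www.ms.akadns.net.'),
    ('toggle.www.ms.akadns.net.', 'CNAME', 'g.www.ms.akadns.net.'),
    ('g.www.ms.akadns.net.', 'CNAME', 'lb1.www.ms.akadns.net.'),
    ('lb1.www.ms.akadns.net.', 'A', '64.4.11.20'),
]
additional_section = [
    ('ns3-129.akadns.net.', 'A', '23.61.199.129'),
    ('ns3-131.akadns.org.', 'A', '23.61.199.131'),
]

# Shortened copy of the printed decision.
decision = ('1', 'download',
            '23.61.199.131\x0080\x000\x00GET HTTP://www.microsoft.com HTTP/1.1\r\n\r\n')

print(classify(decision_ip(decision), answer_section, additional_section))
```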
Bye, Mello.
Original comment by mell...@gmail.com
on 28 Jun 2012 at 10:23
Thank you for the extra information.
Original comment by thomas.mangin
on 28 Jun 2012 at 11:33
Hello,
Thank you for taking the time to report this bug and for the further
information you provided.
The line testing for (identifier, queries, responses, authorities, additionals)
containing None is in fact correct and is checking that each of these sections
of the dns response was properly parsed. During normal operation, the variables
you removed should hold lists that may or may not be empty.
Even if editing it partially solved your problem, the line should still read:
ok = complete is True and None not in (identifier, queries, responses,
authorities, additionals)
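A small stand-alone sketch of why that test accepts empty sections but rejects unparsed ones. The helper name is_ok and the sample values are illustrative only; the expression itself is the one quoted above.

```python
def is_ok(complete, identifier, queries, responses, authorities, additionals):
    """Same test as the quoted line: every section must have been parsed."""
    return complete is True and None not in (
        identifier, queries, responses, authorities, additionals)


# Empty authority/additional sections are fine: [] is not None.
print(is_ok(True, 59287, ['query'], ['answer'], [], []))

# A section left as None means parsing failed part-way through the packet,
# so the earlier sections should not be trusted either.
print(is_ok(True, 59287, ['query'], ['answer'], None, []))
```

In other words, the test only fails when a section could not be parsed at all, which is why dropping authorities and additionals from it hides a parsing failure instead of fixing it.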
I've written a couple of test scripts to check the dns parser and, after a bug
fix to the code we wrote to serialize dns responses, I can confirm that the
parser yields the expected result every time we run it. You'll see this bug fix
in the latest commit log but it affected code that's never actually executed by
the proxy so it's safe to say that the problem lies elsewhere.
Our own dns servers never return responses that are so verbose but it occurs to
me that the example you provided was most likely long enough to have forced a
request over TCP. Since our testing won't have hit this path very often, and
since the response may have come over more than one packet, it's quite possible
that we have a bug in response reassembly. I'll investigate this possibility
tonight and post a fix as soon as I find the problem.
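For reference, DNS over TCP prefixes every message with a two-byte length field (RFC 1035, section 4.2.2), and one message may arrive across several reads. The following is a minimal sketch of reassembly under that framing, not the proxy's actual code; the class name TCPMessageBuffer and its feed method are made up for illustration.

```python
import struct


class TCPMessageBuffer(object):
    """Collect bytes from a TCP stream and yield complete DNS messages."""

    def __init__(self):
        self._buffer = b''

    def feed(self, data):
        """Add received bytes; return the list of messages now complete."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= 2:
            # Two-byte big-endian length prefix, then that many message bytes.
            (length,) = struct.unpack('!H', self._buffer[:2])
            if len(self._buffer) < 2 + length:
                break  # wait for the rest of this message
            messages.append(self._buffer[2:2 + length])
            self._buffer = self._buffer[2 + length:]
        return messages


# A 499-byte message delivered in three pieces, split inside the length prefix.
message = b'x' * 499
framed = struct.pack('!H', len(message)) + message

buf = TCPMessageBuffer()
print(buf.feed(framed[:1]))               # nothing complete yet
print(buf.feed(framed[1:300]))            # still incomplete
print(len(buf.feed(framed[300:])[0]))     # full message recovered
```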
In the meantime, you should find that undoing your modifications to the parser
will stop the proxy picking an incorrect ip address from the response - we'll
just signal a dns resolution failure.
regards,
David
Original comment by iwantmyname
on 29 Jun 2012 at 7:30
Thanks. I just re-fetched the source after seeing your new "fixes", but now
every site gives me the following error (please don't hate me ;-) ):
Traceback (most recent call last):
  File "/usr/lib/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/opt/exaproxy/lib/exaproxy/util/debug.py", line 71, in <module>
    execfile(sys.argv[0])
  File "/opt/exaproxy/lib/exaproxy/application.py", line 139, in <module>
    Supervisor(configuration).run()
  File "/opt/exaproxy/lib/exaproxy/supervisor.py", line 195, in run
    self.reactor.run()
  File "/opt/exaproxy/lib/exaproxy/reactor/reactor.py", line 141, in run
    response = self.resolver.getResponse(resolver)
  File "/opt/exaproxy/lib/exaproxy/reactor/resolver/manager.py", line 259, in getResponse
    resolved = self.resolveDecision(command, decision, ip)
  File "/opt/exaproxy/lib/exaproxy/reactor/resolver/manager.py", line 137, in resolveDecision
    newdecision = '\0'.join((ip, args))
TypeError: sequence item 0: expected string, tuple found
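The error itself is easy to reproduce in isolation: str.join refuses any item that is not a string. The sketch below is only an illustration. The shape of the tuple (an address plus a second value) is a guess, since the traceback does not show what ip actually held, and build_decision is a hypothetical helper, not a function from exaproxy.

```python
def build_decision(ip, args):
    """Join ip and args with NUL, rejecting a non-string ip with a clear error."""
    if not isinstance(ip, str):
        raise TypeError('ip must be a string, got %s' % type(ip).__name__)
    return '\0'.join((ip, args))


# Reproduce the failure: the first joined item is a tuple (hypothetical shape).
try:
    '\0'.join((('64.4.11.20', 300), 'args'))
except TypeError as error:
    print('join failed: %s' % error)

print(repr(build_decision('64.4.11.20', '80')))
```

The exact wording of the message differs between Python versions, but the cause is the same.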
Original comment by mell...@gmail.com
on 29 Jun 2012 at 8:16
Thanks for the heads up.
I added some 'helpful' code to reduce the number of queries you'll need to
perform and wasn't very careful.
The bug should now be fixed in the latest version - it looks ok on my test
machine
Original comment by iwantmyname
on 29 Jun 2012 at 9:29
Hi, I just tested it and it now seems to be working fine. I replaced my ISP's
DNS servers (which returned overly long answers) with the ones from OpenDNS (I
normally use them but forgot to change them on the PC on which I was trying
exaproxy). I will let you know if other problems come up; as far as I am
concerned, this issue can be considered fixed.
Thanks for the good program and the fast support.
Bye, Mello.
Original comment by mell...@gmail.com
on 30 Jun 2012 at 9:40
I can confirm that with the latest source and the OpenDNS servers (which don't
return overly long answers) everything works without any problems. I connect to
exaproxy from the office to my personal plug server at home (via OpenVPN); I
used it intensively and never got error pages or timeouts (and it was really
fast too!).
Thanks again.
Bye Mello.
Original comment by mell...@gmail.com
on 3 Jul 2012 at 8:40
I'm glad that the workaround was enough to get that working for you.
We've now performed some further testing with large dns responses and
discovered that there was a problem with the way we were handling some strings
when decoding packets. If you pull the latest source then you should find that
you can use your ISP's servers with no further problems.
Now that we've identified and fixed the cause of the problem, we can consider
this issue to be resolved.
Original comment by iwantmyname
on 5 Jul 2012 at 3:38
Original comment by iwantmyname
on 5 Jul 2012 at 3:52
Original issue reported on code.google.com by
mell...@gmail.com
on 20 Jun 2012 at 8:02