isunbejo / exaproxy

Automatically exported from code.google.com/p/exaproxy

unable to go on www.microsoft.com #3

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Configure exaproxy (the default configuration is fine).
2. From a client, go to http://www.microsoft.com
3. A "website not found" page is returned.

What is the expected output? What do you see instead?
Page of www.microsoft.com

What version of the product are you using? On what operating system?
Debian wheezy, Python 2.6, latest version from Mercurial.

Please provide any additional information below.

Original issue reported on code.google.com by mell...@gmail.com on 20 Jun 2012 at 8:02

GoogleCodeExporter commented 9 years ago
Some more info:
The problem seems related to chained ("recursive") CNAME records, where one 
CNAME points to another:
_____ output from dig -t a support.microsoft.com ___________________
support.microsoft.com.  583 IN  CNAME   mso-geo.microsoft.akadns.net.
mso-geo.microsoft.akadns.net. 300 IN    CNAME   support.microsoft.akadns.net.
support.microsoft.akadns.net. 547 IN    A   157.56.132.33

If a CNAME points to another CNAME, exaproxy hangs and returns the "website 
not found" page.
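
The chained case described above can be sketched as follows. This is only an 
illustration, not exaproxy's resolver code: the `resolve_chain` function, the 
`(owner, type, value)` record tuples and the `max_hops` limit are assumptions 
made for the example. It walks the CNAME records of one answer section until 
it reaches an A record.

```python
def resolve_chain(name, records, max_hops=8):
    """Follow CNAME records from `name` until an A record is found.

    `records` is a list of (owner, rtype, value) tuples, a hypothetical
    stand-in for whatever the DNS parser returns for the answer section.
    Returns the address as a string, or None if the chain cannot be resolved.
    """
    cnames = dict((owner, value) for owner, rtype, value in records if rtype == 'CNAME')
    addresses = dict((owner, value) for owner, rtype, value in records if rtype == 'A')

    current = name
    for _ in range(max_hops):
        if current in addresses:
            return addresses[current]
        if current not in cnames:
            return None  # dead end: neither an A nor a CNAME for this name
        current = cnames[current]
    return None  # chain longer than max_hops, or a CNAME loop


# Records copied from the dig output above.
answer = [
    ('support.microsoft.com.', 'CNAME', 'mso-geo.microsoft.akadns.net.'),
    ('mso-geo.microsoft.akadns.net.', 'CNAME', 'support.microsoft.akadns.net.'),
    ('support.microsoft.akadns.net.', 'A', '157.56.132.33'),
]
print(resolve_chain('support.microsoft.com.', answer))
```

The hop limit is there so that a CNAME loop ends in a resolution failure 
instead of a hang.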

Original comment by mell...@gmail.com on 20 Jun 2012 at 9:02

GoogleCodeExporter commented 9 years ago
it may have been fixed by:
http://code.google.com/p/exaproxy/source/detail?r=8dbf7ef7476a8ae0b9417e811d2056657df67d4c

Original comment by thomas.mangin on 25 Jun 2012 at 7:23

GoogleCodeExporter commented 9 years ago
Sorry to tell you, but the version I am using already has that patch.
I investigated the problem some more and partially solved it by changing the 
following line:
-----line 67 in lib/exaproxy/dns/definition.py -------------------------
ok = complete is True and None not in (identifier, queries, responses, 
authorities, additionals)

to

ok = complete is True and None not in (identifier, queries, responses)

------------------------------------------------------------------------
I made this change because I noticed that sometimes queries and responses are 
populated but authorities and additionals are not; with a line such as
   self.responses = (responses or []) if ok else []
queries and responses are then emptied.
With this change, some sites that used to hang (e.g. support.microsoft.com) 
now work, and sometimes even www.microsoft.com does, but at other times the 
latter still hangs.
I also noticed that createResponse in lib/exaproxy/dns/codec.py gets called 
for every CNAME or A line returned by the first "query", even though that 
first query/response already contains all the hosts and CNAMEs needed to 
reach the final address.

In the end I think there is a problem (maybe a race condition, since it does 
not always happen) somewhere in the DNS code, and that it arises when the DNS 
response contains several A and/or CNAME records.
With a "plain" DNS answer there is no problem at all; the following site 
works every time:
www.dell.com.       179 IN  CNAME   www1.ins.dell.com.
www1.ins.dell.com.  2   IN  A   143.166.224.244

I hope that is clear; I will investigate some more and let you know.

Bye mello.

Original comment by mell...@gmail.com on 25 Jun 2012 at 9:25

GoogleCodeExporter commented 9 years ago
Thank you very much for your time. There is no need for you to look into it 
further; we will fix the issue later today or tomorrow, as soon as urgent 
workloads are out of the way.

Original comment by thomas.mangin on 25 Jun 2012 at 10:53

GoogleCodeExporter commented 9 years ago
I discovered the following with this DNS answer:

dig -t A www.microsoft.com @62.101.93.101

; <<>> DiG 9.8.1-P1 <<>> -t A www.microsoft.com @62.101.93.101
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59287
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 10, ADDITIONAL: 9

;; QUESTION SECTION:
;www.microsoft.com.     IN  A

;; ANSWER SECTION:
www.microsoft.com.  1186    IN  CNAME   toggle.www.ms.akadns.net.
toggle.www.ms.akadns.net. 164   IN  CNAME   g.www.ms.akadns.net.
g.www.ms.akadns.net.    201 IN  CNAME   lb1.www.ms.akadns.net.
lb1.www.ms.akadns.net.  246 IN  A   64.4.11.20

;; AUTHORITY SECTION:
akadns.net.     62058   IN  NS  ns6-129.akadns.net.
akadns.net.     62058   IN  NS  ns14-130.akadns.org.
akadns.net.     62058   IN  NS  ns6-131.akadns.org.
akadns.net.     62058   IN  NS  ns8-200.akadns.net.
akadns.net.     62058   IN  NS  ns2-129.akadns.net.
akadns.net.     62058   IN  NS  ns3-129.akadns.net.
akadns.net.     62058   IN  NS  ns20-131.akadns.org.
akadns.net.     62058   IN  NS  ns3-131.akadns.org.
akadns.net.     62058   IN  NS  ns1-129.akadns.net.
akadns.net.     62058   IN  NS  ns2-131.akadns.org.

;; ADDITIONAL SECTION:
ns1-129.akadns.net. 55448   IN  A   193.108.88.129
ns2-129.akadns.net. 55448   IN  A   2.22.230.129
ns2-131.akadns.org. 51280   IN  A   2.22.230.131
ns3-129.akadns.net. 55448   IN  A   23.61.199.129
ns3-131.akadns.org. 51280   IN  A   23.61.199.131
ns6-129.akadns.net. 55448   IN  A   95.100.168.129
ns6-131.akadns.org. 51281   IN  A   95.100.168.131
ns8-200.akadns.net. 55448   IN  A   96.17.144.200
ns14-130.akadns.org.    51280   IN  A   96.7.251.130

;; Query time: 4 msec
;; SERVER: 62.101.93.101#53(62.101.93.101)
;; WHEN: Fri Jun 29 00:10:49 2012
;; MSG SIZE  rcvd: 499

------------------------------------------------
If I print the content of the variable decisions after line 144 of 
lib/exaproxy/reactor/reactor.py, I get this result:
[('1', 'download', '23.61.199.131\x0080\x000\x00GET HTTP://www.microsoft.com 
HTTP/1.1\r\nUser-Agent: curl/7.26.0\r\nHost: www.microsoft.com\r\nAccept: 
*/*\r\n\r\n')]
--------------------------------------

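So exaproxy is trying to contact the IP address of one of the name servers 
listed in the ADDITIONAL section (ns3-131.akadns.org, 23.61.199.131) rather 
than the real final IP of www.microsoft.com from the A record in the ANSWER 
section (64.4.11.20). The result is, unsurprisingly, a timeout/hang.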
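
The observation above can be checked mechanically. The sketch below is not 
part of exaproxy; the `sections` dictionary and the `find_section` helper are 
hypothetical, and only a few records from the dig output are copied in. It 
reports which section of the response a given address came from.

```python
# A subset of the records from the dig output above, grouped by section.
sections = {
    'answer': [
        ('www.microsoft.com.', 'CNAME', 'toggle.www.ms.akadns.net.'),
        ('toggle.www.ms.akadns.net.', 'CNAME', 'g.www.ms.akadns.net.'),
        ('g.www.ms.akadns.net.', 'CNAME', 'lb1.www.ms.akadns.net.'),
        ('lb1.www.ms.akadns.net.', 'A', '64.4.11.20'),
    ],
    'additional': [
        ('ns3-129.akadns.net.', 'A', '23.61.199.129'),
        ('ns3-131.akadns.org.', 'A', '23.61.199.131'),
    ],
}


def find_section(ip, sections):
    """Return (section, owner) for the first A record carrying `ip`, else None."""
    for section in ('answer', 'authority', 'additional'):
        for owner, rtype, value in sections.get(section, []):
            if rtype == 'A' and value == ip:
                return section, owner
    return None


# The address seen in the decision comes from the ADDITIONAL section.
print(find_section('23.61.199.131', sections))
# The address a client should be sent to comes from the ANSWER section.
print(find_section('64.4.11.20', sections))
```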

Bye, Mello.

Original comment by mell...@gmail.com on 28 Jun 2012 at 10:23

GoogleCodeExporter commented 9 years ago
Thank you for the extra information.

Original comment by thomas.mangin on 28 Jun 2012 at 11:33

GoogleCodeExporter commented 9 years ago
Hello,

Thank you for taking the time to report this bug and for the further 
information you provided.

The line testing for (identifier, queries, responses, authorities, additionals) 
containing None is in fact correct and is checking that each of these sections 
of the dns response was properly parsed. During normal operation, the variables 
you removed should hold lists that may or may not be empty.

Even if editing it partially solved your problem, the line should still read: 

ok = complete is True and None not in (identifier, queries, responses, 
authorities, additionals)
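
As a small illustration of the point above (the `is_ok` wrapper and the sample 
values are made up for the example; only the expression itself comes from the 
thread), an empty list passes the check while None does not:

```python
def is_ok(complete, identifier, queries, responses, authorities, additionals):
    # Same expression as the line quoted above, wrapped in a function so it
    # can be called with different inputs.
    return complete is True and None not in (
        identifier, queries, responses, authorities, additionals)


# Sections that were parsed but happen to be empty still pass the check.
print(is_ok(True, 59287, ['query'], ['answer'], [], []))

# A section that could not be parsed (None) makes the whole response fail.
print(is_ok(True, 59287, ['query'], ['answer'], None, []))
```

So if authorities or additionals come back as None, that points at a parsing 
failure earlier on rather than at a response that simply lacks those sections.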

I've written a couple of test scripts to check the dns parser and, after a bug 
fix to the code we wrote to serialize dns responses, I can confirm that the 
parser yields the expected result every time we run it. You'll see this bug fix 
in the latest commit log but it affected code that's never actually executed by 
the proxy so it's safe to say that the problem lies elsewhere.

Our own dns servers never return responses that are so verbose but it occurs to 
me that the example you provided was most likely long enough to have forced a 
request over TCP. Since our testing won't have hit this path very often, and 
since the response may have come over more than one packet, it's quite possible 
that we have a bug in response reassembly.  I'll investigate this possibility 
tonight and post a fix as soon as I find the problem.
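
For context on the reassembly path mentioned above: over TCP, each DNS message 
is preceded by a two-byte length field (RFC 1035, section 4.2.2), and a 
message may arrive in several reads. The sketch below shows one way to buffer 
and split such a stream. It is not exaproxy's code; `extract_messages` and the 
dummy payload are assumptions made for the example.

```python
import struct


def extract_messages(buffer):
    """Split a TCP byte stream into complete DNS messages.

    Returns (messages, remainder): every fully received message, plus the
    bytes that still belong to an incomplete one.
    """
    messages = []
    while len(buffer) >= 2:
        (length,) = struct.unpack('!H', buffer[:2])
        if len(buffer) < 2 + length:
            break  # the rest of this message has not arrived yet
        messages.append(buffer[2:2 + length])
        buffer = buffer[2 + length:]
    return messages, buffer


# A dummy 499-byte message delivered in two separate reads.
payload = b'x' * 499
stream = struct.pack('!H', len(payload)) + payload
first, second = stream[:300], stream[300:]

pending = b''
messages, pending = extract_messages(pending + first)
print(len(messages), len(pending))   # nothing complete yet, 300 bytes kept
messages, pending = extract_messages(pending + second)
print(len(messages), len(pending))   # one complete message, nothing left over
```

Keeping the unconsumed remainder between reads is what lets a response that 
spans more than one packet be decoded only once it is complete.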

In the meantime, you should find that undoing your modifications to the parser 
will stop the proxy picking an incorrect ip address from the response - we'll 
just signal a dns resolution failure.

regards,
David

Original comment by iwantmyname on 29 Jun 2012 at 7:30

GoogleCodeExporter commented 9 years ago
Thanks, I have just re-fetched the source after seeing your new "fixes", but 
now every site gives me the following error (please don't hate me ;-) ):
Traceback (most recent call last):
  File "/usr/lib/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/opt/exaproxy/lib/exaproxy/util/debug.py", line 71, in <module>
    execfile(sys.argv[0])
  File "/opt/exaproxy/lib/exaproxy/application.py", line 139, in <module>
    Supervisor(configuration).run()
  File "/opt/exaproxy/lib/exaproxy/supervisor.py", line 195, in run
    self.reactor.run()
  File "/opt/exaproxy/lib/exaproxy/reactor/reactor.py", line 141, in run
    response = self.resolver.getResponse(resolver)
  File "/opt/exaproxy/lib/exaproxy/reactor/resolver/manager.py", line 259, in getResponse
    resolved = self.resolveDecision(command, decision, ip)
  File "/opt/exaproxy/lib/exaproxy/reactor/resolver/manager.py", line 137, in resolveDecision
    newdecision = '\0'.join((ip, args))
TypeError: sequence item 0: expected string, tuple found
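
The failure in the last frame is easy to reproduce in isolation. The thread 
does not say what the tuple contained, so the `(address, port)` shape used 
below is an assumption, as is the `build_decision` helper; the sketch only 
shows why the join fails and one defensive way to normalise the value first.

```python
def build_decision(ip, args):
    """Join an address and the remaining arguments with NUL separators."""
    if isinstance(ip, tuple):
        # Hypothetical fix-up: assume the address is the tuple's first element.
        ip = ip[0]
    return '\0'.join((ip, args))


# str.join only accepts strings, so a tuple in the sequence raises TypeError.
try:
    '\0'.join((('64.4.11.20', 80), '80'))
except TypeError as error:
    print('TypeError: %s' % error)

# With the value normalised first, the join succeeds.
print(repr(build_decision(('64.4.11.20', 80), '80')))
print(repr(build_decision('64.4.11.20', '80')))
```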

Original comment by mell...@gmail.com on 29 Jun 2012 at 8:16

GoogleCodeExporter commented 9 years ago
Thanks for the heads up.

I added some 'helpful' code to reduce the number of queries you'll need to 
perform and wasn't very careful.

The bug should now be fixed in the latest version - it looks ok on my test 
machine

Original comment by iwantmyname on 29 Jun 2012 at 9:29

GoogleCodeExporter commented 9 years ago
Hi, I just tested it and it now seems to be working fine. I replaced my ISP's 
DNS servers (which returned overly long answers) with the OpenDNS ones (I 
normally use them but forgot to switch on the PC where I was trying 
exaproxy). I will let you know if there are other problems; as far as I am 
concerned this issue can be considered fixed.
Thanks for the good program and the fast support.

Bye, Mello.

Original comment by mell...@gmail.com on 30 Jun 2012 at 9:40

GoogleCodeExporter commented 9 years ago
I can confirm that with the latest source and the OpenDNS servers (which 
don't return overly long answers) everything works without any problems. I 
connect from the office to exaproxy on my personal home plug server (via 
OpenVPN), used it intensively and never had error pages or timeouts (and it 
was really fast too!).
Thanks again.

Bye Mello.

Original comment by mell...@gmail.com on 3 Jul 2012 at 8:40

GoogleCodeExporter commented 9 years ago
I'm glad that the workaround was enough to get that working for you.

We've now performed some further testing with large dns responses and 
discovered that there was a problem with the way we were handling some strings 
when decoding packets.  If you pull the latest source then you should find that 
you can use your ISP's servers with no further problems.

Now that we've identified and fixed the cause of the problem, we can consider 
this issue to be resolved.
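
The thread does not say which strings were mishandled. One string-handling 
detail that large responses exercise heavily is DNS name compression (RFC 
1035, section 4.1.4), where a name can end in a two-byte pointer to an earlier 
position in the packet. The sketch below is a generic decoder for such names, 
offered as background rather than as the actual fix; `decode_name` and the 
sample packet are made up for the example.

```python
def decode_name(packet, offset, max_jumps=16):
    """Decode a possibly compressed DNS name starting at `offset`.

    Returns (name, next_offset), where next_offset is the position just
    after the name as it appears at the original offset.
    """
    packet = bytearray(packet)
    labels = []
    next_offset = None
    jumps = 0
    while True:
        length = packet[offset]
        if length & 0xC0 == 0xC0:
            # Compression pointer: the low 14 bits are an offset into the packet.
            if next_offset is None:
                next_offset = offset + 2
            offset = ((length & 0x3F) << 8) | packet[offset + 1]
            jumps += 1
            if jumps > max_jumps:
                raise ValueError('too many compression pointers')
            continue
        offset += 1
        if length == 0:
            break  # the zero-length root label ends the name
        labels.append(packet[offset:offset + length].decode('ascii'))
        offset += length
    if next_offset is None:
        next_offset = offset
    return '.'.join(labels) + '.', next_offset


# "www.microsoft.com." followed by "support" plus a pointer to offset 4,
# which is where "microsoft.com." starts in the first name.
packet = b'\x03www\x09microsoft\x03com\x00' + b'\x07support\xc0\x04'
print(decode_name(packet, 0))
print(decode_name(packet, 19))
```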

Original comment by iwantmyname on 5 Jul 2012 at 3:38

GoogleCodeExporter commented 9 years ago

Original comment by iwantmyname on 5 Jul 2012 at 3:52