orthecreedence / drakma-async

An asynchronous version of drakma that runs off of cl-async
38 stars, 5 forks

DNS error: -2, nodename nor servname provided, or not known #15

Closed nightshade427 closed 10 years ago

nightshade427 commented 11 years ago

I'm getting this error "DNS error: -2, nodename nor servname provided, or not known" when using drakma-async to send roughly 200 HTTP requests in the same event loop.

What's strange is that given the exact same inputs, it sometimes finishes perfectly. Sometimes 2 fail with the above message; sometimes 20 fail. All requests also go to the same domain and basically the same URL, with slightly different query string params.

Any ideas?

orthecreedence commented 11 years ago

Strange indeed. My first guess is your router, since that's usually the place lookups happen on an internal network. I've seen cases where a router would either delay or fail DNS requests just because it felt like it. One thing to try: take the IP of the host you're hitting and add an entry for it in your platform's hosts file. If the problem persists, it's most likely a libevent or cl-async bug. If it goes away, it may still be a libevent bug, but much more likely a DNS server issue.

Also, what platform are you on? I'm able to test on Windows, Linux, and BSD (so Mac, kind of).

Oh, also, please be sure to grab the latest stable libevent (2.0.21 if you don't have it already). I've had reports recently where updating to that version fixed some seemingly random DNS failures.

nightshade427 commented 11 years ago

That's a great idea. I will put the entry in my host file and test that out and report back.

I'm running in AWS EC2, so I would hope it isn't a router issue, but you never know.

I'm also running the latest libevent, 2.0.21.

Thanks for the help. Will report back shortly.

nightshade427 commented 11 years ago

Wow, that totally fixed it. Thank you so much. Not only does it work consistently, it's much faster as well.

orthecreedence commented 11 years ago

Glad that did it, although I'm still wondering what the problem was. If it's on EC2 I wouldn't suspect DNS failure, but maybe confirm that the node is using the correct DNS servers for its network (/etc/resolv.conf). Not sure how to check this since I'm not a big EC2 user; I'm guessing they create the file for you automatically with the correct servers in it.

Another thing to try is pinging the server you're hitting via the ping command a bunch of times (after taking the /etc/hosts entry out). Does it fail at all? Also try opening a few hundred nc commands:

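# bash: brace expansion {0..200} needs bash, not plain sh.
# printf is used instead of `echo -ne` because dash (Ubuntu's /bin/sh) prints "-ne" literally.
# HTTP requires CRLF line endings; each pipeline runs in the background with &.
for i in {0..200} ; do
    echo "Sending req $i"
    printf 'GET / HTTP/1.1\r\nHost: thehost.net\r\nConnection: close\r\n\r\n' | nc thehost.net 80 >> output.txt 2>> error.txt &
done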

(also making sure to remove the hosts entry you made before running). I'm interested to see if any of these fail.

nightshade427 commented 11 years ago

Sounds like a plan. I will run those tests tomorrow and report back (it's 4am here right now). Thanks again for the quick help and guidance.

orthecreedence commented 11 years ago

One more note! If in the future you find yourself in the same situation, you may want to cache the DNS result in memory by making the request once via cl-async's DNS operations and saving it instead of sending it out every time. Even if you're running a DNS server on the actual machine doing the requests itself, you can still shave some time off:

(defparameter *cached-ip* nil)

;; as:dns-lookup has to run inside an event loop
(as:with-event-loop ()
  (as:dns-lookup "www.google.com"
    (lambda (host family)
      (declare (ignore family))
      (setf *cached-ip* host)
      ;; now run your app, using *cached-ip* as the address instead of the hostname
      (start-making-requests))
    (lambda (err) (format t "DNS err: ~a~%" err))))
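If you end up hitting more than one host, the same idea generalizes to a small per-hostname cache. This is a minimal sketch, not part of cl-async: `*dns-cache*` and `cached-dns-lookup` are hypothetical names, and the lookup function is passed in so it can be `#'as:dns-lookup` in real code (it is assumed to take `(host success-cb error-cb)` as in the snippet above) or a fake in tests.

```lisp
;; Hypothetical helper, not part of cl-async: caches one IP per hostname.
(defvar *dns-cache* (make-hash-table :test #'equalp)
  "Maps hostname -> IP string; EQUALP makes the lookup case-insensitive.")

(defun cached-dns-lookup (host lookup-fn on-ip on-error)
  "Call ON-IP with an IP for HOST, resolving it at most once.
LOOKUP-FN is assumed to take (host success-cb error-cb) like AS:DNS-LOOKUP,
calling SUCCESS-CB with (ip family) or ERROR-CB with an error."
  (multiple-value-bind (ip found) (gethash host *dns-cache*)
    (if found
        ;; Cache hit: no DNS traffic at all.
        (funcall on-ip ip)
        ;; Cache miss: resolve once, remember the answer, then continue.
        (funcall lookup-fn host
                 (lambda (ip family)
                   (declare (ignore family))
                   (setf (gethash host *dns-cache*) ip)
                   (funcall on-ip ip))
                 on-error))))
```

Entries never expire here, which is fine for a short-lived batch job but would need a TTL for a long-running process.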
nightshade427 commented 11 years ago

Excellent idea. I'll do that now instead of the host file entry and see if it works as well. I like that approach better.

nightshade427 commented 11 years ago

Any idea why this would happen? I'm trying to use the approach you recommended above for doing dns lookup once.

url-two loses its local let binding partway through. url-one also loses its local let binding at the same place.

(with-event-loop (:catch-app-errors t)
  (dns-lookup *host*
              (lambda (host family)
                (declare (ignore family))
                (let ((*url-one* (format nil "http://~a~a" host (uri-path *uri-one*)))
                      (*url-two* (format nil "http://~a~a" host (uri-path *uri-two*))))
                  ;; *url-two* is the new binding
                  (format t "~&let url-two: ~a" *url-two*)
                  (iter (for object in objects)
                        (for i from 0)
                        (alet* ((j i)
                                ;; *url-two* is still the new binding
                                (nil (format t "~&loop-url-two: ~a" *url-two*))
                                ;; *url-one* is still the new binding
                                (nil (format t "~&loop-url-one: ~a" *url-one*))
                                ;; even inside call-function-that-uses-url-one, both
                                ;; *url-one* and *url-two* keep their new bindings;
                                ;; the function doesn't modify either of them.
                                (data-one (call-function-that-uses-url-one))
                                ;; Here *url-two* loses its new binding from above?
                                ;; It is back to the global value for some reason.
                                (nil (format t "~&loop2-url-two: ~a" *url-two*))
                                ;; Same for *url-one*: back to the global value.
                                (nil (format t "~&loop2-url-one: ~a" *url-one*))
                                ;; so this call uses the global *url-two* instead
                                ;; of the new one?
                                (data-two (call-function-that-uses-url-two)))
                          (do-more-stuff-here)))))
              nil))
nightshade427 commented 11 years ago

I ran the nc test and it all turned out fine, even looped 1000 times. Strange. But using drakma-async to make the same 1000 requests fails with the -2 DNS error when the URL uses the hostname and /etc/hosts has no entry.
orthecreedence commented 11 years ago

Ok, in your example I'm guessing you defined *url-one* and *url-two* via defparameter or defvar. That declares them as dynamic (special) variables, not lexical ones. When you rebind them with let, the new values bind fine, but the first async call in the alet* returns a future right away, so the loop keeps going and queues every async action without waiting for any of them to complete. Once all the actions are queued, the stack unwinds out of the original let. Because the variables are dynamic rather than lexical, the bindings revert to the global values as soon as you leave the dynamic extent of the let, so the callbacks that run later see the globals. If you did:

;; create new lexical bindings here; closures created inside the form keep them
;; alive for as long as those closures exist
(let ((url-one (format nil "http://~a~a" host (uri-path *uri-one*)))
      (url-two (format nil "http://~a~a" host (uri-path *uri-two*))))
  (iter ...))

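I believe that would fix the problem, as long as you pass url-one and url-two as arguments to the functions that need them instead of having those functions read the globals.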
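Here is a self-contained way to see the difference without cl-async. The queue below is a hypothetical stand-in for the event loop: callbacks are queued inside a `let` and only run after the `let` has exited, which is what happens to everything after the first future in the alet*.

```lisp
(defvar *url* "http://global.example/"
  "Special (dynamic) variable, like *url-one* in the issue.")

(defvar *pending* '()
  "Hypothetical stand-in for the event loop's queue of callbacks.")

(defun queue-callback (fn)
  (push fn *pending*))

(defun run-pending ()
  "Run queued callbacks in FIFO order, then empty the queue."
  (prog1 (mapcar #'funcall (reverse *pending*))
    (setf *pending* '())))

(let ((*url* "http://10.0.0.1/")   ; dynamic rebinding: only lasts while inside the LET
      (url "http://10.0.0.1/"))    ; lexical binding: captured by the closure
  (queue-callback (lambda () (list :dynamic *url* :lexical url))))

;; The LET has exited by the time the callback runs:
(run-pending)
;; => ((:DYNAMIC "http://global.example/" :LEXICAL "http://10.0.0.1/"))
```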

As far as the DNS lookups are concerned, every part of my being is telling me it's most likely a bug in the libevent DNS implementation, and various postings around the web on the subject corroborate this. I think the fix would be to a) cache the lookups (like you're doing above) or b) detect a lookup failure and try again. In your case "a" works better because you're only requesting against one host (however, be sure to set the "Host" header to the hostname when making requests via drakma with a raw IP).
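Option "b" could look roughly like the following sketch. `dns-lookup-with-retry` is a hypothetical name, and the lookup function is again assumed to take `(host success-cb error-cb)` like `as:dns-lookup`, so a fake can stand in for it when testing.

```lisp
(defun dns-lookup-with-retry (host lookup-fn on-ip on-error &key (retries 3))
  "Resolve HOST with LOOKUP-FN, retrying up to RETRIES extra times on failure.
Calls ON-IP with the IP on success, or ON-ERROR with the last error."
  (labels ((attempt (remaining)
             (funcall lookup-fn host
                      ;; success: hand the IP to the caller
                      (lambda (ip family)
                        (declare (ignore family))
                        (funcall on-ip ip))
                      ;; failure: retry while attempts remain, else give up
                      (lambda (err)
                        (if (plusp remaining)
                            (attempt (1- remaining))
                            (funcall on-error err))))))
    (attempt retries)))
```

This retries immediately; inside a real event loop it would probably be kinder to wait briefly between attempts.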

Long-term, I'd like to possibly replace libevent with libuv. There's an open issue for this: https://github.com/orthecreedence/cl-async/issues/32.

nightshade427 commented 11 years ago

I fixed it using the following. Hope this is a decent way to do it :)

(defun do-stuff-using-async ()
  (let* ((host (ahost *host*))
         (*url-one* (format nil "http://~a~a" host (uri-path *uri-one*)))
         (*url-two* (format nil "http://~a~a" host (uri-path *uri-two*)))
         (stuff (amap 'list #'do-stuff-async-with-uri-one list-of-data-to-iterate))
         (more-stuff (amap 'list #'do-more-stuff-async-with-uri-two stuff)))
    (do-stuff-with-results stuff more-stuff)))

;; additions to async-future
(in-package #:cl-async-future)

(defun ahost (hostname)
  (let ((result))
    (as:with-event-loop (:catch-app-errors t)
      (as:dns-lookup hostname
                     (lambda (host family)
                       (declare (ignore family))
                       (setf result host))
                     nil))
    result))

(defun amap (type fn &rest sequences)
  (let* ((combined (apply #'map 'list #'list sequences))
         (result (make-array (length combined))))
    (as:with-event-loop (:catch-app-errors t)
      (loop for args in combined
         for i from 0
         do (alet* ((j i)
                    (item (apply fn args)))
              (setf (aref result j) item))))
    (coerce result type)))

(export (find-symbol "AMAP"))
(export (find-symbol "AHOST"))
orthecreedence commented 11 years ago

Yeah, that should work fine!

nightshade427 commented 11 years ago

Great, thanks for all the help and guidance. This works perfectly now. Very fast!!

nightshade427 commented 11 years ago

Thanks for the awesome lisp system!!

orthecreedence commented 11 years ago

Glad it's working for you! Always feel free to let me know if you run into issues.

nightshade427 commented 10 years ago

Now that I have my proxy working well, I'm getting these errors again as we test the proxy out. The proxy takes an incoming URL, fetches the HTTP content with drakma-async (this can be any URL on any site, so I can't use an IP or a hosts file entry like above), and then processes that content if needed. The problem is that when lots of requests come in (and so lots of drakma-async requests go out), I get this error. If I slow down the requests it works fine. It doesn't take much load at all to cause the error: 1-2 requests per second. Any ideas?

|DNS error: -2, nodename nor servname provided, or not known
[Condition of type CL-ASYNC:DNS-ERROR]
orthecreedence commented 10 years ago

After reading back over the previous posts for this issue, I'm going to have to classify this as a libevent issue. I think the internal DNS implementation is broken.

For now, try something like this:

(ql:quickload '(:cl-async :cl-async-future :puri :drakma-async))

(defpackage :dns-raw
  (:use :cl :cl-async-future))
(in-package :dns-raw)

(defun dns-future (host)
  "Do a future-enabled DNS lookup."
  (let ((future (make-future)))
    (as:dns-lookup host
                   (lambda (ip fam)
                     (declare (ignore fam))
                     (finish future ip))
                   (lambda (ev)
                     ;; ignore informational events; only propagate real errors
                     (unless (and (typep ev 'as:event-info)
                                  (not (typep ev 'as:event-error)))
                       (signal-error future ev))))
    future))

(as:with-event-loop (:catch-app-errors t)
  (future-handler-case
    (alet* ((scrape-url "http://www.google.com/")
            ;; parse the url, extract the host
            (uri (puri:parse-uri scrape-url))
            (host (puri:uri-host uri))
            ;; do a manual DNS lookup. this *may* work better than libevent's
            ;; internal DNS because cl-async manages the state itself
            (ip (dns-future host))
            ;; set the IP back into the URI object and render it to a string
            (uri-str (with-output-to-string (s)
                       (setf (puri:uri-host uri) ip)
                       (puri:render-uri uri s)))
            ;; make the drakma request, and be sure to set the "Host" header
            ;; so the remote server knows what server you're trying to hit
            (res (drakma-async:http-request uri-str :additional-headers `(("Host" . ,host)))))
      (format t "res: ~a~%" res))
    (t (e) (format t "error: ~a~%" e))))

Note that this is a bit of a shot in the dark, but I'm hoping that by resolving the DNS manually outside of libevent's internal socket connector (and using its separate DNS resolving facilities, which cl-async wraps) you will get better results.


I'm going to put some serious thought over the next few weeks into wrapping libuv. This is a bigger change, though. A lot of the cl-async internals are hardcoded to use libevent's data structures and C calls. I'd like to make the backends switchable so you could do something like

(push :cl-async-libevent *features*)
(ql:quickload :cl-async)
;; or
(push :cl-async-libuv *features*)
(ql:quickload :cl-async)
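For illustration only, this is one way code could branch on such a feature; cl-async has no such switch today, and `backend-name` is a hypothetical function. Because `#+`/`#-` are resolved by the reader, the feature has to be on `*features*` before the system's files are read, which is why the push comes before `ql:quickload` above.

```lisp
;; Hypothetical sketch: chosen at read time, not at run time.
(defun backend-name ()
  "Return which event-loop backend this code was compiled against."
  #+cl-async-libuv :libuv
  #-cl-async-libuv :libevent)
```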

Problem is, libevent/libuv calls aren't 1:1 and I'd have to rewrite a lot of code pertaining to SSL. I have to say that in the end, the cl-async base code would be heavily, heavily simplified and a ton of complexity would be moved to the backends that are being targeted (SSL, switching out structure types based on platform, platform-specific calling issues, etc), which is more "correct" anyway.

Like I said though, this is lots of work and my time is really cramped these days.

Please let me know if the above code (or something like it) works better for you.

nightshade427 commented 10 years ago

I rebooted and the issue is gone for now. Not sure if it's something that builds up over time, like resources not being released. I'll try the above if it surfaces again. Thanks again as always for the wonderful help.

orthecreedence commented 10 years ago

Do yourself a favor and do some load testing on your scraper. It could be a cumulative effect (after 10K requests DNS starts getting flaky) or rate-based (at 5 req/s DNS becomes flaky). If you really pound it, like 100-1000 req/s (on servers that won't mind you doing this, of course) it may expose the problem to a point where you are at least aware of the limitations.
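One simple way to tell the two cases apart is to record each request's outcome in send order and count failures per window of requests. A minimal sketch (the function name and the T/NIL encoding are assumptions, not anything from drakma-async):

```lisp
(defun failures-per-window (results window)
  "RESULTS lists one outcome per request in send order: T for success,
NIL for a DNS failure. Return the failure count for each consecutive
block of WINDOW requests."
  (loop for start from 0 below (length results) by window
        collect (count nil (subseq results start
                                   (min (length results) (+ start window))))))

;; (failures-per-window '(t t nil t nil nil t) 3) => (1 2 0)
```

Counts that keep climbing from window to window point to a cumulative problem; for a rate-based one, run the same batch at a few different request rates and compare the totals.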

Also, you're on Ubuntu right? Are you running anything like dnsmasq or any sort of local DNS server?

nightshade427 commented 10 years ago

Yeah I'm gonna load test Friday ;)

If any issues come up, I'll try the above, use cl-async for DNS, and see if it helps.

Yeah, running on vanilla Ubuntu 13.10 server.