Use handler-bind instead of handler-case

orthecreedence / blackbird

Common Lisp promise implementation (successor to cl-async-future)

83 stars 10 forks source link

Use handler-bind instead of handler-case #8

Open orthecreedence opened 9 years ago

orthecreedence commented 9 years ago

doesn't unwind the stack when handling errors

joaotavora commented 9 years ago

Thanks @orthecreedence, it'd be really good as this is the main blocker for me when considering your libraries, which otherwise appear to be excellent.

orthecreedence commented 9 years ago

So I'm trying to wrap my head around handler-bind. It either needs to handle a restart or do a local exit from the handler in order for an error to be "handled."

I am new to this construct and before jumping in and implementing something that will probably end up looking a lot like handler-case, I'm wondering what you see the handler-bind forms looking like. Is your primary goal to run restarts? If so, does it make sense to do something like:

(block 'blackbird-handler
  (handler-bind ((condition (lambda (e) ...your handler/restart code here...))
                 ;; no restarts were run, signal the error
                 (condition (lambda (e) (signal-error promise e) (return-from blackbird-handler)))
    ...do stuff...))

Any input you give would be very useful!

joaotavora commented 9 years ago

@orthecreedence here's what I know in the form of answers to some passages of what you wrote:

or do a local exit from the handler in order for an error to be "handled."

handler-bind can do a local exit (actually isn't it called non-local? irrelevant...) by throwing, for example to a catch tag that is outside the handler-bind. In that case it'll be similar to handler-case. But it can also just do something with the condition and let it go about its business.

Is your primary goal to run restarts? If so, does it make sense to do something like:

My main goal is to be able to run restarts, yes. If the conditions are errors, the default behaviour, after your lisp implementation sees them at the very top level, is to run the debugger stored in *debugger-hook*. So one of the ways to "run restarts" is interactively, by choosing from a list. This is during development, when I call some code that eventually calls code of yours that decides to signal some condition. Assuming that you want that condition to bubble up to the user (we're just discussing the form here) of course if you don't want a condition to ever ever bubble up to the user of your library, do use handler-case.

In my program logic, I might also want to run restarts automatically, by wrapping my own handler-bind forms (where I write them, not pass some lambda down to your library), around code that I suspect or know will call your code. In my handler binds I can choose to cl:find-restart and then cl:invoke-restart non-interactively.

But perhaps the most important thing, and from reading your replies (and your code) you might be missing it, is that if you don't resignal I can do a third thing. I can see the restarts established by restart-case anywhere in the million-levels-deep-stack, including restarts that you established and including restarts established by the libraries whose code you called. And I can see the stack trace, with every frame that was called and, in some lisps, even see local variables and such. Once you handler-case, you "forget" about this, everything is unwound.

So what does this rant mean (assuming it means something)? That while choose to do the javascript or ruby-like thing of letting users of your library pass closures as args named on-error-run-this to your functions, perfectly legal in lisp, there is a better lisp way: if the condition is to be seen by the user, let it bubble up as high as possible, possibly crashing the process into the debugger, which is a wonderful thing.

When is it a bad thing? In production code of course, where do want the error to be demoted to a warning for example. That's why hunchentoot and some other libraries have those *debug-on-errors* variables that probably live in an if inside one of those handler-bind lambdas. If *debug-on-errors* is not t, that lambda makes the server reply to client with a "500 internal server error" condition and then throws to some catch tag outside the handler-bind, the error being effectively "handled". This may be the default (probably a sane choice) but you should give the user the option to disable it. So, if *debug-on-errors* is t, then the web-server will block in the debugger for a million years waiting for the developer to examine the stack and look for the root of the problem in the stack.

This is all to say that no, I don't think it makes sense to do something as you did there. I think you have to redesign your API's slightly to say instead of:

"FROB:FROBNICATE might frankinshize, if you must handle it 
specially, pass it a lambda in the IF-FRANKINSHIZUMS-DO 
parameter if you want your code to run when that happens"

say

"FROB:FROBNICATE might frankinshize by signalling 
FROB:FRANKINSHIZUM. If you must handle it specially, 
use handler-bind or handler-case anywhere."

Maybe you can do something that makes the two cases more or less compatible.

orthecreedence commented 9 years ago

I think I am starting to understand. So cl-async will still take :event-cb arguments, but instead of handler-caseing all errors and passing them to event-cb, it will only pass cl-async related conditions to event-cb and all other errors will bubble up. This behavior can be changed by :catch-app-errors, which if t will basically make everything production-safe (via handler-case) but if nil will allow all conditions/errors to bubble up to the debugger (or any wrapping handler-bind). Is this all correct?

As far as blackbird goes, any handler-bind around with-promise/catcher/etc will let errors bubble to the top if blackbird:*debug-on-error* is t, and otherwise will do a non-local exit (default) and catch the error in the form?

Thank you for bearing with me!

joaotavora commented 9 years ago

@orthecreedence, yes I think you're on the right track. I planned on expanding my previous post with some examples, but xmas got in the way :-D

The :catch-app-errors is a good idea. And the following sentences soud right:

This behavior can be changed by :catch-app-errors, which if t will basically make everything production-safe (via handler-case) but if nil will allow all conditions/errors to bubble up to the debugger (or any wrapping handler-bind).

As far as blackbird goes, any handler-bind around with-promise/catcher/etc will let errors bubble to the top if blackbird:debug-on-error is t, and otherwise will do a non-local exit (default) and catch the error in the form?

I still wanted to understand exactly why :event-cb is needed as an argument. If it's used as a handler of "exceptional" conditions then it should also be let out of the library code as a condition with the full trace, just like any other error. If, on the other hand, it's an intrinsic part of your design for an event-based control flow, then I guess it makes sense to have client code pass a value to a :event-cb kwarg into your library.

But even then there should be zero resignalling, so that the handler runs as deep in the stack as the condition originated. And if you implement :event-cb with handler-bind behind the scenes, you don't even need to bother passing them along as an argument in any internal function, do you understand? (this is one of the advantages I wanted to demonstrate with some examples)

But all this talk is worthless without some actual code. Tell me how and where you planning to make the first changes, then I can have a look. In the meantime I will also have a look at the cl-async and blackbird code...

joaotavora commented 9 years ago

@orthecreedence, actually maybe scratch that plan. I gotta try cl-async again, then get to the part that was irking me and try to resolve it. If I remember correctly it was a couple of things in wookie like:

My code did something stupid and the debugger did show up, but all I could see in the debugger's backtrace is when you had caught it and the description of the error, which was useful, but it wouldn't tell me exactly where in my code it had popped up. In hunchentoot and other code I can see this and quickly jump to the source.
There was some kind of connection error in a part of your lib, but I could not easily tell where it had originated.

orthecreedence commented 9 years ago

i am starting to see your point with the event-cb stuff. and i also understand christmas timing, i barely get a few minutes here and there to reply =).

i think one case where traditional error handling fails is the fact the stack is unwound after each cl-async action is triggered. so within the event loop i may open a socket and want to be able to see events on that particular socket and act on them in the context of that code itself. you won't be able to do this without having your handler-bind code outside of (wrapping) the event loop because the stack unwinds and removes your handlers otherwise. this is why i favor being able to do some callback-based "event" handling, as opposed to having everything handled in a set of handlers outside the event loop. although you're correct that non-cl-async conditions should not be getting caught and passed to any event-cb.

also, once again, i may be missing something obvious here so feel free to correct me.

orthecreedence commented 9 years ago

I believe this is fixed in c4ea241dfc02fd27d1d419bc86c5c0f2e187ddd9. There's now an exported symbol, *debug-on-error*, which tells all handler-bind forms (previously handler-case) to trap errors and do non-local exit if nil (the default) but if t will let errors bubble up to the debugger without interference. This has been implemented in create-promise, catcher, and promisify (the three main areas of error handling).

I am very open to any changes if you feel anything was implemented incorrectly here. Also, I'm pushing out some changes to cl-async soon that include the changes we discussed above (this may be a day or two later as I review things, I'm a bit more cautious with the codebase because more people rely on it).

orthecreedence commented 9 years ago

Ok, please see https://github.com/orthecreedence/cl-async/issues/108. I decided to keep event-cb because it is used to catch events from libuv on a per-connection basis, but other than the conditions created by libuv itself, all conditions are now allowed to bubble up without being touched or passed into event-cb (and enter the debugger) if :catch-app-errors nil is passed (which is the default). If :catch-app-errors t is given, ALL errors are caught and passed to the function :caught-errors (given on startup).

There are more details in the cl-async issue. Please give it a shot, and let me know if you need changes!

joaotavora commented 9 years ago

I tried loading the very bleeding edge of cl-async and its dependencies and after some effort, I managed to do it.

I was trying to run this very basic example

(with-event-loop ()
      (tcp-connect "www.google.com" 80
                   (lambda (socket data)
                     (when t
                       (format t "Done processing!~%")
                       (close-socket socket)))  ; close the socket if done processing
                   (lambda (error) (format t "Ooops an error!"))
                   :data (format nil "GET /~c~c" #\return #\newline)))

and make variations on it so I could check if error handling is now cool and if I can make my handler-binds around this and establish restarts near the "done processing" and all that.

But I crashed into this, and didn't debug any further, sorry. Perhaps you can help me?

The alien function "uv_loop_size" is undefined.
   [Condition of type SB-KERNEL::UNDEFINED-ALIEN-FUNCTION-ERROR]

Restarts:
 0: [RETRY] Retry SLY mREPL evaluation request.
 1: [*ABORT] Return to SLY's top level.
 2: [ABORT] Abort thread (#<THREAD "sly-channel-1-mrepl-remote-1" RUNNING {1003483CA3}>)

Backtrace:
  0: ("undefined function")
  1: (LIBUV:UV-LOOP-SIZE)
  2: (SB-INT:SIMPLE-EVAL-IN-LEXENV (WITH-EVENT-LOOP NIL (TCP-CONNECT "www.google.com" 80 (LAMBDA # #) (LAMBDA NIL #) :DATA ...)) #<NULL-LEXENV>)
  3: (EVAL (WITH-EVENT-LOOP NIL (TCP-CONNECT "www.google.com" 80 (LAMBDA # #) (LAMBDA NIL #) :DATA ...)))
  4: ((LAMBDA NIL :IN SLYNK-MREPL::MREPL-EVAL-1))

orthecreedence commented 9 years ago

Seems like an outdated version of libuv. Can you confirm you have libuv >= 1.0.0?

joaotavora commented 9 years ago

No I can't, because I don't :-). Homebrew on Mac OSX gives me libuv 0.10.21. Building from head right now, will be back with some more info soon. Shouldn't cffi and such warn early when compiling bindings?

joaotavora commented 9 years ago

After much effort (had to trash quicklisp for some reason), my little example now yields another error. Using a pristine quicklisp, sbcl 1.2.2, master cl-async, cl-libuv and fast-io

#<DNS-ERROR -1: the addrinfo->ai_addr object was null (stinks of a memory alignment issue) {10060C64A3}>
   [Condition of type DNS-ERROR]

But we already have an illustration of the problems that I think still exist with cl-async. I've noticed already that :catch-app-errors set to nil doesn't catch this error. Is this a non-app error in your view? But why is it being caught? I think this is the fundamental question, btw. So that the loop can go on and process other events? We can do this behaviour in production (with a default handler that just logs the error to some file). In development let it block the thread and all other pending events, so that the memory alignment smells can be analysed from their origin. As it stands I had to resignal it:

(with-event-loop (:catch-app-errors nil)
      (tcp-connect "www.google.com" 80
                   (lambda (socket data)
                     (format t "Done processing!~%")
                     (close-socket socket))
                   (lambda (error) (error error))
                   :data (format nil "GET /~c~c" #\return #\newline)))

BIG UPDATE. I didn't read the doc or you didn't write it yet, but I've noticed that :caught-errors does what I wish and apparently the error originates from the c callback dns-cb. Much better. I still wonder about some other parts, but so far so god.

orthecreedence commented 9 years ago

You're right, that memory alignment error should not be getting passed to the event-cb. That is a special case I forgot about when doing the conversion. I'll review each of the subsystems' C callbacks today and make sure that non-libuv conditions get thrown instead of passed to event-cb.

orthecreedence commented 9 years ago

Oh also, I'll spin up my mac (or bsd, similar) VM today and try to track down that actual DNS error. I might just cave in the next few days and rebuild cl-libuv via cffi-groveler so this type of junk doesn't happen anymore.

orthecreedence commented 9 years ago

I fixed that DNS throwing issue (the error is now thrown) and checked all the other systems to make sure conditions are being treated properly. Here's the breakdown: any libuv status that comes through a libuv callback that is not a "success" will have an event condition created for it and be passed to the attached event-cb. Any libuv status that comes back from calling a libuv function directly (such as writing to a socket) that is not a "success" will have an error thrown for it. If an error is not thrown, then it's a bug in cl-async. Any errors that occur in non-cl-async code (but were spawned via cl-async) are not touched by cl-async at all, unless :catch-app-errors t is passed.

In other words, the only thing event-cb is used for now is things like "socket timed out" or "dns not found" etc. Operational errors ("writing to socket returned an error" et al) throw errors. All thrown errors are untouched.

However, I still have yet to track down the root cause of the DNS error you're seeing. I tried dns-lookup on sbcl on a BSD machine and it just hangs, no errors and no output. So that's broken there for sure. Didn't have time to keep digging. I'll try spinning up my OSx VM and give it a shot (if it works, the VM is really finnicky and I don't have an extra mac laying around so wish me luck).

joaotavora commented 9 years ago

Here's the breakdown: any libuv status that comes through a libuv callback that is not a "success" will have an event condition created for it and be passed to the attached event-cb.

This part I almost understand, untill the part where you talk about "passing it to the attached event-cb". Why not let it bubble up to top level? In other words, why not make it behave like the next category of unexpected behaviour, which I'll comment on shortly?

Any libuv status that comes back from calling a libuv function directly (such as writing to a socket) that is not a "success" will have an error thrown for it.

This is quite OK. But is the caught by some handler-case or handler-bind in some conditions? In a production scenario I do want these errors to be handled. Should my client code do that? Via a argument somewhere like:

(in-package :capitaomorte)

(cl-async:some-good-function #'process-success
   :error-handling #'log-error))

, or by wrapping my own handler-case around some cl-async::fiasco condition class?

(in-package :capitaomorte)

(handler-case 
   (cl-async:some-good-function #'process-success)
  (cl-async:fiasco (f) (log-error f))

Or even better, if cl-async provides some cool restarts that I can use, I'll do

(in-package :capitaomorte)

(handler-bind
    ((cl-async:fiasco (lambda (f)
                        (when (find-restart 'cl-async:log-it-and-continue)
                          (invoke-restart 'cl-async:log-it-and-continue))
                        ;; must be some more serious fiasco
                        ;; that doesn't provide this restart so
                        ;; let it bubble up to someone that knows how
                        ;; to handle it
                        ))
     ;; In this scenario, I imagine that this one may be thrown 
     ;; by PROCESS-SUCCESS. In the others I would need 
     ;; similar
     (capitaomorte:usual-f*ckup
       (lambda (f)
         (if (capitaomorte:production-mode-p)
             (invoke-restart 'capitaomorte:muffle-it)))))
  (cl-async:some-good-function #'process-success))

I'll try spinning up my OSx VM and give it a shot (if it works, the VM is really finnicky and I don't have an extra mac laying around so wish me luck).

I can help track this down too and give my socket-fu a polish.

orthecreedence commented 9 years ago

This part I almost understand, untill the part where you talk about "passing it to the attached event-cb". Why not let it bubble up to top level?

My reasoning for this is to be able to handle events differently on a per-object basis. For instance

(as:with-event-loop ()
  (as:tcp-connect
    "www.google.com" 80
    (lambda (sock data)
      (my-app:do-stuff data))
    (lambda (ev)
      (when (typep ev 'as:socket-eof)
        (my-app:we-got-booted (as:tcp-socket ev))))))

I like this because a socket hanging up isn't worth exiting the event loop, IMO. In order to achieve the same thing with as:socket-eof being thrown, we'd have to do this:

(handler-bind
    ((as:socket-eof
       (lambda (ev)
         (my-app:we-got-booted (as:tcp-socket ev))
         (let ((restart (find-restart 'cl-async:continue-event-loop ev)))
           (when restart (invoke-restart restart))))))
  (as:with-event-loop ()
    (as:tcp-connect
      "www.google.com" 80
      (lambda (sock data)
        (my-app:do-stuff data)))))

This seems like a lot of boilerplate and mental overhead to handle something as simple as a socket EOFing, which is fairly commonplace. And consider this scenario:

(as:with-event-loop ()
  ;; analytics are nice but not critical
  (as:tcp-connect
    "my-analytics.com" 80
    (lambda (sock data) ...)
    (lambda (e)
      (when (typep e 'as:socket-refused)
        (format t "call to analytics server failed, no big deal ~a~%" e))))
  ;; we need this
  (as:tcp-connect
    "launch-the-missle.net" 80
    (lambda (sock data) ...)
    (lambda (e)
      (when (typep e 'as:socket-refused)
        (error "Nuke launch failed!")))))

Would this even have a direct translation in a handler-bind form? We can't wrap the individual forms with handler-bind because the callbacks are invoked under a different stack than the calling function, so by the time the error is thrown the stack has completely unwound and our error handlers are gone. So we'd have to wrap the outside of the with-event-loop with a big handler-bind form and somehow distinguish between our analytics socket and our nuke socket if an error occurs.

On top of this, from my understanding (helped quite a bit by talking to you) one of the main benefits for letting errors bubble up is to a) get backtraces and b) drop into the debugger. A backtrace here is useless because it's only a few levels deep under start-event-loop, and I'm not convinced debugging is the right option either because it seems like the only two viable options here are to either exit the event loop (which happens automatically if that socket was the only event left to process) or to pass the condition to event-cb and keep processing the loop (which is how things currently work).

In other words, I'm just not sure it would be useful to completely abolish event-cb in cases where libuv is actually handing us events having to do with the corresponding objects we're operating on. And in doing so, it may create a lot more overhead by forcing all error handling to the outside of the event loop (we can't use handler-case or handler-bind around async operations so we need to put them outside the event loop) as well as making it a challenge to identify which errors are occuring on which objects.

This is quite OK. But is the caught by some handler-case or handler-bind in some conditions? In a production scenario I do want these errors to be handled. Should my client code do that?

Yes, this would be the calling code's responsibility. All errors thrown should have the base cl-async:event-error so you could do something like:

(handler-case
  (as:with-event-loop ()
    (my-app:do-things-that-might-throw-errors ...))
  (as:event-error (e)
    (my-app:log-error e)))

The one exception I know of is the DNS error you found, which is actually more of a panic than anything else, and is thrown with the type cl:error, although I am open to changing this to another error type. This is thrown after a DNS request comes back (async) though, so it would have to be caught outside the event loop.

As mentioned over in https://github.com/orthecreedence/cl-async/issues/108, :catch-app-errors t will catch all errors that happen in any code running under cl-async and pass them to :caught-errors with the exception of errors that happen when calling a libuv function directly, in which case cl-async doesn't catch the error at all. This can be something like writing to a socket that's closed, opening a server on a port that's already bound, etc. These errors are thrown and are not handled by cl-async at all, even with :catch-app-errors t.

In fact, I'm starting to wonder if :catch-app-errors t is even useful, because it can now be completely mimicked by doing:

(handler-bind
    ((error (lambda (c)
              (let ((res (find-restart 'cl-async-util:continue-event-loop c)))
                (when res (invoke-restart res))))))
  (as:with-event-loop (:catch-app-errors nil
                       :caught-errors
                         (lambda (e)
                           (format t "- caught: ~a~%" e)))
    (as:with-delay ()
      (format t "x is ~a~%" (+ 4 'five)))))

continue-event-loop passes the caught error to :caught-errors and aborts the current callback but keeps the loop running. Thus, you can continue the loop on a per-condition basis. this is all undocumented so far, and will probably remain that way until things solidify over the next few weeks.

joaotavora commented 9 years ago

This seems like a lot of boilerplate and mental overhead to handle something as simple as a socket EOFing, which is fairly commonplace.

The "boilerplate" argument doesn't hold much for me in common lisp, because macros. cl-async could well provide a restarting-on-hangup macro that does that, or a slightly more generally named macro with more flexibility and some default behaviour.

Also note that I'm not saying that that parameter should be extinct, only optional.

Would this even have a direct translation in a handler-bind form? We can't wrap the individual forms with handler-bind because the callbacks are invoked under a different stack than the calling function

I see your point, indeed. The :event-cb is useful there indeed, and now I understand the intrinsic need for "attaching" handlers. But why not let it have handler-bind semantics? I.e, if it doesn't invoke a restart or throw to some some exported cl:event-handling-done tag, then the error does bubble up. You capture the lambda that the user passes into :event-cb and simply funcall it from a handler-bind inside cl-async, wrapped inside a cl:event-handling-done tag. THis is unfortunately backward incompatible to existing code, so some solution would have to be found for that.

Tangentially, I personally also hate the fact that I have to use typep. I'd rather use CLOS for that, and declaratively define

(defmethod my-app:handle-event ((event as:socket-refused))
   (if (string= (url-of event)) "analytics") ... ...))

And then pass #'my-app:handle-event as the error handler. CLOS should do the right thing.

Actually cl-async could provide a basic cl-async:handle-error generic function with some preloaded methods, and export it. The export means client code can add methods to it. I'll explain more below.

main benefits for letting errors bubble up is to a) get backtraces and b) drop into the debugger

There is also the benefit that any restarts either cl-async's or mine established along the way can be called from any handler-bind

In other words, I'm just not sure it would be useful to completely abolish event-cb in cases where libuv is actually handing us events having to do with the corresponding objects we're operating on. And in doing so, it may create a lot more overhead by forcing all error handling to the outside of the event loop (we can't use handler-case or handler-bind around async operations so we need to put them outside the event loop) as well as making it a challenge to identify which errors are occuring on which objects.

Yes, this argument is made. Don't abolish it. Optionally give it handler-bind semantics. To ensure backward compatibility, make handler-case semantics the default.

| Yes, this would be the calling code's responsibility. All errors thrown should have the base cl-async:event-error so you could do something like:

I assume you meant "all errors thrown by cl-async". In that case, it is more important that they all eventually provide a cl-async-util:continue-event-loop restart than their exact type, cause that's really what we're looking for here.

The one exception I know of is the DNS error you found, which is actually more of a panic than anything else, and is thrown with the type cl:error, although I am open to changing this to another error type.

I guess. But if someday you feel that cl-async should delegate to some other library that uses some other error class, then don't resort to catching, changing the error class and resignalling, use restarts instead.

In fact, I'm starting to wonder if :catch-app-errors t is even useful, because it can now be completely mimicked by doing:

Yes exactly. But I don't find it horrible, keep it. It's nice to pass it the value returned by some function (my-app:production-mode-p).

Or, I'd make it a generalized boolean and get rid of :caught-errors entirely (or rather deprecate it). So if one passes nil to :catch-app-errors, everything bubbles up. If one passes t to it everything is handled through the as:handle-error generic function, which has some nice built-in methods that catch only cl-async errors and don't mess with user-specific errors at all. If one passes it a lambda, call that lambda with the error class. That function is resposible for handling all the errors. The user has the choice to add methods to as:handle-error or call it from his/her lambda.

This very same behaviour would go for the fourth argument of functions such as as:tcp-connect and overrides the behaviour for :catch-app-errors.

So to summarize, here's what I'd like to see:

make the fourth argument of functions such as as:tcp-connect optional. nil means everything bubbles up. t means use as:handle-error, a new generic and a lambda is a lambda as usual, but called with handler-bind semantics, i.e. if it doesn't throw or invoke restarts the error continues its journey.
add an exported catch tag of cl:event-handling-done alongside the cl-async-util:continue-event-loop that you apparently already provide. Export both.
Add a as:handle-error generic function and define some methods for it. Export that.
make the :catch-app-errors argument the same semantics as the previous item. it's only called when a specific request doesn't have an error handler attached, which is the point you made earlier.
error-catcher is deprecated.

How backward incompatible is all this? The new handler-bind semantics certainly break some existing code. The optionality and generalized booleaness of :catch-app-errors and the fourth arg to as:tcp-connect are less of a problem, but would still need adjustments.

Sorry for the long-windedness and What do you think?

orthecreedence commented 9 years ago

Sorry in advance for the great wall of text.

The "boilerplate" argument doesn't hold much for me in common lisp

True =].

But why not let it have handler-bind semantics? I.e, if it doesn't invoke a restart or throw to some some exported cl:event-handling-done tag, then the error does bubble up.

What I'm saying is that any event condition that is even passed to event-cb originates from a cl-async callback. What this means is that any handler-bind that catches it must be outside the event loop.

Keep in mind, that as of the last few commits, cl-async does not catch errors inside of user-supplied callbacks by default, they are allowed to bubble up. event-cb is used to notify the user only in cases where libuv gives any status other than success on a specific operation. For instance:

(as:with-event-loop ()
  (as:with-delay ()
    (+ 4 'five)))

This triggers an error that passes all the way through cl-async to the top level/debugger, with full stack/restarts. In previous versions, the above error would be caught and rethrown inside of cl-async, erasing any stack or restarts.

That said, here's my understanding of what you want:

;; cl-async's event-handling code
(catch 'event-handling-done
  (handler-bind
      ((error (lambda (e) (funcall event-cb e))))
    (error (make-instance 'tcp-eof :socket the-socket))))

;; event-cb
(lambda (e)
  (when (some-logic-here)
    (throw 'event-handling-done)))

Here's how it currently works (and the event-cb that would essentially do the same thing)

;; cl-async's event-handling code
(funcall event-cb (make-instance 'tcp-eof :socket the-socket))

;; event-cb
(lambda (e)
  (unless (some-logic-here)
    (error e)))

I'm really not opposed to implementing the event handling from the first example and by default throwing to event-handling-done once event-cb returns unless cl-async:*throw-event-conditions* (or something like it) is t. This would preserve backwards compatibility with the option of calling error on event conditions. But I do wonder if this is very useful because the error handling has to be done outside the event loop anyway, and the (error some-event) calls are going to share almost the exact same stack (originating in the cl-async error handling code).

There is also the benefit that any restarts either cl-async's or mine established along the way can be called from any handler-bind

True, but calling (error e) from an event-cb would have the same effect here as throwing the event inside cl-async. I hope the examples above illustrate what I'm talking about.

The one exception I know of is the DNS error you found [...] I am open to changing this to another error type.

don't resort to catching, changing the error class and resignalling, use restarts instead.

What I meant was I'd be inclined to change the actual (error "DNS error!") call to something like (error (make-instance 'event-error :msg "DNS error!")).

I hope that my days of catching and rethrowing errors are over =].

In fact, I'm starting to wonder if :catch-app-errors t is even useful, because it can now be completely mimicked by doing

But I don't find it horrible, keep it.

Will do!

Or, I'd make it a generalized boolean and get rid of :caught-errors entirely (or rather deprecate it). So if one passes nil to :catch-app-errors, everything bubbles up. If one passes t to it everything is handled through the as:handle-error generic function, which has some nice built-in methods that catch only cl-async errors and don't mess with user-specific errors at all. If one passes it a lambda, call that lambda with the error class. That function is resposible for handling all the errors. The user has the choice to add methods to as:handle-error or call it from his/her lambda.

I really like the idea of being able to call :catch-app-errors with a lambda and getting rid of :caught-errors.

everything is handled through the as:handle-error generic function, which has some nice built-in methods that catch only cl-async errors and don't mess with user-specific errors at all

cl-async handles all these conditions internally. In other words, when it calls event-cb with a socket-eof condition, it already handles the closing of the socket and any needed cleanup at that point. I don't think there's a use case where having a generic event handler makes sense inside of cl-async, however that wouldn't stop the user from providing their own method for handling any errors that come through.

Actually though, now that I think about it, providing a generic function that gets called when :catch-app-errors t is passed would be kind of nice, and by default maybe it could just log any errors that came through.

I do like that idea.

So, for your final suggestions:

make the fourth argument of functions such as as:tcp-connect optional. nil means everything bubbles up. t means use as:handle-error, a new generic and a lambda is a lambda as usual, but called with handler-bind semantics, i.e. if it doesn't throw or invoke restarts the error continues its journey.

As much as it pains me to change the tcp-connect API in such a large way, I think this makes the most sense. I think for now I will rename the event-cb arg in tcp-connect to event-cb-dep and add an event-cb keyword arg, and throw a deprecation warning if people pass a value to event-cb-dep. I'll keep this around for a few months until everyone has had enough time to update.

add an exported catch tag of cl:event-handling-done alongside the cl-async-util:continue-event-loop that you apparently already provide. Export both.

I still think this may need a bit more discussion. If the event handling code examples I posted above are indeed what you envision, then I'm fine with implementing this, but am still curious as to what use it would be.

Add a as:handle-error generic function and define some methods for it. Export that.

I like it.

make the :catch-app-errors argument the same semantics as the previous item. it's only called when a specific request doesn't have an error handler attached, which is the point you made earlier.

Actually my point (and I hope I made this clearer in the code examples above) was that event-cb doesn't catch user-generated errors at all anymore, it now only receives cl-async event conditions. One thing I've been kicking around is a 'send-to-eventcb restart that would allow the user to send a caught error to the event-cb instead of what will be the function in :catch-app-errors. This preserves backward compatibility with the error handling of older cl-async versions. Perhaps this could even be a *send-errors-to-eventcb* variable to allow really easily switching to legacy error handling.

error-catcher is deprecated.

Do you mean :caught-errors? If so, then yes it will have a very short life =].

joaotavora commented 9 years ago

I will read the rest of you reply later, but right now, I just wanted to clarify this:

But why not let it have handler-bind semantics? I.e, if it doesn't invoke a restart or throw to some some exported cl:event-handling-done tag, then the error does bubble up.

Keep in mind, that as of the last few commits, cl-async does not catch errors

I just wanted to clarify that I meant the error handlers supplied by the user should have "handler-bind semantics". I.e. when you detect that a user passed one, just funcall it inside your handler-bind handler that lets everything through. Is that what is done. I'll post an example later.

orthecreedence commented 9 years ago

I think i get it:

;; current
(tcp-connect "somewhere.com" 80 'my-read-cb :event-cb 'do-something-with-error)

;; what I understand you are saying
(handler-bind
    ((error 'do-something-with-error))
  (tcp-connect "somewhere.com" 80 'my-read-cb))  ;; no :event-cb, handler is in handler-bind

Is that what you mean?

joaotavora commented 9 years ago

@orthecreedence, no I don't think so. I'm saying something much simpler I think. I'm saying keep supporting event-cb args, but give them handler-bind semantics. This of course can mean a million of things and that's the source of the confusion I mean this.

Without handler-bind semantics, you probably now have.

;; error bubbles up
(tcp-connect "nowhere.com" 80 'my-read-cb)

;; Error does not bubble up
(tcp-connect "nowhere.com" 80 'my-read-cb :event-cb #'(lambda (e) (format t "oh noes~%")))

With handler-bind semantics, user's :event-cb has to explicitly find a point where control can be restarted, otherwise the error is untouched.

;; error bubbles up
(tcp-connect "nowhere.com" 80 'my-read-cb)

;; error bubbles up
(tcp-connect "nowhere.com" 80 'my-read-cb :event-cb #'(lambda () (format t "oh noes~%")))

;; error does not bubble up
(tcp-connect "nowhere.com" 80 'my-read-cb 
  :event-cb #'(lambda (e) (format t "oh noes~%") (invoke-restart cl-async:well-known-restart))

;; error does not bubble up
(tcp-connect "nowhere.com" 80 'my-read-cb 
  :event-cb #'(lambda (e) (format t "oh noes~%") (throw 'cl-async:well-known-tag nil))

;; error does not bubble up
(tcp-connect "nowhere.com" 80 'my-read-cb 
  :event-cb #'(lambda (e) (format t "oh noes~%") (return-from 'cl-async:well-known-block nil))

joaotavora commented 9 years ago

But your previous example is also correct. I also mean that.

orthecreedence commented 9 years ago

Here's a quick example (using the latest version of cl-async):

(handler-bind
    ((error
       (lambda (e)
         (let ((res (find-restart 'cl-async-util:continue-event-loop e)))
           (when res (invoke-restart res))))))
  (as:with-event-loop (:catch-app-errors nil
                       :send-errors-to-eventcb t)
    (as:delay
      (lambda () (format t "result: ~a~" (+ 4 'five)))
      :event-cb
        (lambda (e) (format t "got err: ~a~%" e)))))

I think what you're saying is you want something like this:

(as:with-event-loop (:catch-app-errors nil
                     :send-errors-to-eventcb: t)
  (as:delay
    (lambda () (format t "result: ~a~%" (+ 4 'five)))
    :event-cb
      (lambda (e)
        (format t "got err: ~a~%" e)
        (throw 'cl-async-util:event-handling-done))))

The current version lets the error bubble to outside the loop, where the continue-event-loop restart sends it to the :event-cb. You're saying you want the error to be sent to the :event-cb on its way up with the option of doing a non-local exit to keep the error from bubbling?

I'm fairly certain this can be done in a backwards compatible manner with (yet another) switch somewhere (such as :handle-error-in-eventcb or something like it) which if nil, would just throw to the event-handling-done tag automatically after running the event-cb, otherwise if t would require that the event-cb throw to the tag if it wants to stop the error from bubbling. I guess the only challenge would be making sure events (tcp-eof etc) are handled the same way (although it might be possible to just wrap them in the same error handling code as everything else and call (error ...) on them).