nervous-systems / fink-nottle

Asynchronous Clojure/Clojurescript client for Amazon's SNS & SQS services
The Unlicense

Occasional batch of 10 'internal-error' failures #12

Closed joelittlejohn closed 8 years ago

joelittlejohn commented 8 years ago

We're using fink-nottle to put onto SQS in batches. Very occasionally (once a week or less) we're seeing a batch of 10 errors from fink-nottle like:

clojure.lang.ExceptionInfo: internal-error
    at clojure.core$ex_info.invoke(core.clj:4593) ~[usermessaging.jar:na]
    at fink_nottle.sqs.channeled$failure__GT_throwable.invoke(channeled.cljc:37) ~[usermessaging.jar:na]
    at clojure.core$map$fn__4553.invoke(core.clj:2624) ~[usermessaging.jar:na]
    at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[usermessaging.jar:na]
    at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[usermessaging.jar:na]
    at clojure.lang.Cons.next(Cons.java:39) ~[usermessaging.jar:na]
    at clojure.lang.RT.next(RT.java:674) ~[usermessaging.jar:na]
    at clojure.core$next__4112.invoke(core.clj:64) ~[usermessaging.jar:na]
    at clojure.core.async$onto_chan$fn__10001$state_machine__9436__auto____10002$fn__10004.invoke(async.clj:595) ~[usermessaging.jar:na]
    at clojure.core.async$onto_chan$fn__10001$state_machine__9436__auto____10002.invoke(async.clj:593) ~[usermessaging.jar:na]
    at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:940) ~[usermessaging.jar:na]
    at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:944) ~[usermessaging.jar:na]
    at clojure.core.async.impl.ioc_macros$put_BANG_$fn__9458.invoke(ioc_macros.clj:961) ~[usermessaging.jar:na]
    at clojure.core.async.impl.channels.ManyToManyChannel$fn__5875.invoke(channels.clj:218) ~[usermessaging.jar:na]
    at clojure.lang.AFn.run(AFn.java:22) [usermessaging.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]

Do you have any idea how we could narrow down this 'internal-error' a bit further? Do you think fink-nottle could be modified in some way to attach more information in this case?

moea commented 8 years ago

There ought to be as much information associated with the ExceptionInfo object as we get back from SQS, which is what generates the error. Calling clojure.core/ex-data on the exception instance will return a map with the keys :code (:internal-error here), :batch-id, :message & :sender-fault, corresponding in order to the fields of an SQS batch error entry (Code, Id, Message, SenderFault), with some obvious but inconsequential renaming. :message is probably your best bet. The documentation could be improved on this point.
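As a minimal sketch of the suggestion above: the snippet below builds a stand-in exception with `ex-info` instead of calling fink-nottle, so it runs without SQS access. The map keys are the ones named in the comment; the values, and the names `example-failure`, `failure-details` and `log-failure`, are hypothetical and only there for illustration.

```clojure
;; Stand-in for the exception surfaced by fink-nottle. The keys follow the
;; comment above; every value here is invented for illustration.
(def example-failure
  (ex-info "internal-error"
           {:code         :internal-error
            :batch-id     "0"
            :message      "hypothetical message text returned by SQS"
            :sender-fault false}))

(defn failure-details
  "Returns the data map attached to e, or nil when e carries no ex-data."
  [e]
  (ex-data e))

(defn log-failure
  "Prints the exception message together with its attached data map."
  [e]
  (println "SQS batch entry failed:" (ex-message e) (pr-str (failure-details e))))

(log-failure example-failure)
```

`ex-message` needs Clojure 1.10 or later; on older versions `(.getMessage e)` does the same job. In a real application the `println` would presumably be replaced by whatever logging library is already in use.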

joelittlejohn commented 8 years ago

Great, thanks for the detailed explanation! I understand more about what this error is now (I wasn't sure originally if this was indicative of an SQS error, an error from fink, or an error from another library).

Will log the ex-data as well. It's probably just a transient 500 from SQS, but very useful to know that ex-data is where the key info resides :+1: