taoensso / timbre

Pure Clojure/Script logging library
https://www.taoensso.com/timbre
Eclipse Public License 1.0
1.44k stars 171 forks source link

IllegalAccessException when logging internal JDK exception (SunCertPathBuilderException) via `log/error` #365

Closed jumarko closed 1 year ago

jumarko commented 1 year ago

One of our users reported an error caused by IllegalAccessException thrown from within log/error. It looks like the code is trying to do some introspection on the exception object but it doesn't have access to do that.

(try
      (http/post url request)
      (catch Exception e
        ;; TODO: if `e` is `SunCertPathBuilderException` then this log statement can actually fail!
        (log/error e "cannot post: " request)
        ))
...
 jdk.internal.reflect.Reflection.newIllegalAccessException          Reflection.java:  392
java.lang.IllegalAccessException: class clojure.core$bean$fn__7278$fn__7279 
cannot access class sun.security.provider.certpath.SunCertPathBuilderException (in module java.base)
 because module java.base does not export sun.security.provider.certpath to unnamed module @358c99f5
ptaoussanis commented 1 year ago

Hi Juraj, Timbre doesn't do anything too unusual with exceptions.

By default, it'll use io.aviso/pretty for exception printing. If you want to rule that out as a possible problem, you could override the default error fn (see the :output-fn info in the config docs).

Can the standard toString be called on this exception? ex-cause? Can it be printed?

Otherwise I'd suggest trying to understand exactly what that error implies. I'm not familiar with SunCertPathBuilderException off-hand, but would try either Googling to see if you can find a possible cause - or try to create a minimal reproducible example.

Hope that's of some help! Cheers

jumarko commented 1 year ago

Thanks for the suggestion. It seems it's indeed aviso that's causing this. Here's more of the stacktrace for a bit of context:

                                            taoensso.timbre/stacktrace              timbre.cljc:  944
                                     io.aviso.exception/format-exception            exception.clj:  641
                                     io.aviso.exception/format-exception            exception.clj:  643
                                      io.aviso.exception/write-exception            exception.clj:  636
                                    io.aviso.exception/analyze-exception            exception.clj:  453
                                     io.aviso.exception/expand-exception            exception.clj:  418
                                           io.aviso.exception/match-keys            exception.clj:  101
                                                     clojure.core/reduce                 core.clj: 6886
                                             clojure.core.protocols/fn/G            protocols.clj:   13
                                               clojure.core.protocols/fn            protocols.clj:   75
                                       clojure.core.protocols/seq-reduce            protocols.clj:   24
                                                        clojure.core/seq                 core.clj:  139
                                                                     ...                               
                                             clojure.core/bean/thisfn/fn           core_proxy.clj:  427
                                                     clojure.core/bean/v           core_proxy.clj:  419
                                                 clojure.core/bean/fn/fn           core_proxy.clj:  413

Ultimately, it's clojure.core/bean that fails but I'm not sure why aviso ends up calling it (transitively).

I'm pretty sure you can do standard things, like calling .getMessage method on the exception object, but it's the reflective access that's causing problems, because that is prohibited in newer JDKs. For instance, you can do this just fine:

        (log/error "cannot post: " (.getMessage e))
jumarko commented 1 year ago

The reproducer is quite simple. Try to call a website with bad SSL certificate and use sufficiently new java version. For me, it works properly until JDK 15 and it starts failing with JDK 16:

$ env local   16.0.1

$ clj
Clojure 1.11.1
user=> (require '[taoensso.timbre :as log])
(try
  (slurp "https://untrusted-root.badssl.com/")
  (catch Exception e
    (log/error e "ssl error")))

nil
user=> Execution error (IllegalAccessException) at jdk.internal.reflect.Reflection/newIllegalAccessException (Reflection.java:385).
class clojure.core$bean$fn__7278$fn__7279 cannot access class sun.security.validator.ValidatorException (in module java.base) because module java.base does not export sun.security.validator to unnamed module @76e041cf
jumarko commented 1 year ago

@ptaoussanis just got another customer report log with the same problem. Is there a chance that timbre could handle these cases automatically? It's a shame that this is masking the real problem and we are not able to see any exception details because of that.

ptaoussanis commented 1 year ago

@jumarko Hi Juraj, did you try my earlier suggestions to either debug possible underlying causes of SunCertPathBuilderException or to override the default error fn?

Your options seem to be:

  1. Try eliminate the issue that's generating these exceptions.
  2. Try identify why clojure.core/bean has issues with these exceptions. Might be something that can be fixed upstream.
  3. Avoid the use of clojure.core/bean by substituting a different error fn.

Beyond these, I'm not sure what else to recommend.

Is there a chance that timbre could handle these cases automatically?

Could you please be a bit more specific? What do you have in mind for Timbre to do differently here?

jumarko commented 1 year ago

@ptaoussanis thanks for the super-quick answer. I didn't try to override the default error fn. Debugging a particular cases is rather easy but inconvenient and we need a generic solution.

I also think that all the users would benefit from having a reliable way of printing stacktraces - it could be another common Exception class that's encapsulated inside and suddenly you get an unexpected error. I think this will become more of a problem after everybody moves to JDK 17+ and it's very hard to spot it yourself (you just expect that (log/error e ...) works and don't think about internal exception classes not available for reflection.

jumarko commented 1 year ago

Just to make sure it's clear - standard thing like (.printStacktrace e) works well. Also clojure.tools.logging can log this without problems.

See https://github.com/jumarko/clojure-experiments/blob/develop/src/clojure_experiments/logging.clj#L158-L175

ptaoussanis commented 1 year ago

thanks for the super-quick answer.

No problem 👍

Debugging a particular cases is rather easy but inconvenient and we need a generic solution.

A generic solution to what, exactly? Until the cause behind this particular clojure.core/bean issue is identified, we don't actually know what's happening, how wide spread it might be, or what solutions or workarounds might exist.

Is there a bug in clojure.core/bean? How widely applicable is it? Is there a bug in io.aviso/pretty? How widely applicable is it? Is this behaviour perhaps an intended result of some security/permissions/config issue with newer JVMs?

I haven't investigated, so can't speculate - this would need to be debugged properly.

In my first message I suggested Googling the SunCertPathBuilderException error to see what comes up. You haven't indicated whether you've done any digging on this?

If for some reason debugging isn't a possibility for you right now, a 1-line config change [1] should be able to reliably work around the issue. As you mentioned, something like (.printStacktrace e) works fine - so why not use that as your error function in the meantime until someone else is able to debug?

Edit to add example of config change: [1] (timbre/merge-config! {:output-opts {:error-fn (fn my-error-fn [data] (.printStackTrace (:?err data)))}})

jumarko commented 1 year ago

Thaks again, Peter, for the elaborate response. I really appreciate your time.

What I was after is this:

Again, this can be trivially reproduced:

(try
  (slurp "https://untrusted-root.badssl.com/")
  (catch Exception e
    (log/error e "ssl error")))

Of course it's up to you to decide where this is something that will be beneficial for all timbre users and weight the tradeoffs. In the meantime, we are going to modify the global config as you suggest.

ptaoussanis commented 1 year ago

Thaks again, Peter, for the elaborate response. I really appreciate your time.

You're very welcome Juraj, and likewise - I appreciate your thoughts on this, and assistance gathering info 👍

Thanks also for the clear summary, that's helpful. I'm in agreement with your first 3 points, but have a reservation about the 4th:

I suggest, that this can be a relatively common (and you don't know it until you face it pretty hard) problem that is broader then this particular instance and is definitely not specific to our environment.

It's of course possible that this could be a common problem, but I'm reluctant for us to assert that it is common/broad until we've done more to establish an underlying cause. At least from our exchanges on this issue so far (but please correct me if I'm wrong!), it looks like there's still low-hanging fruit available from your end re: diagnosing the underlying cause?

This may seem needlessly fussy/stubborn, but I'd like to characterise it as intentional stubbornness ^^ This kind of process structure can really help open-source maintainers out.

Please keep in mind that maintainers get a lot of questions and reports. So inevitably, maintainers need to try and prioritise/triage where to focus effort. If I'm working on issue 1, it means that I'm not working on issue 2 (or some other new feature/library/etc.).

Personally, my hope is to contribute the largest possible net benefit to folks - which means needing to mindfully prioritise focus. Some of the factors that I personally consider include:

a. Does the issue potentially cause a significant cost (impact)? b. How widespread/likely will the above cost be? (One indicator being frequency of reports). c. Has the issue been confirmed? (Or am I uniquely well-equipped to try and confirm it?). d. Has the cause been identified? (Or am I uniquely well-equipped to try and identify it?). e. Can the issue be easily and reliably worked around? f. Are the reporters out of ideas/avenues that they may be able to pursue effectively on their own?

Does that objective + heuristic make sense / seem ~reasonable?

On this particular issue: since there's so far been only 1 report and since a reliable work-around seems easy - my current plan is to look at this closer when I'm next doing batched work on Timbre (no particular ETA, but probably several months away since I just cut a new release in December).

Things that would help bump the priority of this issue relative to others:

  1. An increase in the asserted issue impact
  2. A decrease in the cost of addressing the issue

1 and 2 are both sensitive to the underlying cause, so shining light on that is my best guess for how someone might be able to assist.

Otherwise, will definitely keep this open and take a deeper look myself when I'm next on Timbre.

Does that seem reasonable?

jumarko commented 1 year ago

Does that seem reasonable?

It sounds perfectly reasonable.

Thanks a lot for the long answer and for all your contributions to the Clojure community!

ptaoussanis commented 1 year ago

You're very welcome, sorry for the trouble you've run into with this! And please do let me know if for some reason the workaround isn't sufficient in the meantime.

Cheers :-)

ptaoussanis commented 1 year ago

Will be addressed in a new Timbre release that I'll be pushing shortly, marking as done.