killbill / killbill-plugin-framework-ruby

Framework to write Kill Bill plugins in Ruby
http://killbill.io
8 stars 11 forks source link

logging throwables with JRuby stack-traces does not work #25

Open kares opened 9 years ago

kares commented 9 years ago

related to "occasional" traces seen as :

java.lang.IllegalArgumentException: name
at sun.misc.URLClassPath$Loader.getResource(URLClassPath.java:529)
at sun.misc.URLClassPath.getResource(URLClassPath.java:199)
at java.net.URLClassLoader$1.run(URLClassLoader.java:358)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:1225)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1688)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1569)
at org.apache.felix.framework.BundleWiringImpl.doImplicitBootDelegation(BundleWiringImpl.java:1763)
at org.apache.felix.framework.BundleWiringImpl.searchDynamicImports(BundleWiringImpl.java:1698)
at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1534)
at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:77)
at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1993)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at ch.qos.logback.classic.spi.PackagingDataCalculator.loadClass(PackagingDataCalculator.java:207)
at ch.qos.logback.classic.spi.PackagingDataCalculator.bestEffortLoadClass(PackagingDataCalculator.java:232)
at ch.qos.logback.classic.spi.PackagingDataCalculator.computeBySTEP(PackagingDataCalculator.java:138)
at ch.qos.logback.classic.spi.PackagingDataCalculator.populateUncommonFrames(PackagingDataCalculator.java:113)
at ch.qos.logback.classic.spi.PackagingDataCalculator.populateFrames(PackagingDataCalculator.java:105)
at ch.qos.logback.classic.spi.PackagingDataCalculator.calculate(PackagingDataCalculator.java:57)
at ch.qos.logback.classic.spi.ThrowableProxy.calculatePackagingData(ThrowableProxy.java:147)
at ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:124)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:440)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:396)
at ch.qos.logback.classic.Logger.warn(Logger.java:713) 

some full traces at https://gist.github.com/anonymous/aea6b5a96e05bd2531dc

this seems to be one of the factors for unpredictable behaviour as well as likely a major source of "unreliability" for doing any performance optimizations that involve integrating with JRuby ...

whenever a Throwable is processed by logback it will try to analyse classes involved in stack-trace elements thus trying to load each class based on it's name, whenever a JRuby internal JIT-ed class name is there it will lead to an exception such as above - under logback effectively loosing the original exception.

on logback this seems to end up in a considerable slow-down (that only gets triggered after JRuby warms-up) https://gist.github.com/kares/e914ce2afae287a6059d#file-logback_with_yjp_agent-out

similar behaviour happens with log4j2 as it tries to analyze the stack-trace elements: https://gist.github.com/kares/e914ce2afae287a6059d#file-log4j2-log

WiP but in case you have any feedback please let me know as at this point my paranoia is still growing :)

pierre commented 9 years ago

One thought on this: could we use conversion words like %ex and/or %rEx to work-around the issue (see http://logback.qos.ch/manual/layouts.html)?

kares commented 9 years ago

might work (depends on how it routes the throwable formatting code ... assuming it does not do any of the %xEx class resolving) - mostly tested the whole thing with log4j2 as I got quite paranoid that both of the logging frameworks have this at their core and do this implicitly ...

p.s. with e.printStackTrace at the suspicious place run out of disk space in 4h - log was above 8GB

pierre commented 9 years ago

might work (depends on how it routes the throwable formatting code ... assuming it does not do any of the %xEx class resolving)

Do you have an easy way to reproduce it? I'd like to give it a shot.

mostly tested the whole thing with log4j2 as I got quite paranoid that both of the logging frameworks have this at their core and do this implicitly ...

While switching logging frameworks is easy for a new project, it's a bit more difficult in practice because of production setups (special logback.xml to deal with monitoring, hiding credit card numbers, etc.). That's why I'd like to find and document a workaround for at least logback.

p.s. with e.printStackTrace at the suspicious place run out of disk space in 4h - log was above 8GB

So, it seems that it's really a JVM issue and as long as the logging framework doesn't unwind the stack, we should be good? Maybe we can configure it to not try to print the whole thing (e.g. %ex{100} for logback), if the exception is thrown from org.jruby.internal.runtime.methods.JittedMethod and/or simply org.killbill.billing.osgi.bundles.jruby.JRubyPlugin.

pierre commented 9 years ago

I've managed to take a look at the exception handling by running the Litle plugin and trying to add a credit card which isn't valid in Litle (4111111111111111).

Here is the default stacktrace I'm getting in the add payment method call:

2015-03-06 12:10:34,507 [Thread-11] WARN  o.k.b.k.0.1.4.SNAPSHOT - [litle-plugin] Unsuccessful add_payment_method: Authorization amount has already been depleted
2015-03-06 12:10:34,705 [Thread-11] WARN  o.k.b.k.0.1.4.SNAPSHOT - [litle-plugin] Failure in add_payment_method: Authorization amount has already been depleted
/var/tmp/bundles/plugins/ruby/killbill-litle/1.10.0/gems/gems/killbill-3.2.2/lib/killbill/helpers/active_merchant/payment_plugin.rb:176:in `add_payment_method'
/var/tmp/bundles/plugins/ruby/killbill-litle/1.10.0/gems/gems/killbill-litle-1.10.0/lib/litle/api.rb:90:in `add_payment_method'
/var/tmp/bundles/plugins/ruby/killbill-litle/1.10.0/gems/gems/killbill-3.2.2/lib/killbill/gen/plugin-api/payment_plugin_api.rb:442:in `addPaymentMethod'
Killbill$$Plugin$$Api$$PaymentPluginApi_1170753834.gen:13:in `addPaymentMethod'
2015-03-06 12:10:34,740 [Plugin-th-120] WARN  o.k.b.p.core.PaymentMethodProcessor - Error adding payment method caf5ef88-98a6-4fe3-b75a-a61db11102d9 for plugin killbill-litle
org.killbill.billing.payment.plugin.api.PaymentPluginApiException: null
    at sun.reflect.GeneratedConstructorAccessor182.newInstance(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_67]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[na:1.7.0_67]
    at org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:291) ~[na:na]
    at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:104) ~[na:na]
    at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:197) ~[na:na]
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:356) ~[na:na]
    at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:213) ~[na:na]
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:217) ~[na:na]
    at org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:56) ~[na:na]
    at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:211) ~[na:na]
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:217) ~[na:na]
    at org.jruby.RubyClass.newInstance(RubyClass.java:858) ~[na:na]
    at org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen) ~[na:na]
    at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrOneOrTwoOrNBlock.call(JavaMethod.java:314) ~[na:na]
    at org.jruby.java.proxies.ConcreteJavaProxy$3.call(ConcreteJavaProxy.java:155) ~[na:na]
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202) ~[na:na]
    at org.jruby.ast.CallTwoArgNode.interpret(CallTwoArgNode.java:59) ~[na:na]
    at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36) ~[na:na]
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105) ~[na:na]
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71) ~[na:na]
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71) ~[na:na]
    at org.jruby.ast.RescueBodyNode.interpret(RescueBodyNode.java:108) ~[na:na]
    at org.jruby.ast.RescueNode.handleException(RescueNode.java:174) ~[na:na]
    at org.jruby.ast.RescueNode.interpret(RescueNode.java:120) ~[na:na]
    at org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96) ~[na:na]
    at org.jruby.ast.BeginNode.interpret(BeginNode.java:83) ~[na:na]
    at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105) ~[na:na]
    at org.jruby.ast.BlockNode.interpret(BlockNode.java:71) ~[na:na]
    at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74) ~[na:na]
    at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:112) ~[na:na]
    at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:167) ~[na:na]
    at Killbill$$Plugin$$Api$$PaymentPluginApi_1170753834.addPaymentMethod(Killbill$$Plugin$$Api$$PaymentPluginApi_1170753834.gen:13) ~[na:na]
    at org.killbill.billing.osgi.bundles.jruby.JRubyPaymentPlugin$9.doCall(JRubyPaymentPlugin.java:159) ~[na:na]
    at org.killbill.billing.osgi.bundles.jruby.JRubyPaymentPlugin$9.doCall(JRubyPaymentPlugin.java:156) ~[na:na]
    at org.killbill.billing.osgi.bundles.jruby.JRubyPlugin.callWithRuntimeAndChecking(JRubyPlugin.java:241) ~[na:na]
    at org.killbill.billing.osgi.bundles.jruby.JRubyPaymentPlugin.addPaymentMethod(JRubyPaymentPlugin.java:156) ~[na:na]
    at sun.reflect.GeneratedMethodAccessor186.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_67]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_67]
    at org.killbill.billing.osgi.ContextClassLoaderHelper$1$1.execute(ContextClassLoaderHelper.java:83) ~[killbill-platform-osgi-0.1.4-SNAPSHOT.jar:na]
    at org.killbill.commons.profiling.Profiling.executeWithProfiling(Profiling.java:37) ~[killbill-concurrent-0.2.32.jar:na]
    at org.killbill.billing.osgi.ContextClassLoaderHelper$1.invoke(ContextClassLoaderHelper.java:80) ~[killbill-platform-osgi-0.1.4-SNAPSHOT.jar:na]
    at com.sun.proxy.$Proxy163.addPaymentMethod(Unknown Source) ~[na:na]
    at org.killbill.billing.payment.core.PaymentMethodProcessor$1.doOperation(PaymentMethodProcessor.java:116) [killbill-payment-0.13.4.jar:na]
    at org.killbill.billing.payment.core.PaymentMethodProcessor$1.doOperation(PaymentMethodProcessor.java:107) [killbill-payment-0.13.4.jar:na]
    at org.killbill.billing.payment.core.ProcessorBase$WithAccountLock.processAccountWithLock(ProcessorBase.java:219) [killbill-payment-0.13.4.jar:na]
    at org.killbill.billing.payment.core.ProcessorBase$CallableWithAccountLock.call(ProcessorBase.java:208) [killbill-payment-0.13.4.jar:na]
    at org.killbill.billing.payment.core.ProcessorBase$CallableWithAccountLock.call(ProcessorBase.java:192) [killbill-payment-0.13.4.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_67]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_67]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]

As one can see, the package information isn't that useful. Looking at production logs, the information isn't useful either, so I'm suggesting we make %ex the default. Unfortunately, we're hitting http://jira.qos.ch/browse/LOGBACK-730, so we need to disable the computation of the package data in code (I've created https://github.com/killbill/killbill/issues/288 for tracking).

Before the change, YourKit shows that 40% of the exceptions are ClassNotFoundException and 4% are JumpException$ReturnJump. After the change, JumpException$ReturnJump is now the most thrown exception (35%). ClassNotFoundException only counts for 2%.

While the total number of exceptions thrown is far less after the change, understanding the impact in terms of performance gain requires more work.

kares commented 9 years ago

Do you have an easy way to reproduce it? I'd like to give it a shot.

I've reproduced with bogacs-plus-plus using stripe (but I expect it to happen with litle as well) - been using a custom build for the last time - from https://github.com/kares/killbill-stripe-plugin/tree/load-testing23 + killbill's load-testing13 branch ... so it's not that easy sorry.

but I think I'm finally onto something ... I'm close to confirming the issue was due previous JRuby JIT tunings (mostly -Djruby.compile.fastest=true so you'll need to set that to reproduce)

While switching logging frameworks is easy for a new project, it's a bit more difficult in practice because of production setups (special logback.xml to deal with monitoring, hiding credit card numbers, etc.). That's why I'd like to find and document a workaround for at least logback.

course I do not expect you to switch away ... but I think in the long run it might be beneficial to consider esp. with detailed logging ... async logging using the disruptor library seems to be the new Java logging hotness. for the record to switch I did :

While the total number of exceptions thrown is far less after the change, understanding the impact in terms of performance gain requires more work.

have not run with litle but my primary concern quickly became not seeing exceptions reliably ... this would have early on (at least it seems so now) eliminated the path I went with some added (JRuby) OPTS.

pierre commented 9 years ago

FYI, I've also created https://github.com/killbill/killbill-plugin-framework-ruby/issues/28 (no special JRuby flag).

kares commented 9 years ago

for the record ... there's probably a difference between Tomcat's and Jetty's (web-app) class-loader. Tomcat does not propagate a loadClass runtime exception as a ClassNotFoundException one

thus the issue would only exhibit a throwable being lost with logback/log4j2' extended trace formatting on TC

pierre commented 9 years ago

course I do not expect you to switch away ... but I think in the long run it might be beneficial to consider esp. with detailed logging ... async logging using the disruptor library seems to be the new Java logging hotness

I just came across http://blog.takipi.com/how-to-instantly-improve-your-java-logging-with-7-logback-tweaks/, which gives some numbers regarding the logack async appender (http://logback.qos.ch/manual/appenders.html).

sbrossie commented 9 years ago

(Closed by mistake, so reopened it...)

kares commented 9 years ago

najs logback pimps, thanks @pierre ... but keep in mind async-appenders versus "true" async-loggers are quite different ... the later employed by log4j2 (as fas as I understand) gets out of the way as fast as possible thus even heavy logging has little impact on business logic flow. the drawback for KB might be that for audit purposes it's simply not a "good" choice due its async-queue nature (just a gut guess of mine).

pierre commented 9 years ago

najs logback pimps, thanks @pierre

:smile_cat:

keep in mind async-appenders versus "true" async-loggers are quite different ... the later employed by log4j2 (as fas as I understand) gets out of the way as fast as possible thus even heavy logging has little impact on business logic flow.

Ah, fair enough. I guess I need to read more about it!

the drawback for KB might be that for audit purposes it's simply not a "good" choice due its async-queue nature (just a gut guess of mine).

For compliance purposes, auditing is done actually at the database layer. Each write database operation is wrapped (see EntitySqlDaoWrapperInvocationHandler if you are curious): who did what when is stored in the table itself (created_date, updated_date, created_by and updated_by columns) as well as in an append-only audit_log table, and previous state is stored in a history table associated with the object (e.g. account_history).

The reason we are heavy on logging however is to be able to understand errors and what the system is doing at runtime: because the platform is generic (to build any type of billing and payment infrastructure), the business logic is mostly driven by configuration files, which can become quite complicated depending on your use-case. Logs help the developer understand what configuration lead to a subscription being renewed at a given date for instance. On the payment and payment plugins side of things, we also heavily log responses with the gateways, as these can be flaky and inconsistent (especially legacy ones outside North America).

Long story short, I'm not attached to any particular framework, as long as it does not get (too much) in the way in terms of performance and we can tweak what goes in the logs at runtime. For example, we use the Logback Luhn Mask converter and a custom ThreadNameBasedDiscriminator for log4jdbc-log4j2.

for the record ... there's probably a difference between Tomcat's and Jetty's (web-app) class-loader. Tomcat does not propagate a loadClass runtime exception as a ClassNotFoundException one

Going back to your remark, you seem right. I'm seeing different behaviors on both containers, which makes debugging the whole thing even more complicated...