Open kares opened 9 years ago
One thought on this: could we use conversion words like %ex
and/or %rEx
to work-around the issue (see http://logback.qos.ch/manual/layouts.html)?
might work (depends on how it routes the throwable formatting code ... assuming it does not do any of the %xEx
class resolving) - mostly tested the whole thing with log4j2 as I got quite paranoid that both of the logging frameworks have this at their core and do this implicitly ...
p.s. with e.printStackTrace
at the suspicious place run out of disk space in 4h - log was above 8GB
might work (depends on how it routes the throwable formatting code ... assuming it does not do any of the
%xEx
class resolving)
Do you have an easy way to reproduce it? I'd like to give it a shot.
mostly tested the whole thing with log4j2 as I got quite paranoid that both of the logging frameworks have this at their core and do this implicitly ...
While switching logging frameworks is easy for a new project, it's a bit more difficult in practice because of production setups (special logback.xml
to deal with monitoring, hiding credit card numbers, etc.). That's why I'd like to find and document a workaround for at least logback.
p.s. with
e.printStackTrace
at the suspicious place run out of disk space in 4h - log was above 8GB
So, it seems that it's really a JVM issue and as long as the logging framework doesn't unwind the stack, we should be good? Maybe we can configure it to not try to print the whole thing (e.g. %ex{100}
for logback), if the exception is thrown from org.jruby.internal.runtime.methods.JittedMethod
and/or simply org.killbill.billing.osgi.bundles.jruby.JRubyPlugin
.
I've managed to take a look at the exception handling by running the Litle plugin and trying to add a credit card which isn't valid in Litle (4111111111111111
).
Here is the default stacktrace I'm getting in the add payment method call:
2015-03-06 12:10:34,507 [Thread-11] WARN o.k.b.k.0.1.4.SNAPSHOT - [litle-plugin] Unsuccessful add_payment_method: Authorization amount has already been depleted
2015-03-06 12:10:34,705 [Thread-11] WARN o.k.b.k.0.1.4.SNAPSHOT - [litle-plugin] Failure in add_payment_method: Authorization amount has already been depleted
/var/tmp/bundles/plugins/ruby/killbill-litle/1.10.0/gems/gems/killbill-3.2.2/lib/killbill/helpers/active_merchant/payment_plugin.rb:176:in `add_payment_method'
/var/tmp/bundles/plugins/ruby/killbill-litle/1.10.0/gems/gems/killbill-litle-1.10.0/lib/litle/api.rb:90:in `add_payment_method'
/var/tmp/bundles/plugins/ruby/killbill-litle/1.10.0/gems/gems/killbill-3.2.2/lib/killbill/gen/plugin-api/payment_plugin_api.rb:442:in `addPaymentMethod'
Killbill$$Plugin$$Api$$PaymentPluginApi_1170753834.gen:13:in `addPaymentMethod'
2015-03-06 12:10:34,740 [Plugin-th-120] WARN o.k.b.p.core.PaymentMethodProcessor - Error adding payment method caf5ef88-98a6-4fe3-b75a-a61db11102d9 for plugin killbill-litle
org.killbill.billing.payment.plugin.api.PaymentPluginApiException: null
at sun.reflect.GeneratedConstructorAccessor182.newInstance(Unknown Source) ~[na:na]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_67]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[na:1.7.0_67]
at org.jruby.javasupport.JavaConstructor.newInstanceDirect(JavaConstructor.java:291) ~[na:na]
at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:104) ~[na:na]
at org.jruby.java.invokers.ConstructorInvoker.call(ConstructorInvoker.java:197) ~[na:na]
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:356) ~[na:na]
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:213) ~[na:na]
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:217) ~[na:na]
at org.jruby.java.proxies.ConcreteJavaProxy$2.call(ConcreteJavaProxy.java:56) ~[na:na]
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:211) ~[na:na]
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:217) ~[na:na]
at org.jruby.RubyClass.newInstance(RubyClass.java:858) ~[na:na]
at org.jruby.RubyClass$INVOKER$i$newInstance.call(RubyClass$INVOKER$i$newInstance.gen) ~[na:na]
at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrOneOrTwoOrNBlock.call(JavaMethod.java:314) ~[na:na]
at org.jruby.java.proxies.ConcreteJavaProxy$3.call(ConcreteJavaProxy.java:155) ~[na:na]
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202) ~[na:na]
at org.jruby.ast.CallTwoArgNode.interpret(CallTwoArgNode.java:59) ~[na:na]
at org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36) ~[na:na]
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105) ~[na:na]
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71) ~[na:na]
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71) ~[na:na]
at org.jruby.ast.RescueBodyNode.interpret(RescueBodyNode.java:108) ~[na:na]
at org.jruby.ast.RescueNode.handleException(RescueNode.java:174) ~[na:na]
at org.jruby.ast.RescueNode.interpret(RescueNode.java:120) ~[na:na]
at org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96) ~[na:na]
at org.jruby.ast.BeginNode.interpret(BeginNode.java:83) ~[na:na]
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105) ~[na:na]
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71) ~[na:na]
at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74) ~[na:na]
at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:112) ~[na:na]
at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:167) ~[na:na]
at Killbill$$Plugin$$Api$$PaymentPluginApi_1170753834.addPaymentMethod(Killbill$$Plugin$$Api$$PaymentPluginApi_1170753834.gen:13) ~[na:na]
at org.killbill.billing.osgi.bundles.jruby.JRubyPaymentPlugin$9.doCall(JRubyPaymentPlugin.java:159) ~[na:na]
at org.killbill.billing.osgi.bundles.jruby.JRubyPaymentPlugin$9.doCall(JRubyPaymentPlugin.java:156) ~[na:na]
at org.killbill.billing.osgi.bundles.jruby.JRubyPlugin.callWithRuntimeAndChecking(JRubyPlugin.java:241) ~[na:na]
at org.killbill.billing.osgi.bundles.jruby.JRubyPaymentPlugin.addPaymentMethod(JRubyPaymentPlugin.java:156) ~[na:na]
at sun.reflect.GeneratedMethodAccessor186.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_67]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_67]
at org.killbill.billing.osgi.ContextClassLoaderHelper$1$1.execute(ContextClassLoaderHelper.java:83) ~[killbill-platform-osgi-0.1.4-SNAPSHOT.jar:na]
at org.killbill.commons.profiling.Profiling.executeWithProfiling(Profiling.java:37) ~[killbill-concurrent-0.2.32.jar:na]
at org.killbill.billing.osgi.ContextClassLoaderHelper$1.invoke(ContextClassLoaderHelper.java:80) ~[killbill-platform-osgi-0.1.4-SNAPSHOT.jar:na]
at com.sun.proxy.$Proxy163.addPaymentMethod(Unknown Source) ~[na:na]
at org.killbill.billing.payment.core.PaymentMethodProcessor$1.doOperation(PaymentMethodProcessor.java:116) [killbill-payment-0.13.4.jar:na]
at org.killbill.billing.payment.core.PaymentMethodProcessor$1.doOperation(PaymentMethodProcessor.java:107) [killbill-payment-0.13.4.jar:na]
at org.killbill.billing.payment.core.ProcessorBase$WithAccountLock.processAccountWithLock(ProcessorBase.java:219) [killbill-payment-0.13.4.jar:na]
at org.killbill.billing.payment.core.ProcessorBase$CallableWithAccountLock.call(ProcessorBase.java:208) [killbill-payment-0.13.4.jar:na]
at org.killbill.billing.payment.core.ProcessorBase$CallableWithAccountLock.call(ProcessorBase.java:192) [killbill-payment-0.13.4.jar:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_67]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_67]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
As one can see, the package information isn't that useful. Looking at production logs, the information isn't useful either, so I'm suggesting we make %ex
the default.
Unfortunately, we're hitting http://jira.qos.ch/browse/LOGBACK-730, so we need to disable the computation of the package data in code (I've created https://github.com/killbill/killbill/issues/288 for tracking).
Before the change, YourKit shows that 40% of the exceptions are ClassNotFoundException
and 4% are JumpException$ReturnJump
. After the change, JumpException$ReturnJump
is now the most thrown exception (35%). ClassNotFoundException
only counts for 2%.
While the total number of exceptions thrown is far less after the change, understanding the impact in terms of performance gain requires more work.
Do you have an easy way to reproduce it? I'd like to give it a shot.
I've reproduced with bogacs-plus-plus using stripe (but I expect it to happen with litle as well) - been using a custom build for the last time - from https://github.com/kares/killbill-stripe-plugin/tree/load-testing23 + killbill's load-testing13 branch ... so it's not that easy sorry.
but I think I'm finally onto something ... I'm close to confirming the issue was due previous JRuby JIT tunings (mostly -Djruby.compile.fastest=true
so you'll need to set that to reproduce)
While switching logging frameworks is easy for a new project, it's a bit more difficult in practice because of production setups (special logback.xml to deal with monitoring, hiding credit card numbers, etc.). That's why I'd like to find and document a workaround for at least logback.
course I do not expect you to switch away ... but I think in the long run it might be beneficial to consider esp. with detailed logging ... async logging using the disruptor library seems to be the new Java logging hotness. for the record to switch I did :
While the total number of exceptions thrown is far less after the change, understanding the impact in terms of performance gain requires more work.
have not run with litle but my primary concern quickly became not seeing exceptions reliably ... this would have early on (at least it seems so now) eliminated the path I went with some added (JRuby) OPTS.
FYI, I've also created https://github.com/killbill/killbill-plugin-framework-ruby/issues/28 (no special JRuby flag).
for the record ... there's probably a difference between Tomcat's and Jetty's (web-app) class-loader. Tomcat does not propagate a loadClass
runtime exception as a ClassNotFoundException
one
thus the issue would only exhibit a throwable being lost with logback/log4j2' extended trace formatting on TC
course I do not expect you to switch away ... but I think in the long run it might be beneficial to consider esp. with detailed logging ... async logging using the disruptor library seems to be the new Java logging hotness
I just came across http://blog.takipi.com/how-to-instantly-improve-your-java-logging-with-7-logback-tweaks/, which gives some numbers regarding the logack async appender (http://logback.qos.ch/manual/appenders.html).
(Closed by mistake, so reopened it...)
najs logback pimps, thanks @pierre ... but keep in mind async-appenders versus "true" async-loggers are quite different ... the later employed by log4j2 (as fas as I understand) gets out of the way as fast as possible thus even heavy logging has little impact on business logic flow. the drawback for KB might be that for audit purposes it's simply not a "good" choice due its async-queue nature (just a gut guess of mine).
najs logback pimps, thanks @pierre
:smile_cat:
keep in mind async-appenders versus "true" async-loggers are quite different ... the later employed by log4j2 (as fas as I understand) gets out of the way as fast as possible thus even heavy logging has little impact on business logic flow.
Ah, fair enough. I guess I need to read more about it!
the drawback for KB might be that for audit purposes it's simply not a "good" choice due its async-queue nature (just a gut guess of mine).
For compliance purposes, auditing is done actually at the database layer. Each write database operation is wrapped (see EntitySqlDaoWrapperInvocationHandler if you are curious): who did what when is stored in the table itself (created_date
, updated_date
, created_by
and updated_by
columns) as well as in an append-only audit_log
table, and previous state is stored in a history table associated with the object (e.g. account_history
).
The reason we are heavy on logging however is to be able to understand errors and what the system is doing at runtime: because the platform is generic (to build any type of billing and payment infrastructure), the business logic is mostly driven by configuration files, which can become quite complicated depending on your use-case. Logs help the developer understand what configuration lead to a subscription being renewed at a given date for instance. On the payment and payment plugins side of things, we also heavily log responses with the gateways, as these can be flaky and inconsistent (especially legacy ones outside North America).
Long story short, I'm not attached to any particular framework, as long as it does not get (too much) in the way in terms of performance and we can tweak what goes in the logs at runtime. For example, we use the Logback Luhn Mask converter and a custom ThreadNameBasedDiscriminator for log4jdbc-log4j2.
for the record ... there's probably a difference between Tomcat's and Jetty's (web-app) class-loader. Tomcat does not propagate a loadClass runtime exception as a ClassNotFoundException one
Going back to your remark, you seem right. I'm seeing different behaviors on both containers, which makes debugging the whole thing even more complicated...
related to "occasional" traces seen as :
some full traces at https://gist.github.com/anonymous/aea6b5a96e05bd2531dc
this seems to be one of the factors for unpredictable behaviour as well as likely a major source of "unreliability" for doing any performance optimizations that involve integrating with JRuby ...
whenever a
Throwable
is processed by logback it will try to analyse classes involved in stack-trace elements thus trying to load each class based on it's name, whenever a JRuby internal JIT-ed class name is there it will lead to an exception such as above - under logback effectively loosing the original exception.on logback this seems to end up in a considerable slow-down (that only gets triggered after JRuby warms-up) https://gist.github.com/kares/e914ce2afae287a6059d#file-logback_with_yjp_agent-out
similar behaviour happens with log4j2 as it tries to analyze the stack-trace elements: https://gist.github.com/kares/e914ce2afae287a6059d#file-log4j2-log
WiP but in case you have any feedback please let me know as at this point my paranoia is still growing :)