phax / as2-lib

A generic Java AS2 library, servlet and server
107 stars 43 forks source link

Retries cause OOM when creating stacktrace #24

Closed undernorthernsky closed 5 years ago

undernorthernsky commented 8 years ago

This is similar to #16; the scenario is repeated failure to send a message.

The final exception isn't too interesting; but anyway:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at com.helger.as2lib.processor.ProcessorException._getMessage(ProcessorException.java:63)
    at com.helger.as2lib.processor.ProcessorException.<init>(ProcessorException.java:74)
    at com.helger.as2lib.processor.DefaultMessageProcessor.handle(DefaultMessageProcessor.java:151)
    at com.helger.as2lib.processor.resender.ImmediateResenderModule.handle(ImmediateResenderModule.java:93)
    at com.helger.as2lib.processor.DefaultMessageProcessor.handle(DefaultMessageProcessor.java:142)
    at com.helger.as2lib.processor.sender.AbstractSenderModule.doResend(AbstractSenderModule.java:148)
    at com.helger.as2lib.processor.sender.AS2SenderModule.handle(AS2SenderModule.java:731)

The problem seems to be the recursive chain of calls: AS2SenderModule -> DefaultMessageProcessor -> ResenderModule -> DefaultMessageProcessor -> AS2SenderModule -> ....

Each time the exception is caught in AS2SenderModule, wrapped, stashed and rethrown if the next attempt (doResend) fails again; then caught again in DefaultMessageProcessor and rethrown as a ProcessorException.

This is the length of getStackAsString() in ProcessorException::_getMessage:

trace length: 11065
trace length: 20294
trace length: 40436
trace length: 80638
trace length: 161126
trace length: 322018
trace length: 643886
trace length: 1287538
trace length: 2574926
trace length: 5149618
trace length: 10299086
trace length: 20597938
trace length: 41195726
trace length: 82391218
trace length: 164782286

I don't see a "good" way to avoid the geometric increase; for our usage the following works, but aborting early might not be correct in the general case.

--- a/as2-lib/src/main/java/com/helger/as2lib/processor/DefaultMessageProcessor.java
+++ b/as2-lib/src/main/java/com/helger/as2lib/processor/DefaultMessageProcessor.java
@@ -141,6 +141,9 @@ public class DefaultMessageProcessor extends AbstractMessageProcessor
           bModuleFound = true;
           aModule.handle (sAction, aMsg, aOptions);
         }
+        catch (final ProcessorException ex) {
+          throw ex;
+        }
         catch (final OpenAS2Exception ex)
         {
           aCauses.add (ex);
phax commented 8 years ago

And if you don't run in an OOM, you will sooner or later run in an StackOverflowException. What I could do, is to create an "AsyncMessageProcessor" that handles the messages asynchronously (so dispatching happens in a separate thread). This would avoid the stack overflow and the OOM because the stack would always be fixed sized. I would do this for the current HEAD which targets JDK 8. Does that sound reasonable to you?

undernorthernsky commented 8 years ago

Yes, switching to async would certainly solve the issue. Though that would require some kind of persistence to avoid dropping messages when the application shuts down? Which effectively duplicates DirectoryResender?

I am not sure if this issue is really relevant for most users; the core problem is "sync" resending, which might be an edge case.

The way we are currently using the lib requires sync (re-)sending; so for the moment we are going forward with the modified MessageProcessor.

So maybe you don't need to "fix" anything; possibly document the problem w.r.t. ImmediateResender (or even deprecate it). I mainly wanted some feedback, feel free to close as wontfix :-)

phax commented 8 years ago

Well, I don't think it requires a persistence. If you're currently shutting down your application and there is something to resend, it might also be lost (e.g. when run in a daemon thread).

If there is an async message processor it simply avoids stacking the stacktraces but it also does not throw the exceptions directly. You may wanna have a look at the AsyncMessageProcessor class I just committed.

undernorthernsky commented 8 years ago

I'll play with it next week. Just from reading there seem to be a few problems:

In my mind pushing mutable object (aMsg, aOptions) to another thread and returning early is questionable; for example calling aMsg.getMDN() after sending won't work reliably even for sync MDNs.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.