zalando / logbook

An extensible Java library for HTTP request and response logging
MIT License
1.83k stars 261 forks source link

Possible OOM with large servlet response #1754

Open wintermute0 opened 8 months ago

wintermute0 commented 8 months ago

Hi,

I'm reading the source code and just wondering that why there is no size limit when writing to the branched OutputStream? What if there is a really large HTTP response? Then the whole thing will be stuck inside a ByteArrayOutputStream which will easily use up all your memory. Also there is a configuration property called "logbook.write.max-body-size". You would think that it should be able to put an upper limit to the memory footprint no matter the size of the actual HTTP response body, but looks like it did not.

https://github.com/zalando/logbook/blob/af6f818f4fcc17167f2508abe9eb8f81eb8148b8/logbook-servlet/src/main/java/org/zalando/logbook/servlet/LocalResponse.java#L285

kasmarian commented 8 months ago

You have a point here, as the response is copied in LocalResponse classes, the allocation of memory is doubled. We could use the max-body-size to enforce a limit, let's say, counting the worst case scenario, that 4 bytes are used per character. I'm not sure how big of a risk this is. Who would enable response body logs for cases when it can be that big? Even truncation of the body doesn't make sense here.

wintermute0 commented 8 months ago

Well, here is our story.

We use logbook in a legacy Spring 4.x system and we have a LogFilter for URL pattern "/*". We also have a max-body-size in place. Everything was fine for a couple of years, util a recent deployment. A file downloading functionality was added by one of our team members. And then with some users downloading files > 2GB, our server just exploded :(

I think the real risk is that the "max-body-size" configuration made everybody thought that it should be safe to Filter /*. But actually it's not.

kasmarian commented 7 months ago

After some internal discussions with the team, it doesn't look like we'll be able to address this fully without changing how Logbook works entirely. The problem is that Logbook buffers the requests and responses not only in servlets implementation, but for all the HTTP clients and other server types.

~Another problem is that Logbook predicates evaluate the request/response already after this buffering is made, as the request/response are converted to the internal types at the same time. So you can't exclude the paths, where you send the big payloads from the buffering path, really.~

What you could potentially do for your case for now is just excluding the paths that are using the heavy payload from Logbook (probably you already did that) ~writing your own Filter implementation, where you exclude certain paths before calling Logbook.~

whiskeysierra commented 7 months ago

Are you sure about the predicate already operating on a buffered request? It used to be the case that predicates run before buffering and are therefore cheap. There are defaults to exclude binary content types, iirc.

On Sun, Feb 18, 2024, 12:41 Karen Asmarian @.***> wrote:

After some internal discussions with the team, it doesn't look like we'll be able to address this fully without changing how Logbook works entirely. The problem is that Logbook buffers the requests and responses not only in servlets implementation, but for all the HTTP clients and other server types.

Another problem is that Logbook predicates evaluate the request/response already after this buffering is made, as the request/response are converted to the internal types at the same time. So you can't exclude the paths, where you send the big payloads from the buffering path, really.

What you could potentially do for your case for now is writing your own Filter implementation, where you exclude certain paths before calling Logbook.

— Reply to this email directly, view it on GitHub https://github.com/zalando/logbook/issues/1754#issuecomment-1951212919, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADI7HOYZB5IZEDXJYFPSU3YUHSFDAVCNFSM6AAAAABCSTX462VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJRGIYTEOJRHE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

kasmarian commented 7 months ago

Ah, @whiskeysierra is right, the buffering will be used only if the request is eligible for the logging.

Do you think it would help if we mention that logbook.write.max-body-size only affect the size of the output, but does not prevent the buffering of the body?

Another possible suggestion for @wintermute0 I can think of is to define a custom Strategy, if you still want to log response when customers download files, but to not log the body.

As per the defaults to exclude binary content types, as far as I can tell, FilteredHttpResponse still retrieves the body from the org.zalando.logbook.HttpResponse when getBodyAsString is called. So the buffering would still happen, even through the body will not be logged, if the ResponseFilter is configured to skip the body.

wintermute0 commented 7 months ago

I think it would definitely help to mention the buffering behavior in the documentation for logbook.write.max-body-size. As for our particular use case we have already found a way to exclude the download request from logging. Thanks for your help :)

m4rt1nn commented 2 weeks ago

@wintermute0 Can you share the way that you excluded the download request? We have the same problem but get stuck in the property/configuration/bean swamp :-)