levigo / jbig2-imageio

A Java ImageIO plugin for the JBIG2 bi-level image format
Apache License 2.0
31 stars 19 forks source link

Memory Leak in SoftReferenceCache #53

Closed justinfalk closed 3 years ago

justinfalk commented 6 years ago

Hello,

I noticed a memory leak in SoftReferenceCache. The cache is implemented as a HashMap where the key is an arbitrary object and the value is wrapped with a SoftReference. While the values eventually get reclaimed by the garbage collector, there is no mechanism to ever clear the keys of the HashMap. Thus, the keys of the cache grow without bound and may eventually exhaust available heap space.

JBIG2ImageReader uses this cache with a JBIG2Page as the key. The JBIG2Page objects occupy about 20kB worth of memory because they contain a 20k javax.imageio.stream.MemoryCacheImageInputStream. This can accumulate quickly when rendering pdfs with jbig2 (50k images ~ 1GB heap space).

As a workaround, I'm periodically calling cache.clear().

Suggestions: 1) An option to disable this cache without having to override via ServiceLoader. 2) Implement a size or time-based eviction strategy to purge old cached entries.

justinfalk commented 6 years ago

screen shot 2017-10-26 at 2 32 39 pm screen shot 2017-10-26 at 2 33 18 pm

hennejg commented 6 years ago

Sorry for the delay in dealing with this issue. We are currently in the process of making this project part of the Apache PDFBox project.

The product for which we originally implemented the decoder injects its own cache and uses this integration to tie the life-cycle of objects in the cache to the life-cycle in the upstream documents. To do this, it uses the service loader mechanism you already found. This nonwithstanding, you are certainly right that the way the cache is implemented, isn't ideal, to put it mildly. The code's intention was to tie the cache contents to a reader instance - as long as you had this reader instance around, the cache would hold on to the cached objects using soft references. However, the hard refs to the keys completely destroy the intended semantics. I need to discuss this issue with a few colleagues, but as I see it, using weak refs for the keys should do the trick.

justinfalk commented 6 years ago

Thanks for the update @hennejg .

Also wanted to mention that the current SoftReferenceCache implementation can lead to an infinite loop because it uses an unsynchronized HashMap yet can be accessed by multiple threads. See http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html

If the intention of this cache was to be tied to a reader instance, then it probably wasn't necessary to synchronize at the time, but based on current usage this is problematic. It's fairly easy to reproduce the infinite loop for my project which is highly concurrent. As a quick fix, you could just make it a ConcurrentHashMap instead of HashMap.