mjul / docjure

Read and write Office documents from Clojure
MIT License
Working around "Zip bomb detected!" #113

aphyr closed this issue 5 months ago

aphyr commented 5 months ago

When opening a spreadsheet with a large number of redundant columns (like the truly pathological .xlsx files generated by the AWS Marketplace's product load form), Apache POI's zip handling will throw:

ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
Execution error (IOException) at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream/checkThreshold (ZipArchiveThresholdInputStream.java:143).
Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
Uncompressed size: 131128, Raw/compressed size: 1293, ratio: 0.009861
Limits: MIN_INFLATE_RATIO: 0.010000, Entry: xl/styles.xml
  {:type java.io.IOException,
    :message
    "Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.\nThis may indicate that the file is used to inflate memory usage and thus could pose a security risk.\nYou can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.\nUncompressed size: 131128, Raw/compressed size: 1293, ratio: 0.009861\nLimits: MIN_INFLATE_RATIO: 0.010000, Entry: xl/styles.xml",
    :at
    [org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream
     checkThreshold
     "ZipArchiveThresholdInputStream.java"
     143]}],
  :trace
  [[org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream
    checkThreshold
    "ZipArchiveThresholdInputStream.java"
    143]
   [org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream
    read
    "ZipArchiveThresholdInputStream.java"
    82]
   [org.apache.poi.util.IOUtils toByteArray "IOUtils.java" 240]
   [org.apache.poi.util.IOUtils toByteArray "IOUtils.java" 203]
   [org.apache.poi.openxml4j.util.ZipArchiveFakeEntry
    <init>
    "ZipArchiveFakeEntry.java"
    82]
   [org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource
    <init>
    "ZipInputStreamZipEntrySource.java"
    98]
   [org.apache.poi.openxml4j.opc.ZipPackage
    <init>
    "ZipPackage.java"
    132]
   [org.apache.poi.openxml4j.opc.OPCPackage open "OPCPackage.java" 319]
   [org.apache.poi.xssf.usermodel.XSSFWorkbookFactory
    create
    "XSSFWorkbookFactory.java"
    97]
   [org.apache.poi.xssf.usermodel.XSSFWorkbookFactory
    create
    "XSSFWorkbookFactory.java"
    36]
   [org.apache.poi.ss.usermodel.WorkbookFactory
    lambda$create$2
    "WorkbookFactory.java"
    224]
   [org.apache.poi.ss.usermodel.WorkbookFactory
    wp
    "WorkbookFactory.java"
    329]
   [org.apache.poi.ss.usermodel.WorkbookFactory
    create
    "WorkbookFactory.java"
    224]
   [org.apache.poi.ss.usermodel.WorkbookFactory
    create
    "WorkbookFactory.java"
    185]
   [dk.ative.docjure.spreadsheet$load_workbook_from_stream
    invokeStatic
    "spreadsheet.clj"
    83]
   [dk.ative.docjure.spreadsheet$load_workbook_from_stream
    invoke
    "spreadsheet.clj"
    79]
   [dk.ative.docjure.spreadsheet$load_workbook_from_file
    invokeStatic
    "spreadsheet.clj"
    90]
   [dk.ative.docjure.spreadsheet$load_workbook_from_file
    invoke
    "spreadsheet.clj"
    86]
   [dk.ative.docjure.spreadsheet$fn__1533
    invokeStatic
    "spreadsheet.clj"
    106]
   [dk.ative.docjure.spreadsheet$fn__1533 invoke "spreadsheet.clj" 104]

I've been poking around in the code a bit, but I can't figure out how to actually thread ZipSecureFile.setMinInflateRatio() down through Docjure's API. If you have any ideas, I'm all ears! Here's an example file that'll reproduce the error.

AWSMP-a08d0cd4-bb9b-4d72-b7f3-a77c75f309e9-2024-02-14 05-59-33.xlsx

aphyr commented 5 months ago

Wow, okay, I'm a dumbass. I figured there was no way this was a static method, but no, it is static! You just set it globally!
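
For anyone landing here later, a minimal sketch of that global setting from Clojure might look like the following. `ZipSecureFile/setMinInflateRatio` and `ZipSecureFile/getMinInflateRatio` are the static POI methods named in the error message; the namespace `example.zip-ratio`, the helper `load-workbook-with-min-ratio`, and the threshold `0.005` are assumptions made up for illustration, not part of Docjure. The failing entry above had a ratio of about 0.0099 against the default limit of 0.01, so any limit below that lets `xl/styles.xml` through, though other files may need a lower value still.

```clojure
(ns example.zip-ratio
  (:require [dk.ative.docjure.spreadsheet :as ss])
  (:import (org.apache.poi.openxml4j.util ZipSecureFile)))

(defn load-workbook-with-min-ratio
  "Hypothetical helper: temporarily lowers POI's global minimum inflate
  ratio, loads the workbook at `path`, then restores the previous limit."
  [path min-ratio]
  (let [previous (ZipSecureFile/getMinInflateRatio)]
    ;; The setting is static, so it affects every POI read in this JVM
    ;; until it is restored in the finally block.
    (ZipSecureFile/setMinInflateRatio (double min-ratio))
    (try
      (ss/load-workbook path)
      (finally
        (ZipSecureFile/setMinInflateRatio previous)))))

;; Usage, assuming a local copy of the offending file:
;; (load-workbook-with-min-ratio "product-load-form.xlsx" 0.005)
```

Because the limit is process-wide, this wrapper is not safe if other threads load workbooks at the same time, and lowering it weakens the zip bomb check for everything read while it is in effect. For trusted files only, calling `(ZipSecureFile/setMinInflateRatio 0.005)` once at startup is the simpler option.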