open-eid / digidoc4j

DigiDoc for Java. Javadoc:
http://open-eid.github.io/digidoc4j
GNU Lesser General Public License v2.1

Out Of Memory when loading large container from stream #90

Closed diidiiman closed 3 years ago

diidiiman commented 3 years ago

Hi!

We are currently migrating our document validation logic to AWS Lambda. Everything works perfectly with relatively small documents (10-20 MB), but when we stream a 400 MB S3 object through the Lambda into a DigiDoc4j Container for validation, we hit an OOM error.

My understanding is that "Container" loads the whole document into memory and then operates on it.

Is there another way to validate a document that avoids this issue?

If that is the intended behavior, is it possible to validate using only the signatures and file digests, so that the files themselves need not be loaded into memory?

Thank you in advance for any pointers or ideas!

naare commented 3 years ago

DigiDoc4j is built on the DSS library, which reads all the content into memory. Currently it is not possible to avoid that in the typical use case.

You can, however, validate the XAdES signature in detached form: https://github.com/open-eid/digidoc4j/wiki/Examples-of-using-it#detached-xades-containerless-signature-handling

You will need to extract the signature files and data files from the ZIP container by means other than DigiDoc4j. Keep in mind that this is pure signature validation: no checks are made on container structure, as there is no container. Also, only LT (long-term) signatures can be validated this way; there is no support for the LTA (long-term archival) level.
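
The extraction step could look roughly like the sketch below, which uses only the JDK. It reads the container as a ZIP stream, keeps the small signature XML files in memory, and hashes each data file in fixed-size chunks so that no data file is ever fully loaded. The class name `ContainerSplitter`, the `Parts` holder, the `META-INF/signatures*.xml` naming check, and the choice of SHA-256 are assumptions made for illustration; the digest algorithm must match the one the signature actually references. The resulting digests and signature bytes would then be handed to the detached XAdES API described on the wiki page linked above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of DigiDoc4j.
public class ContainerSplitter {

    /** Signature XML bytes and data file digests, keyed by ZIP entry name. */
    public static final class Parts {
        public final Map<String, byte[]> signatures = new LinkedHashMap<>();
        public final Map<String, byte[]> dataFileDigests = new LinkedHashMap<>();
    }

    public static Parts split(InputStream containerStream)
            throws IOException, NoSuchAlgorithmException {
        Parts parts = new Parts();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(containerStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || name.equals("mimetype")) {
                    continue; // not a data file
                }
                if (name.startsWith("META-INF/")) {
                    // Assumption: signatures are stored as META-INF/signatures*.xml.
                    if (name.startsWith("META-INF/signatures") && name.endsWith(".xml")) {
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        int n;
                        while ((n = zip.read(buffer)) != -1) {
                            out.write(buffer, 0, n);
                        }
                        parts.signatures.put(name, out.toByteArray());
                    }
                    continue; // skip the manifest and other metadata
                }
                // Data file: hash it chunk by chunk instead of buffering it.
                // Assumption: the signature references SHA-256 digests.
                MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    sha256.update(buffer, 0, n);
                }
                parts.dataFileDigests.put(name, sha256.digest());
            }
        }
        return parts;
    }
}
```

With this approach, memory use is bounded by the read buffer plus the signature files, regardless of how large the data files are. In a Lambda, the `InputStream` passed to `split` could be the S3 object's content stream.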

diidiiman commented 3 years ago

Thanks for clarifying this. This approach worked!