molgenis / molgenis-service-armadillo

Armadillo; a DataSHIELD implementation, part of the MOLGENIS suite
https://molgenis.github.io/molgenis-service-armadillo/
GNU Lesser General Public License v3.0

When saving a workspace, the complete workspace is kept in memory to check its size #778

Open marikaris opened 2 months ago

marikaris commented 2 months ago

How to Reproduce

When a workspace is saved, it is turned into an ArmadilloWorkspace object, which loads the entire workspace into a byte array just to determine its size: https://github.com/molgenis/molgenis-service-armadillo/blob/4c12f87a030f3b18d66f320344f538395dac2844/armadillo/src/main/java/org/molgenis/armadillo/storage/ArmadilloWorkspace.java#L13

Proposed solution

See if we can run this twice: https://github.com/molgenis/molgenis-service-armadillo/blob/4c12f87a030f3b18d66f320344f538395dac2844/armadillo/src/main/java/org/molgenis/armadillo/command/impl/CommandsImpl.java#L202

The first run would check the size of the workspace in a streaming way; the second run would actually save it. (The name rExecutorService.saveWorkspace is a bit confusing: the method does not save anything itself. It calls a callback that can do something with the InputStream, so that callback could just as well check the size instead of saving.)

The drawback is that running it twice will potentially slow saving down. Check how much this affects usability.
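As a rough illustration of what "checking the size in a streaming way" could look like, here is a minimal sketch of a callback body that counts the bytes of an InputStream using one small fixed-size buffer, so the workspace is never held in memory as a whole. The class `StreamSizeCheck`, its methods and the `maxBytes` parameter are hypothetical names invented for this example; they are not part of Armadillo, and how such a check would be wired into rExecutorService.saveWorkspace is left open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper (not Armadillo code): measures a stream without buffering it. */
final class StreamSizeCheck {
  // Only this many bytes are held in memory at any time.
  private static final int BUFFER_SIZE = 8192;

  private StreamSizeCheck() {}

  /**
   * Reads the stream until its end, or until more than maxBytes have been seen,
   * and returns the number of bytes read so far. A result greater than maxBytes
   * means the stream is too large; the exact size is then unknown.
   */
  static long countBytes(InputStream in, long maxBytes) {
    byte[] buffer = new byte[BUFFER_SIZE];
    long total = 0;
    try {
      int read;
      while ((read = in.read(buffer)) != -1) {
        total += read;
        if (total > maxBytes) {
          break; // already over the limit, no need to read the rest
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }

  /** Returns true if the stream holds at most maxBytes bytes. */
  static boolean fitsWithin(InputStream in, long maxBytes) {
    return countBytes(in, maxBytes) <= maxBytes;
  }
}
```

In this sketch the first pass would call something like `StreamSizeCheck.fitsWithin(inputStream, limit)`, and only if that returns true would the second pass store the workspace. Stopping as soon as the limit is exceeded keeps the extra cost small for workspaces that are too large, but a workspace that fits is still read twice, which is the slowdown the proposal asks to measure.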