spdx / spdx-java-tagvalue-store

SPDX Document Storage using the Tag/Value format
Apache License 2.0

Performance issue with a large number of files #7

Closed: goneall closed this issue 3 years ago

goneall commented 3 years ago

With a large number of files, processing takes 6 ms per file on a relatively high-performance machine. For an SPDX document with 145K files, this can add up to more than 1 minute 40 seconds.

Profiling shows that this is primarily due to the call to modelStore.delete(documentNamespace, lastFile.getId()) in BuildDocument.addLastFile().

The delete function is extremely slow; see issue https://github.com/spdx/Spdx-Java-Library/issues/40

This is one of the causes of issue https://github.com/spdx/spdx-online-tools/issues/289

goneall commented 3 years ago

One possible design change is to use a separate model store for the temporary files. Then, rather than calling delete for each file, the entire in-memory store can be discarded once the whole document has been parsed.
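A minimal sketch of the proposed design follows. It does not use the SPDX library: InMemoryStore, ParserState, addTempFile, finishFile and finishDocument are all hypothetical names, and the store is a toy map keyed by element ID. The point it illustrates is that temporary file records live in their own store, so no per-file delete is issued; the whole temporary store is dropped in a single step after parsing.

```java
import java.util.HashMap;
import java.util.Map;

public class SeparateTempStoreSketch {

    /** Hypothetical minimal in-memory store keyed by element ID (not the SPDX library's API). */
    static final class InMemoryStore {
        private final Map<String, Map<String, Object>> elements = new HashMap<>();

        void create(String id) {
            elements.put(id, new HashMap<>());
        }

        void setValue(String id, String property, Object value) {
            elements.get(id).put(property, value);
        }

        Object getValue(String id, String property) {
            return elements.get(id).get(property);
        }

        int size() {
            return elements.size();
        }
    }

    /** Hypothetical parser state: the document store plus a throwaway store for temporary files. */
    static final class ParserState {
        final InMemoryStore documentStore = new InMemoryStore();
        private InMemoryStore tempStore = new InMemoryStore();

        /** Temporary file records go into the throwaway store, never into documentStore. */
        void addTempFile(String tempId, String fileName) {
            tempStore.create(tempId);
            tempStore.setValue(tempId, "fileName", fileName);
        }

        /** Copy the finished file into the document store; no per-file delete is issued. */
        void finishFile(String tempId, String finalId) {
            documentStore.create(finalId);
            documentStore.setValue(finalId, "fileName", tempStore.getValue(tempId, "fileName"));
        }

        /** After the whole document is parsed, discard every temporary file at once. */
        void finishDocument() {
            tempStore = new InMemoryStore(); // the old store becomes garbage in one step
        }

        int tempCount() {
            return tempStore.size();
        }
    }

    public static void main(String[] args) {
        ParserState state = new ParserState();
        for (int i = 0; i < 3; i++) {
            state.addTempFile("temp-" + i, "file" + i + ".c");
            state.finishFile("temp-" + i, "SPDXRef-File" + i);
        }
        System.out.println("temp before dispose: " + state.tempCount());
        state.finishDocument();
        System.out.println("temp after dispose: " + state.tempCount());
        System.out.println("document files: " + state.documentStore.size());
    }
}
```

The design trades a little extra memory during parsing (temporary records are kept until the end) for replacing N individual deletes with one constant-time reassignment.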

goneall commented 3 years ago

The actual memory penalty for not deleting these files should be less than 100 bytes per file. A short-term workaround may be to simply comment out the delete until a longer-term solution can be built.
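For scale, combining this upper bound with the 145K-file document from the original report, skipping the delete would retain at most about 14.5 MB (decimal megabytes):

```latex
145{,}000 \ \text{files} \times 100 \ \tfrac{\text{bytes}}{\text{file}}
  = 1.45 \times 10^{7} \ \text{bytes} \approx 14.5 \ \text{MB}
```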

goneall commented 3 years ago


OK, this didn't work: the verification algorithm checks that there are no anonymous files.
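To illustrate why skipping the delete trips verification, here is a toy check, assuming (as the comment implies) that the leftover temporary file objects carry anonymous IDs. This is not the SPDX library's verification code: ANON_PREFIX, verifyNoAnonymousFiles and the ID-to-type map are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnonymousFileCheckSketch {

    /** Hypothetical marker for anonymous (temporary) IDs; not taken from the SPDX library. */
    static final String ANON_PREFIX = "anon:";

    /** Returns one verification message per file element that still has an anonymous ID. */
    static List<String> verifyNoAnonymousFiles(Map<String, String> idToType) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : idToType.entrySet()) {
            if ("File".equals(e.getValue()) && e.getKey().startsWith(ANON_PREFIX)) {
                problems.add("Anonymous file found: " + e.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("SPDXRef-File1", "File");   // the real file copied into the document
        store.put(ANON_PREFIX + "1", "File"); // the temporary file whose delete was skipped
        System.out.println(verifyNoAnonymousFiles(store));
    }
}
```

With the delete commented out, the temporary file stays in the same store as the document, so a check of this shape reports it even though it is no longer referenced.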

goneall commented 3 years ago

Using a separate model store for the temporary files didn't work either. Because the package is copied from a different model store, the file is created with a different ID, which leaves an additional copy of the SPDX file. See the unit test failures on the issue7 branch for details. The additional copy is created by the check in the following code:

https://github.com/spdx/Spdx-Java-Library/blob/b9b04052d8d9776e5eea4290819243e95788503d/src/main/java/org/spdx/library/ModelCopyManager.java#L182

goneall commented 3 years ago

This has been resolved in the underlying library; see spdx/Spdx-Java-Library#40