open-contracting / lib-cove-ocds

A data review library for the Open Contracting Data Standard (OCDS)

Memory performance option: Validate one release at a time #56

Open jpmckinney opened 4 years ago

jpmckinney commented 4 years ago

Presently, the entire package needs to be loaded into memory to be validated, so memory usage grows with file size and becomes a problem for larger files. Related: https://github.com/open-contracting/lib-cove-oc4ids/issues/23

An alternative is to read the entire input twice: a first pass to rebuild the package metadata without releases/records/etc., and a second pass to iteratively yield each release/record for validation.

To avoid rewriting a lot of code, we could perhaps stitch the results for individual releases/records back together, so that errors are still reported as being about releases/0, releases/1, etc. even though each was validated separately.
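A minimal sketch of the stitching idea, not lib-cove-ocds code: `validate_release` and `stitch_errors` are hypothetical names, and the `(path, message)` error shape is an assumption made for illustration. Each item is validated on its own, and its error paths are then prefixed with its position in the package, so a report still reads `releases/0/...`, `releases/1/...`.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Assumed error shape for this sketch: (path within the item, message).
Error = Tuple[str, str]


def validate_release(release: dict) -> List[Error]:
    """Hypothetical stand-in for a real per-release validator."""
    errors = []
    if "ocid" not in release:
        errors.append(("ocid", "'ocid' is a required property"))
    if not isinstance(release.get("tag", []), list):
        errors.append(("tag", "'tag' must be an array"))
    return errors


def stitch_errors(
    items: Iterable[dict],
    validate: Callable[[dict], List[Error]],
    key: str = "releases",
) -> Iterator[Error]:
    """Validate items one at a time and report paths relative to the package."""
    for index, item in enumerate(items):
        # Only one item is held at a time when `items` is a lazy iterator.
        for path, message in validate(item):
            yield f"{key}/{index}/{path}", message


# In practice the iterator would come from a streaming JSON parser on the
# second pass over the input; a plain iterator stands in for it here.
releases = iter([{"ocid": "ocds-213czf-1"}, {"tag": "tender"}])
stitched = list(stitch_errors(releases, validate_release))
```

Here `stitched` holds two errors, at `releases/1/ocid` and `releases/1/tag`. Because `stitch_errors` is a generator, memory use depends on the largest single item rather than on the whole package, provided the caller consumes the results incrementally.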

In any case, validating one release/record at a time is the only way for memory usage not to scale with input size.

jpmckinney commented 1 year ago

This would reduce memory usage but not running time. We don't presently have an issue with memory (except in rare cases when someone uploads a huge file to the Data Review Tool, DRT).

jpmckinney commented 1 year ago

Re-opening, as we actually do have an issue with memory: in Kingfisher Process, it would occur if we were to attempt to validate packages rather than individual releases/records. See https://github.com/open-contracting/kingfisher-process/issues/392