Open jpmckinney opened 1 year ago
Blocked by https://github.com/open-contracting/lib-cove-ocds/issues/56 re: item 2 above.
It might consume too much memory. Some packages are extremely large.
Indeed: Colombia files, for example, are a few GBs.
In lib-cove-ocds, there's the option to read the file from disk. In that case, ijson can parse iteratively. (Would need to parse twice – once for package data and once for each release, like in file_worker.py.)
In kingfisher-process, we'd also have to read the file from disk – not from the DB, as I don't think it's possible to stream jsonb (or bytea) out of PostgreSQL. Edit: Kingfisher Process can read one release at a time from the DB.
One idea is to check the original packages. This would mean using a new check table that links to collection_file (instead of release_check and record_check linking to release and record). However: