PeopleMakeCulture closed this 9 months ago
@aclum We have started developing this endpoint and can currently return the results of one aggregate command. That means it can return up to 16MB of results. Would it be helpful for you to have access to this interim-stage `queries:run` endpoint for now, while we build out the paging functionality?
Link to relevant PR: https://github.com/microbiomedata/nmdc-runtime/compare/422-add-aggregation-command
So, it turns out that cursors for `aggregate` commands don't persist either. My best guess at why it worked for me via pymongo in a Python shell session is that pymongo still starts an implicit session for commands, even though I thought that was discontinued.
The approach I think we'll take for this now is:
1) append an `$out` stage to the user-supplied aggregation pipeline, to send results to a temporary mongo collection.
2) call `nmdc_runtime.api.endpoints.util.find_resources` to use our custom cursor functionality that is currently in service for the `find` endpoints, so that one can retrieve all aggregation results if they exceed 16MB (the MongoDB BSON document size limit).
3) ensure the temporary collection is cleaned up (e.g. via a dagster schedule).
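
For illustration, the three steps above could look roughly like the sketch below. This is not the runtime's actual code. All function names and the `tmp_agg_` prefix are hypothetical, and `fetch_page` is a simple `_id`-based stand-in for `find_resources`, whose real signature is not shown in this thread. It assumes `db` is a pymongo `Database` and that the `_id` values written by the pipeline are of a single, orderable, non-null type.

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple


def append_out_stage(pipeline: List[Dict], temp_name: str) -> List[Dict]:
    """Step 1: return a copy of the pipeline with a trailing $out stage."""
    # A user-supplied $out/$merge would redirect results elsewhere, so reject it.
    if any("$out" in stage or "$merge" in stage for stage in pipeline):
        raise ValueError("pipeline must not contain its own $out or $merge stage")
    return [*pipeline, {"$out": temp_name}]


def run_to_temp_collection(db, source: str, pipeline: List[Dict]) -> str:
    """Run the aggregation server-side, writing results to a temporary collection."""
    temp_name = f"tmp_agg_{uuid.uuid4().hex}"  # hypothetical naming scheme
    # pymongo sends the aggregate command when aggregate() is called; with $out
    # as the last stage, the results land in temp_name rather than in a cursor.
    db[source].aggregate(append_out_stage(pipeline, temp_name))
    return temp_name


def fetch_page(
    db, temp_name: str, page_size: int = 100, after_id: Optional[Any] = None
) -> Tuple[List[Dict], Optional[Any]]:
    """Step 2 (stand-in): page through the temp collection without a server cursor."""
    query = {"_id": {"$gt": after_id}} if after_id is not None else {}
    docs = list(db[temp_name].find(query).sort("_id", 1).limit(page_size))
    # A full page means there may be more; hand back the last _id as the token.
    next_token = docs[-1]["_id"] if len(docs) == page_size else None
    return docs, next_token


def drop_temp_collection(db, temp_name: str) -> None:
    """Step 3: cleanup, e.g. called from a scheduled job."""
    db.drop_collection(temp_name)
```

Because each page is an independent `find` keyed on the last `_id` seen, no server-side cursor or session has to survive between HTTP requests, which is the property the approach above relies on.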
Yes, that would be useful.
New ticket to extend aggregate query:run with paging: https://github.com/microbiomedata/nmdc-runtime/issues/460
Split from: https://github.com/microbiomedata/issues/issues/496
From @aclum:
From @dwinston: