Users should be able to retrieve the subset of sample records identified by a query. This is similar to the reliquary implementation, but for full records.
The workflow may be something like:
1. The user identifies a subset of records through the search interface.
2. The user initiates a download of the records, specifying the query and the download format.
3. The app creates a temporary space for the results, starts the retrieval, and returns an id that the user can later use to download the results.
4. When the retrieval is complete, the user retrieves the results by specifying the previously returned id.
5. If the download is not performed within some time period (a day?), the results are deleted.
6. Results are deleted after download. It may be difficult to determine when a download has finished, so perhaps just leave the results in place and remove them on the same schedule as 5. above.
A download is a significant action and should be recorded in metrics.
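As a rough illustration of the workflow above, the bookkeeping could look something like the following Python sketch. All names here (`DownloadStore`, `DownloadJob`, `RETENTION_SECONDS`) are hypothetical and not part of any existing code, the one-day retention is only the tentative value floated above, and the registry is held in memory for a single process. The actual retrieval and metrics recording are left out.

```python
import time
import uuid
from dataclasses import dataclass
from pathlib import Path

# Assumption: "a day", per the open question in the workflow above.
RETENTION_SECONDS = 24 * 60 * 60


@dataclass
class DownloadJob:
    """Bookkeeping for one requested download (hypothetical structure)."""
    query: str
    fmt: str
    path: Path
    created: float
    complete: bool = False


class DownloadStore:
    """In-memory registry of download jobs, with results stored under `root`."""

    def __init__(self, root, retention=RETENTION_SECONDS):
        self.root = Path(root)
        self.retention = retention
        self.jobs = {}

    def create(self, query, fmt="jsonl", now=None):
        """Register a job and return the id the user later downloads with."""
        job_id = uuid.uuid4().hex  # random id, so concurrent requests never share a file
        path = self.root / f"{job_id}.{fmt}"
        created = time.time() if now is None else now
        self.jobs[job_id] = DownloadJob(query, fmt, path, created)
        return job_id

    def mark_complete(self, job_id):
        """Called by the retrieval task once the result file is fully written."""
        self.jobs[job_id].complete = True

    def result_path(self, job_id):
        """Return the result file path, or None if the id is unknown or not ready."""
        job = self.jobs.get(job_id)
        if job is None or not job.complete:
            return None
        return job.path

    def purge_expired(self, now=None):
        """Delete results older than the retention period, downloaded or not."""
        now = time.time() if now is None else now
        expired = [jid for jid, job in self.jobs.items()
                   if now - job.created >= self.retention]
        for jid in expired:
            self.jobs.pop(jid).path.unlink(missing_ok=True)
        return expired
```

Running `purge_expired` on a timer would cover both deletion cases with a single schedule, which sidesteps the question of detecting when a download has finished.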
Considerations:
- Multiple users may request downloads at the same time, so race conditions need to be avoided.
- There should be an upper bound on the number of requests being serviced at any one time.
- There should be a limit on the number of requests allowed per user (this implies authentication for download).
- There may be a choice of download formats. JSON Lines (newline-delimited JSON records) is a convenient format for initial support. CSV, SQLite and Parquet are also likely candidates.
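Two of these considerations can be sketched briefly in Python. The first part below is one possible way to enforce both the overall and the per-user request limits behind a single lock; the second writes records as JSON Lines. `DownloadLimiter`, `write_jsonl`, and the default limits are hypothetical names and values chosen for illustration, and the limiter only works within a single process (multiple workers would need a shared store instead).

```python
import json
import threading
from collections import Counter


class DownloadLimiter:
    """Caps active downloads overall and per user (single-process sketch)."""

    def __init__(self, max_active=4, max_per_user=1):
        # Assumption: the default limits are placeholders, not agreed values.
        self.max_active = max_active
        self.max_per_user = max_per_user
        self._active = Counter()
        self._lock = threading.Lock()

    def try_acquire(self, user_id):
        """Reserve a slot for user_id; return False if either limit is reached."""
        with self._lock:  # check and increment atomically to avoid races
            if sum(self._active.values()) >= self.max_active:
                return False
            if self._active[user_id] >= self.max_per_user:
                return False
            self._active[user_id] += 1
            return True

    def release(self, user_id):
        """Free the slot once the retrieval has finished or failed."""
        with self._lock:
            if self._active[user_id] > 0:
                self._active[user_id] -= 1
            if self._active[user_id] == 0:
                del self._active[user_id]


def write_jsonl(records, stream):
    """Write each record as one JSON document per line; return the count."""
    count = 0
    for record in records:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        stream.write(json.dumps(record, ensure_ascii=False))
        stream.write("\n")
        count += 1
    return count
```

A request handler would call `try_acquire` with the authenticated user's id before starting a retrieval, reject the request if it returns `False`, and call `release` in a `finally` block when the retrieval ends.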