icgc-argo / platform-api

https://api.platform.icgc-argo.org/graphql
GNU Affero General Public License v3.0

Clinical Download for All Donors - Donor Download endpoint by Arranger Filter #699

Open joneubank opened 8 months ago

joneubank commented 8 months ago

Detailed Description

We want the ability to download clinical data for all donors in a File Repository query (including a query with no filters). To accomplish this, we want to add an endpoint to the gateway where a SQON filter for the file repository can be provided and the gateway will return the TSV download for all donors included in that filter.

Possible Implementation

This endpoint can be a GET request that takes the SQON as a query parameter. If no SQON is provided, we will use a default case of "all files" (no filter).
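As a rough illustration of the parameter handling described above, here is a minimal sketch in TypeScript. The function name `parseSqonParam`, the simplified `Sqon` types, and the choice of an empty `and` as the "all files" default are assumptions made for this example, not part of the existing gateway code. `raw` is assumed to be the already URL-decoded query parameter value.

```typescript
// Hypothetical, simplified SQON shapes; real Arranger SQONs support more operators.
type SqonValue = { op: "in"; content: { field: string; value: string[] } };
type Sqon = { op: "and" | "or" | "not"; content: Array<Sqon | SqonValue> };

// Assumed default: an empty "and" stands for "all files" (no filter).
const ALL_FILES: Sqon = { op: "and", content: [] };

// Turn the raw query parameter into a SQON, falling back to the default
// when the parameter is missing or blank.
function parseSqonParam(raw: string | undefined): Sqon {
  if (raw === undefined || raw.trim() === "") {
    return ALL_FILES;
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("filter must be valid JSON");
  }
  // Only a shallow shape check; a real handler would validate more strictly.
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    !("op" in parsed) ||
    !("content" in parsed)
  ) {
    throw new Error("filter must be a SQON object with 'op' and 'content'");
  }
  return parsed as Sqon;
}
```

A handler could call `parseSqonParam(req.query.filter)` and answer with a 400 status when it throws; the parameter name `filter` is likewise only a placeholder.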

The handler should use this query to get the list of unique donor IDs from the ES file centric index. It is important that this query applies the serverSide filters we have on all Arranger requests, which restrict the results based on the user's permissions and the file embargo stage metadata. Once the list of donors is retrieved, the donor data can be fetched from the clinical service.
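One way to picture the filter step is to wrap the caller's SQON and the server-side restriction in a single `and`, so the caller's filter can only narrow the permitted results. This is a sketch under assumptions: `SqonNode`, `applyServerSideFilter`, and the `embargo_stage` field with its `PUBLIC` value are hypothetical names chosen for illustration, and the real gateway may apply its serverSide filters differently.

```typescript
// Hypothetical, simplified SQON node for this example.
type SqonNode = { op: string; content: unknown };

// Require both filters to match. The server-side filter is always present,
// so a caller-supplied filter can never widen access.
function applyServerSideFilter(
  userFilter: SqonNode,
  serverFilter: SqonNode
): SqonNode {
  return { op: "and", content: [userFilter, serverFilter] };
}

// Hypothetical restriction: only files in a publicly released embargo stage.
const publicOnly: SqonNode = {
  op: "in",
  content: { field: "embargo_stage", value: ["PUBLIC"] },
};

// An empty user filter ("all files") still ends up restricted.
const effectiveFilter = applyServerSideFilter(
  { op: "and", content: [] },
  publicOnly
);
```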

Considerations for large queries

Since the number of files will likely be in the tens or hundreds of thousands, we should not fetch the file documents themselves; instead we should retrieve the unique values of a donor ID aggregation. A single aggregation request will work well up until we press against the ES max buckets limit (around 65k). A composite aggregation, which returns buckets one page at a time, should allow streaming all unique donor IDs for the filter. A limit on the max donors per request may be needed as the total number of ARGO donors increases.
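The paging approach could look roughly like the sketch below. It builds a composite aggregation request body, then repeats the search with the returned `after_key` until a page comes back empty. The field name `donors.donor_id`, the function names, the injected `search` callback (a stand-in for whatever ES client the gateway uses), and the `maxDonors` default are all assumptions for illustration; if the donor field is a nested field in the real index mapping, the aggregation would need adjusting.

```typescript
// Hypothetical field name for the donor ID in the file centric index.
const DONOR_ID_FIELD = "donors.donor_id";

type AfterKey = Record<string, string>;

// The part of an ES search response this sketch reads.
interface CompositePage {
  aggregations: {
    donors: {
      after_key?: AfterKey;
      buckets: Array<{ key: { donor_id: string } }>;
    };
  };
}

// Build the request body for one page of unique donor IDs.
// `query` is the ES query already produced from the combined SQON.
function buildDonorPageRequest(
  query: object,
  pageSize: number,
  after?: AfterKey
): object {
  return {
    size: 0, // return no file documents, only aggregation buckets
    query,
    aggs: {
      donors: {
        composite: {
          size: pageSize,
          sources: [{ donor_id: { terms: { field: DONOR_ID_FIELD } } }],
          ...(after ? { after } : {}),
        },
      },
    },
  };
}

// Page through the aggregation until an empty page is returned.
// Throws once more than `maxDonors` IDs have been collected.
async function collectDonorIds(
  search: (body: object) => Promise<CompositePage>,
  query: object,
  pageSize = 1000,
  maxDonors = 100000
): Promise<string[]> {
  const donorIds: string[] = [];
  let after: AfterKey | undefined;
  do {
    const page = await search(buildDonorPageRequest(query, pageSize, after));
    const { buckets, after_key } = page.aggregations.donors;
    for (const bucket of buckets) {
      donorIds.push(bucket.key.donor_id);
    }
    if (donorIds.length > maxDonors) {
      throw new Error("too many donors in request");
    }
    after = buckets.length > 0 ? after_key : undefined;
  } while (after !== undefined);
  return donorIds;
}
```

Because each page holds at most `pageSize` buckets, no single request approaches the max buckets limit, and the `maxDonors` check gives a place to enforce the per-request cap mentioned above.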