populationgenomics / joint-calling

Sample and variant QC, based on https://github.com/broadinstitute/gnomad_qc
MIT License

Upload Processor #11

Open vivbak opened 3 years ago

vivbak commented 3 years ago

Background

Sequencing providers, using a service account, will upload a range of files (CRAM files, gVCF files, etc.)* into the gs://cpg-#STACK-upload bucket. These files will need to be processed into the appropriate buckets for downstream analysis and archival storage.
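A minimal sketch of the routing step, assuming files are sorted by type into per-type folders of a destination bucket. The `gs://cpg-{stack}-main/...` destinations, the `DESTINATIONS` table and `destination_for` are hypothetical; this issue does not name the target buckets or say how files are matched.

```python
from typing import Optional

# Hypothetical mapping from file suffix to destination prefix.
# "{stack}" stands in for the #STACK placeholder used in the upload bucket name.
DESTINATIONS = {
    ".cram": "gs://cpg-{stack}-main/cram",
    ".crai": "gs://cpg-{stack}-main/cram",
    ".g.vcf.gz": "gs://cpg-{stack}-main/gvcf",
    ".g.vcf.gz.tbi": "gs://cpg-{stack}-main/gvcf",
}


def destination_for(upload_path: str, stack: str) -> Optional[str]:
    """Return the destination path for an uploaded file, or None if its type is unrecognised."""
    name = upload_path.rsplit("/", 1)[-1]
    # Check longer suffixes first so multi-part extensions are matched precisely.
    for suffix in sorted(DESTINATIONS, key=len, reverse=True):
        if name.endswith(suffix):
            return f"{DESTINATIONS[suffix].format(stack=stack)}/{name}"
    return None
```

Returning `None` for unknown file types leaves it to the caller to decide whether to flag, skip or archive them, which is still an open question in this issue.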

WIP: https://lucid.app/lucidchart/invitations/accept/8f56b7e5-6be5-45f2-a2fc-518d48ce23ab

Functional Requirements

The upload processor pipeline should:

[Outdated] 2nd March

Update 3rd March

Inputs: $STACK Airtable table; QC outputs and exit status**

Trigger: Run within a batch workflow, manually triggered.

Current questions:
*Confirmation of all of the input files and their organization, e.g. one folder per sample?
**Exploration into how QC outputs will impact the upload processor pipeline: how should that information feed back in?

vivbak commented 3 years ago

Current thinking:

Potential prerequisites: add three options to the manifest status: 'processed', 're-process' and 'processing'.
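A minimal sketch of how the three proposed statuses could drive sample selection. The selection rule is an assumption, not something settled in this issue: samples marked 're-process' are picked, samples already 'processed' or currently 'processing' are skipped, and any other status (for example a hypothetical pre-existing 'uploaded' value) is treated as new. `ManifestStatus` and `samples_to_process` are illustrative names.

```python
from enum import Enum
from typing import Dict, List


class ManifestStatus(Enum):
    # The three options proposed in this issue; pre-existing statuses are not modelled.
    PROCESSED = "processed"
    REPROCESS = "re-process"
    PROCESSING = "processing"


def samples_to_process(manifest: Dict[str, str]) -> List[str]:
    """Return sample IDs the processor should (re)run on, under the assumed rule above."""
    # Skip samples that are finished or currently being processed by another run.
    skip = {ManifestStatus.PROCESSED.value, ManifestStatus.PROCESSING.value}
    return sorted(sid for sid, status in manifest.items() if status not in skip)
```

Having a distinct 'processing' state lets a second manually triggered run avoid picking up samples that an earlier run has not yet finished.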

Input:

Outputs: {mt}, the MatrixTable that new samples are added to; a CSV file of the new samples to be added/updated.
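A minimal sketch of producing the CSV output, assuming a sample counts as an "update" when its ID is already present in the MatrixTable and as an "add" otherwise. The column names (`sample_id`, `gvcf_path`, `action`) and both helper functions are hypothetical; the issue does not define the CSV schema.

```python
import csv
import io
from typing import Dict, Iterable, List, Set, TextIO

# Hypothetical CSV schema.
FIELDNAMES = ["sample_id", "gvcf_path", "action"]


def build_rows(new_gvcfs: Dict[str, str], existing_ids: Set[str]) -> List[Dict[str, str]]:
    """Label each incoming sample as 'add' or 'update' relative to the existing MatrixTable."""
    return [
        {
            "sample_id": sid,
            "gvcf_path": path,
            "action": "update" if sid in existing_ids else "add",
        }
        for sid, path in sorted(new_gvcfs.items())
    ]


def write_samples_csv(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write the rows, with a header line, to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_samples_csv(build_rows({"S1": "gs://x/S1.g.vcf.gz"}, set()), buf)
    print(buf.getvalue())
```

In practice `existing_ids` would come from the column keys of {mt}; it is passed in as a plain set here so the sketch stays independent of Hail.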

Steps (pre-QC)

Steps (Post-QC)

WIP Data Model https://lucid.app/lucidchart/invitations/accept/f0f24fd0-44d0-43a3-9119-9d5c68b1631e