va-big-data-genomics / trellis-mvp-functions

Trellis serverless data management framework for variant calling of VA MVP whole-genome sequencing data.

Quality control data is missing from CloudSQL database #47

Open pbilling opened 1 year ago

pbilling commented 1 year ago

From the Quarto "Selecting GVCFs for Aggregation" page of the MVP Whole Genome Sequencing Data Release 2 book:

Metric: Average per base sequence quality
  Has value: 118711
    Passed: 118531
    Failed: 180
  Missing: 9707

Metric: Properly paired mapped reads percentage
  Has value: 118350
    Passed: 115339
    Failed: 3011
  Missing: 10068

Metric: Contamination rate
  Has value: 128187
    Passed: 126882
    Failed: 1305
  Missing: 231

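
A quick way to sanity-check these counts is sketched below. The numbers are copied from the list above; the dictionary layout and the `summarize` helper are hypothetical names used only for illustration. It checks that passed plus failed equals the "has value" count, then derives the per-metric sample total and the fraction of samples with no value.

```python
# Counts copied from the "Selecting GVCFs for Aggregation" page quoted above.
# The structure and helper name are illustrative assumptions, not Trellis code.
metrics = {
    "Average per base sequence quality": {
        "has_value": 118711, "passed": 118531, "failed": 180, "missing": 9707,
    },
    "Properly paired mapped reads percentage": {
        "has_value": 118350, "passed": 115339, "failed": 3011, "missing": 10068,
    },
    "Contamination rate": {
        "has_value": 128187, "passed": 126882, "failed": 1305, "missing": 231,
    },
}


def summarize(counts):
    """Return (total samples, fraction missing) for one metric."""
    # Every sample with a value must have either passed or failed.
    if counts["passed"] + counts["failed"] != counts["has_value"]:
        raise ValueError("passed + failed does not equal has_value")
    total = counts["has_value"] + counts["missing"]
    return total, counts["missing"] / total


for name, counts in metrics.items():
    total, missing_fraction = summarize(counts)
    print(f"{name}: {total} samples, {missing_fraction:.2%} missing")
```

If the three totals agree, the metrics are being counted over the same set of samples, and the gap is specific to individual metrics rather than to whole samples.
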
pbilling commented 1 year ago

Looking at the code for the postgres-insert-data function, I had a hunch that if database update operations were failing, the function would error out and no Job node would be created.

That doesn't seem to be the case, though: I only found 675 instances of this pattern, far fewer than the thousands of missing values reported above.

MATCH (n:PostgresInsertData:JobRequest)
WHERE NOT (n)-[:TRIGGERED]->(:Job)
RETURN COUNT(n)
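
To put the 675 figure in context, here is a rough upper-bound calculation. It assumes (and this is only an assumption) that each unmatched PostgresInsertData job request could account for at most one missing metric value; the `max_explained` helper is a hypothetical name for illustration.

```python
# Result of the Cypher query above: job requests with no TRIGGERED Job node.
unmatched_requests = 675

# Missing counts per metric, copied from the first comment in this issue.
missing_counts = {
    "Average per base sequence quality": 9707,
    "Properly paired mapped reads percentage": 10068,
    "Contamination rate": 231,
}


def max_explained(missing, unmatched):
    """Upper bound on the fraction of missing values that failed requests
    could explain, assuming at most one missing value per unmatched request."""
    return min(unmatched, missing) / missing


for name, missing in missing_counts.items():
    share = max_explained(missing, unmatched_requests)
    print(f"{name}: at most {share:.1%} of {missing} missing values")
```

Under that assumption, the unmatched requests could cover only a small share of the missing values for the first two metrics, so some other failure mode would have to explain the rest.
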