icgc-argo / workflow-roadmap

Roadmap and management for genomic data processing
GNU Affero General Public License v3.0

DR9 - QC ~1400 MUTO-INTL/POG-CA Donors #429

Closed edsu7 closed 5 months ago

edsu7 commented 8 months ago

@edsu7 can you take a look at the following QC criteria?

Processed genomic data release criteria:

| analysis workflow | QC criteria | analysisType | file access |
| --- | --- | --- | --- |
| dna alignment | 1. must have mutect2 or sanger called successfully; 2. normal coverage > 25X, tumour coverage > 30X (use column normal/tumour_estimated_coverage); 3. donors with multiple tumour/normal pairs must have all samples processed | sequencing_alignment, qc_metrics | controlled |
| sanger variant calling | 1. exclude ASCAT failed donors | variant_calling, qc_metrics | controlled |
| mutect2 variant calling | 1. cross_sample_contamination < 4% (use column normal/tumour_mutect2_contamination) | variant_calling, qc_metrics | controlled |

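
The numeric thresholds in the criteria above (coverage and contamination) could be checked per tumour/normal pair along these lines. This is only a minimal sketch: the expanded column names (`normal_estimated_coverage`, `tumour_estimated_coverage`, `normal_mutect2_contamination`, `tumour_mutect2_contamination`), the `PairMetrics` class, and the assumption that contamination is stored as a fraction rather than a percentage are all illustrative, not taken from the actual QC tooling. The non-numeric criteria (successful variant calling, ASCAT failures, all pairs processed) are not covered.

```python
from dataclasses import dataclass

# Thresholds from the release criteria table.
MIN_NORMAL_COVERAGE = 25.0  # normal coverage must be > 25X
MIN_TUMOUR_COVERAGE = 30.0  # tumour coverage must be > 30X
MAX_CONTAMINATION = 0.04    # assumption: contamination is a fraction, so 4% == 0.04


@dataclass
class PairMetrics:
    """QC metrics for one tumour/normal pair (hypothetical layout)."""
    normal_estimated_coverage: float
    tumour_estimated_coverage: float
    normal_mutect2_contamination: float
    tumour_mutect2_contamination: float


def passes_coverage(m: PairMetrics) -> bool:
    """DNA alignment criterion 2: both samples exceed their coverage threshold."""
    return (m.normal_estimated_coverage > MIN_NORMAL_COVERAGE
            and m.tumour_estimated_coverage > MIN_TUMOUR_COVERAGE)


def passes_contamination(m: PairMetrics) -> bool:
    """Mutect2 criterion 1: both samples are below the contamination threshold."""
    return (m.normal_mutect2_contamination < MAX_CONTAMINATION
            and m.tumour_mutect2_contamination < MAX_CONTAMINATION)


# Made-up values, for illustration only.
good_pair = PairMetrics(31.2, 45.0, 0.001, 0.012)
low_normal_coverage = PairMetrics(24.9, 45.0, 0.001, 0.012)
contaminated_tumour = PairMetrics(31.2, 45.0, 0.001, 0.05)
```

Keeping the two checks separate mirrors the table: a pair can be eligible for alignment release while still being excluded from the mutect2 release.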
edsu7 commented 6 months ago

To suppress:

edsu7 commented 6 months ago

Current summary of eligible donors based on QC metrics

DR9_summary_report.xls

Pinging @lindaxiang and @guanqiaofeng for review

lindaxiang commented 6 months ago

@edsu7 I found the following analyses, which may need attention:

MUTO-INTL

studyId donorId workflow_name old_run_id old_run_complete_date new_run_id new_run_complete_date
MUTO-INTL DO250007 sanger-wgs-variant-calling wes-74c85dc74824415193591434f7fc6d18 2021-07-25 12:26 wes-c96e46f774824840ba815ba18e6059c2 2024-05-11 18:34
MUTO-INTL DO250009 sanger-wgs-variant-calling wes-69487332b04f4763b8194ecdc23c05e4 2021-07-25 11:29 wes-6d357325de134e88b4c963ac986c3a66 2024-05-11 15:18
MUTO-INTL DO250003 sanger-wgs-variant-calling wes-41510444576241f195b85973d8af0792 2021-07-25 0:46 wes-7d164f0f9ff841b2b7dabf7eb868338c 2024-05-11 13:32
MUTO-INTL DO250008 sanger-wgs-variant-calling wes-6c2b5093f56f473b8fb546387ef84eff 2021-07-25 7:29 wes-92c4cd3cb7c54990b04555bcd5054b25 2024-05-11 13:40
MUTO-INTL DO250006 sanger-wgs-variant-calling wes-c34a3e0b13e64e36a79be7f5b66ef4e2 2021-07-25 18:28 wes-292cc6cbb7554247aebd4d40e8c149e0 2024-05-12 1:36
MUTO-INTL DO250002 sanger-wgs-variant-calling wes-f2436f8b10a4472a9f246bc3fea6aa8d 2021-07-25 8:42 wes-6797dff1ac3a4fb081713b5bb6dfb3ed 2024-05-12 4:23
MUTO-INTL DO250005 sanger-wgs-variant-calling wes-3845044ef62e45a3aeeaa02b1bdb8406 2021-07-25 0:35 wes-d73fe47da7444f3997cf770351b1dccf 2024-05-12 5:10
MUTO-INTL DO250004 sanger-wgs-variant-calling wes-d0c6f25b417740feadff72df32b18dc5 2021-07-25 6:56 wes-16d50d9a3d9f49c3adbf5215e99b4666 2024-05-11 12:41
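
Duplicated runs like the ones in the table above could be picked out by grouping runs per donor and workflow and flagging everything except the earliest completed run. This is a rough sketch under that assumption only: the function name `runs_to_suppress` and the tuple layout are hypothetical, and "keep the oldest run" is simply what applies to these MUTO-INTL donors, not a general rule.

```python
from collections import defaultdict
from datetime import datetime


def runs_to_suppress(runs):
    """Return run IDs that duplicate an earlier run for the same donor and workflow.

    `runs` is an iterable of (donor_id, workflow_name, run_id, complete_date)
    tuples, with dates formatted like "2021-07-25 12:26".
    """
    grouped = defaultdict(list)
    for donor_id, workflow, run_id, completed in runs:
        when = datetime.strptime(completed, "%Y-%m-%d %H:%M")
        grouped[(donor_id, workflow)].append((when, run_id))

    suppress = []
    for key in sorted(grouped):
        ordered = sorted(grouped[key])  # earliest run first
        # Keep the earliest run; every later run is a candidate for suppression.
        suppress.extend(run_id for _, run_id in ordered[1:])
    return suppress


# Two donors taken from the table above.
example_runs = [
    ("DO250007", "sanger-wgs-variant-calling",
     "wes-74c85dc74824415193591434f7fc6d18", "2021-07-25 12:26"),
    ("DO250007", "sanger-wgs-variant-calling",
     "wes-c96e46f774824840ba815ba18e6059c2", "2024-05-11 18:34"),
    ("DO250003", "sanger-wgs-variant-calling",
     "wes-41510444576241f195b85973d8af0792", "2021-07-25 0:46"),
    ("DO250003", "sanger-wgs-variant-calling",
     "wes-7d164f0f9ff841b2b7dabf7eb868338c", "2024-05-11 13:32"),
]

flagged_runs = runs_to_suppress(example_runs)
```

The output is only a list of candidates; whether the older or the newer run should be kept still needs a manual decision per donor.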

POG-CA

To suppress:

  • analysisIds associated with wes-a4df7a50b59e4b7cba4225475db52c30

    • DO250000 already has a run with open access calls: wes-79a9299f0c8444f59ec1843bb57730d4

    12567dd8-0721-4e94-967d-d807219e9418
    86990ee0-c317-4b63-990e-e0c3174b63be
    7669dda5-f4f6-44a2-a9dd-a5f4f6a4a2c9
    58b3f5c2-5b5f-42f6-b3f5-c25b5f92f60e
    a8ac1de3-5ba0-4664-ac1d-e35ba04664ac
    437866d4-21a8-4ad3-b866-d421a88ad3ab

  • analysisIds associated with wes-e8fdc169af9b4e10a9201706c71a4de2

    • DO252093 already has a run with calls: wes-63271bda2dee4194a1e3f5c0a3d3cf37

    760de433-b4bd-4e30-8de4-33b4bd1e308d
    f09eca25-143d-4b4f-9eca-25143d3b4f66
    fe25cad2-5aff-444d-a5ca-d25aff544d69

edsu7 commented 6 months ago

Update:

The 8 MUTO-INTL donors listed above generated sanger variant calling results in 2021, but the sanger workflows were recently rerun for them, so the results of the new runs will need to be suppressed.

  • The graphql query did not populate properly; as a result, these analyses were rerun (image.png).

Analyses affiliated with runs to be suppressed:

Donor DO2500001: analysisIds associated with wes-a4df7a50b59e4b7cba4225475db52c30 are all already in UNPUBLISHED state.

  • Unpublished so as not to interfere with QC.

Donor DO252093 already has a run with calls, wes-63271bda2dee4194a1e3f5c0a3d3cf37; however, one of the inputs was suppressed for some reason: 56e6afad-d00f-4dbe-a6af-add00f4dbeb4. Do we know the reason?

  • Run wes-e8fdc169af9b4e10a9201706c71a4de2 used tumour alignment 8c2a20cb-b133-4c66-aa20-cbb133ec660b.
  • Run wes-63271bda2dee4194a1e3f5c0a3d3cf37 used tumour alignment 56e6afad-d00f-4dbe-a6af-add00f4dbeb4.
  • 56e6afad-d00f-4dbe-a6af-add00f4dbeb4 was unpublished as a duplicate; 8c2a20cb-b133-4c66-aa20-cbb133ec660b was kept.
  • Analyses associated with wes-63271bda2dee4194a1e3f5c0a3d3cf37 should be unpublished, and those associated with wes-e8fdc169af9b4e10a9201706c71a4de2 should be republished.

Donor DO252253 has an alignment run wes-f63059947de9492689d582a805797317 whose input analysis 6f8c8772-a69d-4c26-8c87-72a69dbc262e is UNPUBLISHED. Do we know the reason why the input was suppressed? We will need to suppress its output analysis 29830a95-8882-48e5-830a-95888288e5c5.

  • Run wes-f63059947de9492689d582a805797317 failed at the QC generation step, but alignment went through (29830a95-8882-48e5-830a-95888288e5c5).
  • The sample was not rerun because the source 6f8c8772-a69d-4c26-8c87-72a69dbc262e was corrupted, as per https://github.com/icgc-argo/workflow-roadmap/issues/392#issuecomment-1944992340

Actions:

Analyses to REPUBLISH

Run Analysis Study
wes-e8fdc169af9b4e10a9201706c71a4de2 760de433-b4bd-4e30-8de4-33b4bd1e308d POG-CA
wes-e8fdc169af9b4e10a9201706c71a4de2 f09eca25-143d-4b4f-9eca-25143d3b4f66 POG-CA
wes-e8fdc169af9b4e10a9201706c71a4de2 fe25cad2-5aff-444d-a5ca-d25aff544d69 POG-CA

Analyses to UNPUBLISH

Run Analysis Study
wes-c96e46f774824840ba815ba18e6059c2 c55dc153-488b-4a13-9dc1-53488bca134d MUTO-INTL
wes-c96e46f774824840ba815ba18e6059c2 c6055da5-85dc-49ff-855d-a585dce9ff87 MUTO-INTL
wes-c96e46f774824840ba815ba18e6059c2 70341068-904f-4522-b410-68904f4522a9 MUTO-INTL
wes-c96e46f774824840ba815ba18e6059c2 b9c0e450-842f-4d7b-80e4-50842f5d7b17 MUTO-INTL
wes-c96e46f774824840ba815ba18e6059c2 b77c8991-2f93-40da-bc89-912f93b0daee MUTO-INTL
wes-c96e46f774824840ba815ba18e6059c2 c1379b77-f025-43a6-b79b-77f02523a673 MUTO-INTL
wes-6d357325de134e88b4c963ac986c3a66 34e1f714-e59f-4f65-a1f7-14e59fff657c MUTO-INTL
wes-6d357325de134e88b4c963ac986c3a66 6d58f65c-e03e-4385-98f6-5ce03ee385a0 MUTO-INTL
wes-6d357325de134e88b4c963ac986c3a66 294573ac-3250-4211-8573-ac3250221171 MUTO-INTL
wes-6d357325de134e88b4c963ac986c3a66 679e56ce-7660-451f-9e56-ce7660051fc2 MUTO-INTL
wes-6d357325de134e88b4c963ac986c3a66 c8b20881-3d1f-4b99-b208-813d1f0b9933 MUTO-INTL
wes-6d357325de134e88b4c963ac986c3a66 19e1819f-bba2-4690-a181-9fbba266901e MUTO-INTL
wes-7d164f0f9ff841b2b7dabf7eb868338c a225c265-18f8-422f-a5c2-6518f8d22f79 MUTO-INTL
wes-7d164f0f9ff841b2b7dabf7eb868338c acb15727-ad44-40c3-b157-27ad4490c3a0 MUTO-INTL
wes-7d164f0f9ff841b2b7dabf7eb868338c 8b364fd1-e83d-4c48-b64f-d1e83d7c484b MUTO-INTL
wes-7d164f0f9ff841b2b7dabf7eb868338c 42428d16-8e43-4651-828d-168e43e65196 MUTO-INTL
wes-7d164f0f9ff841b2b7dabf7eb868338c c10d85a6-6e52-44a4-8d85-a66e5234a40c MUTO-INTL
wes-7d164f0f9ff841b2b7dabf7eb868338c 1b0fd7db-e1b9-44a3-8fd7-dbe1b9d4a3e6 MUTO-INTL
wes-92c4cd3cb7c54990b04555bcd5054b25 1a58b446-9f36-4943-98b4-469f368943d7 MUTO-INTL
wes-92c4cd3cb7c54990b04555bcd5054b25 d7346401-bd09-45ba-b464-01bd0925ba76 MUTO-INTL
wes-92c4cd3cb7c54990b04555bcd5054b25 2975248c-f7ba-4a6d-b524-8cf7baea6da9 MUTO-INTL
wes-92c4cd3cb7c54990b04555bcd5054b25 8f08e66a-ec07-488b-88e6-6aec07e88b4a MUTO-INTL
wes-92c4cd3cb7c54990b04555bcd5054b25 2c69ddba-6c3a-4a12-a9dd-ba6c3a0a12e5 MUTO-INTL
wes-92c4cd3cb7c54990b04555bcd5054b25 c3d3bbbc-45fc-409a-93bb-bc45fcb09a44 MUTO-INTL
wes-292cc6cbb7554247aebd4d40e8c149e0 c6b11cd0-6bd4-414d-b11c-d06bd4014d88 MUTO-INTL
wes-292cc6cbb7554247aebd4d40e8c149e0 59ccf72f-0822-4c60-8cf7-2f08221c6086 MUTO-INTL
wes-292cc6cbb7554247aebd4d40e8c149e0 191154a4-ebe3-454b-9154-a4ebe3654b15 MUTO-INTL
wes-292cc6cbb7554247aebd4d40e8c149e0 73551258-25dd-4983-9512-5825dd898307 MUTO-INTL
wes-292cc6cbb7554247aebd4d40e8c149e0 7f4a78f7-a9dc-45c5-8a78-f7a9dc75c502 MUTO-INTL
wes-292cc6cbb7554247aebd4d40e8c149e0 ef04040a-a2e4-4a51-8404-0aa2e49a51c5 MUTO-INTL
wes-6797dff1ac3a4fb081713b5bb6dfb3ed 042a8a95-bc80-41a5-aa8a-95bc80e1a5a8 MUTO-INTL
wes-6797dff1ac3a4fb081713b5bb6dfb3ed d2094dc8-7f2e-4b84-894d-c87f2efb844c MUTO-INTL
wes-6797dff1ac3a4fb081713b5bb6dfb3ed f8db7bb9-4e65-42ab-9b7b-b94e65f2ab1a MUTO-INTL
wes-6797dff1ac3a4fb081713b5bb6dfb3ed 0865972e-e370-47e4-a597-2ee37097e420 MUTO-INTL
wes-6797dff1ac3a4fb081713b5bb6dfb3ed 9b000de6-4931-459d-800d-e64931059d33 MUTO-INTL
wes-6797dff1ac3a4fb081713b5bb6dfb3ed 21d3de4f-0d18-40f1-93de-4f0d1870f160 MUTO-INTL
wes-d73fe47da7444f3997cf770351b1dccf b34169d7-ae59-460e-8169-d7ae59760e32 MUTO-INTL
wes-d73fe47da7444f3997cf770351b1dccf 5eb9029d-c918-46f1-b902-9dc91826f1c8 MUTO-INTL
wes-d73fe47da7444f3997cf770351b1dccf 140aaaff-7d45-4892-8aaa-ff7d45f89283 MUTO-INTL
wes-d73fe47da7444f3997cf770351b1dccf f3e45a22-826e-40f1-a45a-22826ef0f119 MUTO-INTL
wes-d73fe47da7444f3997cf770351b1dccf 38bff8b5-ee74-46f5-bff8-b5ee7466f5a3 MUTO-INTL
wes-d73fe47da7444f3997cf770351b1dccf 97f9af80-701a-4392-b9af-80701a9392cc MUTO-INTL
wes-16d50d9a3d9f49c3adbf5215e99b4666 dada6cf4-ef7d-4f94-9a6c-f4ef7dbf949e MUTO-INTL
wes-16d50d9a3d9f49c3adbf5215e99b4666 b2860ffb-0eef-4f92-860f-fb0eef8f924f MUTO-INTL
wes-16d50d9a3d9f49c3adbf5215e99b4666 80085ecc-53da-4fdb-885e-cc53da4fdb80 MUTO-INTL
wes-16d50d9a3d9f49c3adbf5215e99b4666 abbc7af4-6be1-43b3-bc7a-f46be193b349 MUTO-INTL
wes-16d50d9a3d9f49c3adbf5215e99b4666 b09b7107-ec3d-4c39-9b71-07ec3d5c393d MUTO-INTL
wes-16d50d9a3d9f49c3adbf5215e99b4666 94a18826-3977-46a9-a188-26397786a959 MUTO-INTL
wes-63271bda2dee4194a1e3f5c0a3d3cf37 66cce45e-8a2c-48bb-8ce4-5e8a2cb8bb8b POG-CA
wes-63271bda2dee4194a1e3f5c0a3d3cf37 5aae9b4e-ee04-4f48-ae9b-4eee04cf4808 POG-CA
wes-63271bda2dee4194a1e3f5c0a3d3cf37 1945d58e-be16-4aa4-85d5-8ebe163aa42a POG-CA
wes-f63059947de9492689d582a805797317 29830a95-8882-48e5-830a-95888288e5c5 POG-CA
edsu7 commented 6 months ago

Updated summary report with clinical flags.

Current Summary

| # of sample combinations flagged by QC | P1000-US | POG-CA | MUTO-INTL |
| --- | --- | --- | --- |
| no new files | 54 | | 310 |
| missing pair | | 17 | |
| coverage issue | | 1 | 60 |
| clinically incomplete | | 294 | 59 |
| variants not called | 1 | | 1 |
| contamination | 6 | | |
| # of sample combinations okay for release | 17 | 358 | 1128 |
| total | 78 | 670 | 1558 |

| | P1000-US | POG-CA | MUTO-INTL |
| --- | --- | --- | --- |
| New Donors | 4 | 300 | 913 |
| Previous Donors | 13 | | 102 |
| # Donors eligible for release | 17 | 300 | 1015 |
| # Analyses eligible for release | 112 | 2448 | 16246 |
| # Files eligible for release | 466 | 8064 | 42748 |

| # of donors by workflow | P1000-US | POG-CA | MUTO-INTL |
| --- | --- | --- | --- |
| DNA Seq | 4 | 300 | 913 |
| GATK Mutect | 4 | 300 | 912 |
| Sanger WGS | 14 | 58 | 911 |

| # of files by workflow | P1000-US | POG-CA | MUTO-INTL |
| --- | --- | --- | --- |
| DNA Seq | 168 | 4562 | 18168 |
| GATK Mutect | 32 | 2400 | 7296 |
| Sanger WGS | 266 | 1102 | 17284 |

DR9_summary_reportV2.xls
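
As a quick consistency check on the summary above, the flagged and okay-for-release counts should add up to the reported totals for each study. A minimal sketch, assuming blank cells mean zero; the variable names are illustrative and the numbers are copied from the summary:

```python
STUDIES = ("P1000-US", "POG-CA", "MUTO-INTL")

# Sample combinations flagged by QC, per study (blank cells treated as 0).
flagged = {
    "no new files":          (54, 0, 310),
    "missing pair":          (0, 17, 0),
    "coverage issue":        (0, 1, 60),
    "clinically incomplete": (0, 294, 59),
    "variants not called":   (1, 0, 1),
    "contamination":         (6, 0, 0),
}
okay_for_release = (17, 358, 1128)
reported_total = (78, 670, 1558)


def column_sums(rows):
    """Sum each study column across all rows of a {label: (counts...)} dict."""
    return tuple(sum(col) for col in zip(*rows.values()))


# Flagged plus okay-for-release should reproduce the reported totals.
computed_total = tuple(
    f + ok for f, ok in zip(column_sums(flagged), okay_for_release)
)

for study, computed, reported in zip(STUDIES, computed_total, reported_total):
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{study}: computed={computed} reported={reported} {status}")
```

With the numbers as reported, all three studies are consistent.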

lindaxiang commented 6 months ago

@edsu7 For MUTO-INTL, there is one more donor with duplicated runs. It generated sanger variant calling results in 2021, but the sanger workflow was recently rerun, so the results of the new run will need to be suppressed.

studyId donorId sampleId tumourNormalDesignation experimental_strategy input_analysisId workflow_name new_runId new_run_complete_time old_runId old_run_complete_time
MUTO-INTL DO250001 SA610008 Tumour WGS 2c7146e0-ab88-466b-b146-e0ab88066bff sanger-wgs-variant-calling wes-5c898b17153a4fd290ea70ed99c50b94 2024-05-17 22:33 wes-2feb9e4f3e094e9a975bdb275a794b10 2021-07-25 8:05
edsu7 commented 6 months ago

The following have been unpublished:

run_id analysis_id study_id
wes-5c898b17153a4fd290ea70ed99c50b94 33d69f24-1167-4d7f-969f-2411679d7ffc MUTO-INTL
wes-5c898b17153a4fd290ea70ed99c50b94 31c20875-2c46-440f-8208-752c46c40f50 MUTO-INTL
wes-5c898b17153a4fd290ea70ed99c50b94 3f837349-3eed-486d-8373-493eed286d31 MUTO-INTL
wes-5c898b17153a4fd290ea70ed99c50b94 1540c52f-12a7-433b-80c5-2f12a7233b5c MUTO-INTL
wes-5c898b17153a4fd290ea70ed99c50b94 a0e62970-a340-426f-a629-70a340c26f1d MUTO-INTL
wes-5c898b17153a4fd290ea70ed99c50b94 14213261-4351-46cd-a132-61435106cd68 MUTO-INTL
edsu7 commented 5 months ago

Completed. Closing