icgc-argo / workflow-roadmap

Roadmap and management for genomic data processing
GNU Affero General Public License v3.0

Data release 8.0 - data QC #418

Closed lindaxiang closed 8 months ago

lindaxiang commented 8 months ago

Need to QC the processed data from MUTO, POG and P1000.

Expected output:

lindaxiang commented 8 months ago

Processed genomic data release criteria:

| analysis workflow | QC criteria | analysisType | file access |
|---|---|---|---|
| dna alignment | 1. must have mutect2 or sanger called successfully<br>2. normal coverage > 25X, tumour coverage > 30X (use column normal/tumour_estimated_coverage)<br>3. donors with multiple tumour/normal pairs must have all samples processed | sequencing_alignment, qc_metrics | controlled |
| sanger variant calling | 1. exclude ASCAT failed donors<br>2. exclude purity < 30% donors<br>3. exclude purity = 100% donors | variant_calling, qc_metrics | controlled |
| mutect2 variant calling | 1. cross_sample_contamination < 4% (use column normal/tumour_mutect2_contamination) | variant_calling, qc_metrics | controlled |

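
For anyone re-running the checks, the criteria above could be sketched roughly as below. This is only an illustration, not the script used for the release. The coverage and contamination column names come from the table. Everything else (`PairQC`, `sanger_called`, `mutect2_called`, `ascat_failed`, `purity`, the helper function names) is hypothetical, and purity and contamination are assumed to be stored as fractions between 0 and 1. Criterion 3 for alignment is read here as "every tumour/normal pair of a donor must pass", which is an interpretation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class PairQC:
    """One tumour/normal pair. Field names other than the coverage and
    contamination columns are hypothetical placeholders."""
    donor_id: str
    normal_estimated_coverage: float
    tumour_estimated_coverage: float
    sanger_called: bool
    mutect2_called: bool
    ascat_failed: bool
    purity: Optional[float]  # assumed to be a fraction in [0, 1]
    normal_mutect2_contamination: Optional[float]  # assumed fraction
    tumour_mutect2_contamination: Optional[float]  # assumed fraction


def passes_alignment(p: PairQC) -> bool:
    # Alignment criteria 1 and 2: a caller succeeded and coverage is above threshold.
    return ((p.sanger_called or p.mutect2_called)
            and p.normal_estimated_coverage > 25
            and p.tumour_estimated_coverage > 30)


def passes_sanger(p: PairQC) -> bool:
    # Exclude ASCAT failures, purity < 30% and purity = 100%.
    return (p.sanger_called
            and not p.ascat_failed
            and p.purity is not None
            and 0.30 <= p.purity < 1.0)


def passes_mutect2(p: PairQC) -> bool:
    # Cross-sample contamination must be below 4% for both samples.
    values = (p.normal_mutect2_contamination, p.tumour_mutect2_contamination)
    return p.mutect2_called and all(v is not None and v < 0.04 for v in values)


def alignment_ready_donors(pairs: Iterable[PairQC]) -> Set[str]:
    # Alignment criterion 3 (as interpreted here): keep a donor only if
    # every one of its pairs passes the alignment checks.
    by_donor = {}
    for p in pairs:
        by_donor.setdefault(p.donor_id, []).append(passes_alignment(p))
    return {donor for donor, flags in by_donor.items() if all(flags)}
```

The thresholds are strict comparisons, so a pair at exactly 25X normal coverage or exactly 4% contamination would be excluded.
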
edsu7 commented 8 months ago

Results of QC, with notes per column: updated_DR8.2024-03-04.qc.xlsx

Summary of results of DR8 ready files (excluding indices):

| study | objects | donors | analyses |
|---|---|---|---|
| MUTO-INTL | 15123 | 448 | 4882 |
| P1000-US | 2814 | 67 | 469 |
| Total | 17937 | 515 | 5351 |

Summary of results of DR8 ready files (including indices):

| study | objects | donors | analyses |
|---|---|---|---|
| MUTO-INTL | 18091 | 448 | 4882 |
| P1000-US | 3082 | 67 | 469 |
| Total | 21173 | 515 | 5351 |
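
As a quick sanity check on the two summaries, the totals can be re-added and the number of index objects derived, assuming the only difference between the two object counts is the index files. The dictionary layout and variable names below are just for illustration. The numbers are copied from the tables above.

```python
# (objects, donors, analyses) per study, copied from the two summaries.
excluding_indices = {"MUTO-INTL": (15123, 448, 4882), "P1000-US": (2814, 67, 469)}
including_indices = {"MUTO-INTL": (18091, 448, 4882), "P1000-US": (3082, 67, 469)}


def column_totals(summary):
    # Sum each column (objects, donors, analyses) across studies.
    return tuple(sum(column) for column in zip(*summary.values()))


# Both totals rows should match the reported figures.
assert column_totals(excluding_indices) == (17937, 515, 5351)
assert column_totals(including_indices) == (21173, 515, 5351)

# Index objects per study = objects including indices - objects excluding indices.
index_objects = {
    study: including_indices[study][0] - excluding_indices[study][0]
    for study in excluding_indices
}
print(index_objects)                # {'MUTO-INTL': 2968, 'P1000-US': 268}
print(sum(index_objects.values()))  # 3236
```
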

List of objectIds, donorIds and analysisIds: REVISED_DR8_2024-03-04.qc_files.xls

~~[DR8_2024-03-04.qc_files.xlsx](https://github.com/icgc-argo/workflow-roadmap/files/14487769/DR8_2024-03-04.qc_files.xlsx)~~

justincorrigible commented 8 months ago

Closing as completed 👍