Closed n1zea144 closed 5 months ago
https://github.com/knowledgesystems/cmo-pipelines/pull/1099
Done Condition (What do we need? Why do we need it? Keep this as small as possible!)
Understand why duplicates got into MAF. Update pipeline to remove duplicates from delivered MAF.
Technical Description (How are we going to achieve the above)
Sophia reports 2556 duplicate variants in the mutations extended file. The following Unix sort/uniq commands identify 2213:
sort -k 1,1 -k 5,5 -k 6,6 -k 7,7 < ~/tmp/sophia-lung-cohort-maf.txt > ~/tmp/sophia-lung-cohort-maf-sorted.txt
uniq -d ~/tmp/sophia-lung-cohort-maf-sorted.txt > ~/tmp/sophia-lung-cohort-maf-duplicates.txt
(sophia-lung-cohort-maf.txt is a renamed copy of MSK_Sophia_LungPts_cBio_mutations_extended - 7-27-23.txt)
sophia-lung-cohort-maf-duplicates.txt
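A cross-check that might help with the 2556 vs 2213 gap: uniq -d compares whole lines and prints one line per group of repeats, so rows that share gene and position but differ in any other column are not reported by the sort/uniq commands above. Whether that explains the difference is unconfirmed. The sketch below instead flags rows whose chosen key columns repeat an earlier row. The helper name maf_dups_by_key is hypothetical, and the column numbers are assumptions that should be checked against the header of the actual file.

```shell
# Hypothetical helper: print every row of a tab-delimited MAF whose key
# columns repeat an earlier row. $1 = MAF path, $2 = comma-separated
# 1-based column numbers that together define "the same variant".
maf_dups_by_key() {
  awk -F '\t' -v cols="$2" '
    BEGIN { n = split(cols, c, ",") }
    /^#/ { next }                   # skip comment lines at the top of the file
    {
      key = ""
      for (i = 1; i <= n; i++) key = key $(c[i]) SUBSEP
      if (seen[key]++) print        # second and later occurrences only
    }
  ' "$1"
}

# Example (assumption: column 16 holds the sample barcode in this file):
# maf_dups_by_key ~/tmp/sophia-lung-cohort-maf.txt 1,5,6,7,16 | wc -l
```

Unlike uniq -d, this prints every repeat after the first, so a variant that appears three times contributes two lines. It also splits strictly on tabs, whereas sort without -t splits fields on blank-to-non-blank transitions.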
Potential Issues
Dependencies
Technical Requirements
Outside People/Teams
Changes