Closed jamesdalg closed 2 years ago
Hi James - the segment file shows the initial segmentation of the genome (combining GRIDSS, COBALT and AMBER), which is used to do the fitting. After this, the data is smoothed significantly to produce our final output, but we also output this file so that we can occasionally debug parts of the algorithm (e.g. a poor fit).
I strongly recommend using the somatic CNV file, which is explained fully here: https://github.com/hartwigmedical/hmftools/tree/master/purple#copy-number-file
Is there something missing from that file that you need for the analysis?
What exactly do the following column names in purple.segment.tsv mean? Some are obvious, like minor/major allele copy number, but I would like to know more about others, such as the ratioSupport column, so that I can use the allele-specific copy number in an analysis I'm doing.

chromosome, start, end, germlineStatus, bafCount, observedBAF, minorAlleleCopyNumber, minorAlleleCopyNumberDeviation, observedTumorRatio, observedNormalRatio, unnormalisedObservedNormalRatio, majorAlleleCopyNumber, majorAlleleCopyNumberDeviation, deviationPenalty, tumorCopyNumber, fittedTumorCopyNumber, fittedBAF, refNormalisedCopyNumber, ratioSupport, support, depthWindowCount, tumorBAF, gcContent, eventPenalty, minStart, maxStart
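For anyone who wants to pull only the allele-specific columns out of a file with this header, here is a minimal Python sketch. It assumes the file is tab-separated with a single header row (as the .tsv extension suggests) and that the coordinate and copy number fields are numeric. The function name `read_allele_copy_numbers` and the demo rows are made up for illustration and are not part of PURPLE.

```python
import csv
import io


def read_allele_copy_numbers(handle):
    """Return chromosome, start, end and minor/major allele copy number per row.

    Assumes a tab-separated file with a header row containing the column
    names quoted in the question above.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    segments = []
    for record in reader:
        segments.append({
            "chromosome": record["chromosome"],
            "start": int(record["start"]),
            "end": int(record["end"]),
            "minor": float(record["minorAlleleCopyNumber"]),
            "major": float(record["majorAlleleCopyNumber"]),
        })
    return segments


# Made-up demo rows (not real PURPLE output), used only to show the call.
demo = (
    "chromosome\tstart\tend\tminorAlleleCopyNumber\tmajorAlleleCopyNumber\n"
    "1\t1\t1000\t1.0\t2.0\n"
    "1\t1001\t5000\t0.0\t3.0\n"
)
segments = read_allele_copy_numbers(io.StringIO(demo))
for seg in segments:
    print(seg["chromosome"], seg["start"], seg["end"], seg["minor"], seg["major"])
```

To run it on a real file, replace the `io.StringIO(demo)` handle with something like `open("purple.segment.tsv", newline="")`. The same approach should work for the somatic CNV file recommended above, as long as the column names are adjusted to whatever its header actually contains.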