hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Just quick question about the purity adjusted value #419

Closed nan5895 closed 1 year ago

nan5895 commented 1 year ago

Hello, thank you for the wonderful tools for cancer genomic analysis !!

I am curious: if PURPLE fails to select the optimal purity/ploidy solution, how should I deal with values that are already purity-adjusted, such as PURPLE_VCN, PURPLE_AF, PURPLE_CN and PURPLE_MACN in TUMOR.purple.somatic.vcf.gz, or PURPLE_JCN and PURPLE_CN_CHANGE in TUMOR.purple.sv.vcf.gz?

I believe PURPLE can sometimes fail to select the optimal solution, and the selected purity/ploidy may differ from the estimates of other tools such as TITAN and Sequenza...

In that case, I need to do a manual inspection using the minor/major allele copy number and SNV plots from the PURPLE QC output and purity.range.tsv, together with the B-allele frequency plot and the read depth ratio / copy number QC plot from Sequenza.

If that manual inspection leads me to choose a different purity and ploidy, how should I deal with the already purity-adjusted values such as PURPLE_VCN, PURPLE_AF, PURPLE_CN and PURPLE_MACN in TUMOR.purple.somatic.vcf.gz, or PURPLE_JCN and PURPLE_CN_CHANGE in TUMOR.purple.sv.vcf.gz?

p-priestley commented 1 year ago

Hello,

There are optional parameters to set a min-max for both purity and ploidy. Please see here: https://github.com/hartwigmedical/hmftools/tree/master/purple#optional-fitting-arguments

You can use these to force PURPLE to choose your curated solution.
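As a rough illustration of how a curated solution could be turned into such a range, here is a minimal Python sketch. It assumes the fitting flags are named `-min_purity`, `-max_purity`, `-min_ploidy` and `-max_ploidy`; please confirm the exact names and defaults in the README section linked above. The helper name and the tolerance values are hypothetical.

```python
def purple_fit_range_args(purity, ploidy, purity_tol=0.02, ploidy_tol=0.1):
    """Build extra PURPLE arguments that restrict the fit to a narrow
    window around a manually curated purity/ploidy.

    Flag names are an assumption based on the "optional fitting arguments"
    README section; the tolerances are illustrative, not recommended values.
    """
    min_purity = max(0.0, purity - purity_tol)
    max_purity = min(1.0, purity + purity_tol)
    min_ploidy = max(0.0, ploidy - ploidy_tol)
    max_ploidy = ploidy + ploidy_tol
    return [
        "-min_purity", f"{min_purity:.2f}",
        "-max_purity", f"{max_purity:.2f}",
        "-min_ploidy", f"{min_ploidy:.2f}",
        "-max_ploidy", f"{max_ploidy:.2f}",
    ]


# Example: a curated solution of purity 0.60 and ploidy 2.5
extra_args = purple_fit_range_args(0.60, 2.5)
print(" ".join(extra_args))
```

These arguments would be appended to the usual PURPLE command line. Re-running this way regenerates the purity-adjusted VCF fields under the curated solution, so they do not need to be corrected by hand.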

Please let me know if that helps.

If you see evidence that PURPLE systematically selects poor fits in certain scenarios, then we are happy to take a look at that as well. You can also email me directly at p.priestley@hartwigmedicalfoundation.nl

Peter

nan5895 commented 1 year ago

Thank you for your quick response to my question.

So, based on your response, I need to re-run PURPLE with the additional optional parameters that set the min and max of both purity and ploidy.

PURPLE mostly selects reliable fits, though...

Could you take a look at this sample? PURPLE selected a fit with ploidy 2.02 and purity 1.

[Attached PURPLE plots for WC300-SNUBH-0024-T-01D-WGS-9OI519: copynumber, map, purity range, somatic clonality, somatic]

In my Sequenza result:

[Two Sequenza screenshots attached, taken 2023-06-21]

Based on this data, a purity of 100% seems too high for this sample.

Another case:

This usually happens with samples that have really low tumor content (almost no tumor). Any thoughts on my purity range QC plot?

[Attached PURPLE plots for WC300-SNUBH-0040-T-01D-WGS-9SE461: copynumber, map, purity range, somatic clonality, somatic]

Once again, Purple is a wonderful tool !! I want to know more about it and interpret data well. Thank you ~!

p-priestley commented 1 year ago

For that first sample, there appears to be no aneuploidy (the Sequenza plot also shows plenty of noise but no clear lines). On the somatic variant PDF, you can see a nice clonal peak at copy number 1. This looks compelling to me that the purity must be near 100%; otherwise these variants would have VAFs that are inconsistent with the purity. If you look in the purity.tsv file, it is likely that PURPLE has used those somatic variants to fit the sample (you will see fit_method = SOMATIC).
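The link between VAF, purity and variant copy number can be made concrete with a small Python sketch. It uses the standard purity-adjustment relation for an autosomal variant, assuming a diploid normal; it illustrates the reasoning and is not necessarily PURPLE's exact implementation. The function names are hypothetical.

```python
def expected_vaf(purity, tumor_cn, variant_cn, normal_cn=2.0):
    """Expected VAF of a somatic variant present on `variant_cn` copies
    in tumor cells, given the purity and the local tumor copy number."""
    tumor_alleles = purity * tumor_cn
    normal_alleles = (1.0 - purity) * normal_cn
    return purity * variant_cn / (tumor_alleles + normal_alleles)


def variant_copy_number(vaf, purity, tumor_cn, normal_cn=2.0):
    """Invert the relation above: purity-adjusted variant copy number."""
    return vaf * (purity * tumor_cn + (1.0 - purity) * normal_cn) / purity


# Diploid region, variant on one copy:
print(expected_vaf(1.0, 2.0, 1.0))         # purity 100%
print(expected_vaf(0.5, 2.0, 1.0))         # purity 50%

# The same observed VAF of 0.5 interpreted under two purities:
print(variant_copy_number(0.5, 1.0, 2.0))  # purity 100%
print(variant_copy_number(0.5, 0.5, 2.0))  # purity 50%
```

In a diploid genome, a clonal VAF peak around 0.5 corresponds to a variant copy number of 1 at 100% purity. At 50% purity the same VAFs would imply a variant copy number of 2, meaning every clonal variant sits on both copies, which is hard to reconcile with a sample showing no copy number changes.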

For the 2nd plot, it has been fit with purity = 8%. This is the default when status = NO_TUMOR. All the purity fits are bad in this case. Note that PURPLE does not try to fit lower than 8% purity. Given the very small number of variants detected, I think the purity is indeed lower than 8%. The copy number PDF plot suggests there is a clear copy number feature (at CN=2.7), and the purity may well be ~6%. The PURPLE output here is meant to indicate simply <8%.
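For downstream reporting, one way to avoid reading the 8% default as a real estimate is sketched below in Python. It assumes the status value is taken from the `status` column of the purity.tsv file; the helper name is hypothetical.

```python
PURITY_FLOOR = 0.08  # lowest purity PURPLE tries to fit, per the comment above


def describe_purity(purity, status):
    """Return a display string for a PURPLE purity value.

    When the status is NO_TUMOR, the reported purity is only the default
    floor, so it is shown as an upper bound instead of a point estimate.
    """
    if status == "NO_TUMOR":
        return f"<{PURITY_FLOOR:.0%}"
    return f"{purity:.0%}"


print(describe_purity(0.08, "NO_TUMOR"))
print(describe_purity(1.0, "NORMAL"))
```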

nan5895 commented 1 year ago

Thank you for your advice !! I agree with the first part: no aneuploidy plus a clonal peak at copy number 1 makes a purity near 100% compelling. However, I just looked in purity.tsv and it shows fit_method = NORMAL...

purity  normFactor  score   diploidProportion   ploidy  gender  status  polyclonalProportion    minPurity   maxPurity   minPloidy   maxPloidy   minDiploidProportion    maxDiploidProportion    version somaticPenalty  wholeGenomeDuplication  msIndelsPerMb   msStatus    tml tmlStatus   tmbPerMb    tmbStatus   svTumorMutationalBurden runMode targeted
1.0000  1.0137  0.4998  0.8730  2.0200  MALE    NORMAL  0.0343  0.2000  1.0000  1.9400  4.3000  0.0000  0.8730  3.8.4   0.0000  false   0.0105  MSS 9   LOW 0.2473  LOW 71  TUMOR_GERMLINE  false

purple.qc

QCStatus    PASS
Method  NORMAL
CopyNumberSegments  202
UnsupportedCopyNumberSegments   0
Purity  1.0000
AmberGender MALE
CobaltGender    MALE
DeletedGenes    0
Contamination   0.0
GermlineAberrations NONE
AmberMeanDepth  58

p-priestley commented 1 year ago

That means the somatic mode was not triggered. Still, I think the fit looks good based on the somatic variant VAFs and the copy number, unless you have evidence otherwise.

nan5895 commented 1 year ago

Thank you ~!!!

Once again, thank you, it was really helpful !!

I will close the issue