hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

CNV results from purple #557

Closed: zhaowsong closed this issue 1 month ago

zhaowsong commented 1 month ago

Hi author,

I obtained some results analyzed using the hmf pipeline from other sources, mainly the SNV/indel results annotated by pave and the CNV results from purple.

Since I only received the .purple.cnv.gene.tsv files, I would like to know if it is possible to estimate TP53 deletion status from the copy number provided by purple. I see that in other places, you mentioned that the definition of homozygous deletion is CN < 0.5, where CN refers to the minCopyNumber, right? For LOH, the definition is minorAlleleCopyNumber < 0.5. Does this definition only consider the minorAlleleCopyNumber?

https://github.com/hartwigmedical/hmftools/issues/88

Here are the TP53 copy number results for each sample that I merged. Can the ones circled in red be considered as LOH?

image

Additionally, I ran my own WGS data through this pipeline. For the following sample, minCopyNumber = 0.0634, and the result given by driver.catalog.somatic.tsv is TP53 del.

image

p-priestley commented 1 month ago

For question 1, the answer is yes: we always use minor allele copy number < 0.5 as the rule for identifying LOH. The example you have highlighted looks unusual, though, as the max copy number is very high, which is strange for TP53 and seems to indicate some complex rearrangement. You can visualise the structural variation using LINX.

For the 2nd point, I am not sure what the question is, but perhaps you are wondering why the copy number is not an integer? This is because we don't round any of our calculations in PURPLE. In this case the copy number is very likely 0, so it is a homozygous deletion.

zhaowsong commented 1 month ago

Thank you for your explanation. These samples are from osteosarcoma, where TP53 is often rearranged. In the first sample, a rearrangement has indeed occurred.

Regarding the definition of LOH, I would like to further inquire.

For the second sample, the TP53 minCopyNumber is 0.0634, and the minorAlleleCopyNumber is 0, indicating a TP53 deletion.

So, does the definition of LOH require simultaneously satisfying the conditions: minCopyNumber > 0.5 and minorAlleleCopyNumber < 0.5?

p-priestley commented 1 month ago

First we check if AlleleCN < 0.5 and, if so, mark that as a homozygous deletion. If not, we check if minorAlleleCN < 0.5 and mark that as LOH. So I would say yes to your question. I hope that is clear.

A homozygous deletion arises first from an LOH, and then from a second deletion event on the other parental chromosome.
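
As an illustration of the two-step check described in this comment, here is a minimal Python sketch. It is not code from PURPLE. The function name and the returned labels (`HOM_DEL`, `LOH`, `NONE`) are hypothetical, and it assumes the first check is applied to the gene's minimum copy number, as in the question above. The 0.5 thresholds are the ones quoted in this thread.

```python
# Thresholds as quoted in this thread; names are illustrative, not PURPLE's.
HOM_DEL_THRESHOLD = 0.5  # cut-off on the gene's minimum copy number
LOH_THRESHOLD = 0.5      # cut-off on the minor allele copy number


def classify_gene(min_copy_number: float, minor_allele_copy_number: float) -> str:
    """Return 'HOM_DEL', 'LOH', or 'NONE' for one gene (hypothetical labels)."""
    # Step 1: a copy number below the threshold is treated as a homozygous deletion.
    if min_copy_number < HOM_DEL_THRESHOLD:
        return "HOM_DEL"
    # Step 2: otherwise, a minor allele copy number below the threshold is LOH.
    if minor_allele_copy_number < LOH_THRESHOLD:
        return "LOH"
    return "NONE"


# The second sample from this thread: minCopyNumber 0.0634, minor allele 0.
print(classify_gene(0.0634, 0.0))
```

Because the homozygous-deletion check runs first, a gene is only labelled LOH when its copy number is at least 0.5 and its minor allele copy number is below 0.5, which matches the combined condition asked about above.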

zhaowsong commented 1 month ago

Thank you very much. I have some other questions regarding the LINX output.

Does the .linx.driver.catalog.tsv file exclude LOH events?

In my own analysis, the final driver.catalog file includes the following event types: MUTATION, DISRUPTION, DEL, HOM_DEL_DISRUPTION, and PARTIAL_AMP.

Additionally, you previously mentioned that the "TUMOR.purple.sv.vcf.gz output already has the full list of SV coordinates. No new SVs are added by LINX."

However, I've noticed that for the same sample, the linx.vis_sv_data.tsv file has significantly fewer rows than the TUMOR.purple.sv.vcf.gz file. Did LINX apply additional filtering?

image

Thanks.

p-priestley commented 1 month ago

LOH events are not added to the driver catalog, sorry.

The LINX SV list may be smaller as it does not annotate "inferred" breakpoints created by PURPLE. I guess this is likely the main difference in your numbers. There is also some additional filtering: https://github.com/hartwigmedical/hmftools/tree/master/linx#artefact-filtering
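
One way to see how much of the row-count difference comes from inferred breakpoints is to tally the FILTER column of the PURPLE SV VCF. The sketch below is a plain-Python illustration, not part of LINX or PURPLE. The function name is hypothetical, and it assumes that inferred breakpoints can be told apart by their FILTER value (for example `INFERRED`), which should be checked against your own file. The example records are made up.

```python
from collections import Counter
from typing import Iterable


def count_filters(vcf_lines: Iterable[str]) -> Counter:
    """Count VCF data records by the value of their FILTER column (7th field)."""
    counts: Counter = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a well-formed VCF data record
        counts[fields[6]] += 1
    return counts


# Made-up records for illustration only; the FILTER values are assumptions.
example_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "17\t7570000\tsv_1\tN\tN[17:7580000[\t.\tPASS\t.\n",
    "17\t7590000\tsv_2\tN\tN.\t.\tINFERRED\t.\n",
]
print(count_filters(example_lines))
```

For a real file, the same function can read a gzipped VCF with `gzip.open(path, "rt")` from the standard library, passing the open handle as `vcf_lines`.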

zhaowsong commented 1 month ago

Thanks!