hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Chromothripsis/ecDNA classification #130

Closed alhafidzhamdan closed 3 years ago

alhafidzhamdan commented 3 years ago

Hi all,

Looking at one cluster plot: image

I think this represents chromothripsis. Do you provide such a classification anywhere in the LINX output files? The .linx.clusters.tsv file only describes the cluster as "COMPLEX". If you don't, may I ask what definition of chromothripsis you used in your bioRxiv paper?

Thanks

p-priestley commented 3 years ago

Agreed that this is a clear example of chromothripsis. LINX cannot completely chain here due to 2 breakends being missed by GRIDSS and instead inferred by PURPLE (@39.5M and just before @50.6M on chr 19).

We do not (yet) have a prediction of chromothripsis in LINX. This is mainly for 2 reasons:

  1. There is no single accepted definition, and we found we could not yet clearly distinguish it from other similar processes such as chromoplexy, so it felt that any cutoffs we placed could be arbitrary.
  2. We often see clusters which look like a mix of chromothripsis followed by repair and amplification processes such as ecDNA and BFB. Classification schemes typically classify as chromothripsis, BFB or ecDNA, but chromothripsis-like events can clearly trigger these amplification processes.

alhafidzhamdan commented 3 years ago

Thank you @p-priestley. Following on from point 2, are there any plans to create a classifier that does this, i.e. one that treats a chromothripsis-like event (chromothripsis + repair + amplification) as its own category?

I'd also appreciate your opinion on some other points:

1) Do you think this shows two separate ecDNAs or one ecDNA with multiple oncogenes? E25T cluster33 COMPLEX sv157 289

2) Comparison between Amplicon Architect (AA)-called ecDNA vs LINX-called ecDNA. I wonder if you could explain the different calls (if you are familiar with AA).

a) AA (yes) vs LINX (no) E1T_amplicon1 E1T cluster71 COMPLEX sv99 207

b) AA (no) vs LINX (yes): this may represent eccDNA rather than ecDNA (no oncogene, smaller region). BCCA9T cluster487 COMPLEX sv3 003

c) AA (linear amp) vs LINX (yes) DO10900T_amplicon1 DO10900T cluster8 COMPLEX sv47 077

Thanks! A

p-priestley commented 3 years ago

This is topical for us as we are currently tuning the ecDNA logic further in LINX 1.12 as we found some cases where it is sub-optimal.

Regarding your questions

Question 1: Although this is clustered together, this looks like 2 separate ecDNA events on chr 4 and chr 12 to me, since the JCNs are different and they are only linked by low JCN variants (which caused them to be clustered together). Likely a copy of one ecDNA broke and was inserted into the other at some stage after the ecDNA was formed.

Question 2:

A) Based on the picture it is not clear, but I would lean towards ecDNA. The changes we are currently making to LINX may help shed some light on this. It will be difficult to say for sure in cases like this without external validation.

B) This is a very short segment flanked by single/inferred breakends on both sides. Since this is a short region, my concern would be the reliability of the copy number estimate in between. Perhaps this is just an artefact. What is the depthWindowCount in the PURPLE output for these regions?

C) This looks clearly like ecDNA to me. Is this a GBM tumor? GRIDSS calls single breakends flanking the EGFR amplification on both sides.

alhafidzhamdan commented 3 years ago

1) Interesting stuff!

2a) Thanks. 2b) This is the depthWindowCount.

chr10 | 12409001 | 18552000 | 5883
chr10 | 18552001 | 18556589 | 4
chr10 | 18556590 | 18559475 | 2
chr10 | 18559476 | 18571856 | 10
chr10 | 18571857 | 18574000 | 2
chr10 | 18574001 | 19708000 | 1060
chr10 | 19708001 | 19709000 | 1

2c) Yes this is GBM, and the recent pan-cancer paper on ecDNA also called this sample as linear amplification (TCGA-14-0786; supp data at https://www.nature.com/articles/s41588-020-0678-2?proof=t#Sec20).

p-priestley commented 3 years ago

For 2B) I assume that these are the 2 amplified segments?

chr10 | 18556590 | 18559475 | 2
chr10 | 18559476 | 18571856 | 10

If so, then that looks like quite decent support for the high copy number region (12 consecutive depth windows), and the amplification is quite high (60 copies), so there is quite likely genuine high amplification here and not just CN noise.
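The "12 consecutive depth windows" figure is just the sum of the depthWindowCount values of the two amplified segments. A minimal sketch of that arithmetic, assuming the four pipe-separated columns posted above are chromosome, start, end and depthWindowCount (the helper names `parse_segments` and `windows_in_region` are made up for illustration and are not part of PURPLE or LINX):

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chromosome: str
    start: int
    end: int
    depth_window_count: int


def parse_segments(text: str) -> List[Segment]:
    """Parse 'chr | start | end | depthWindowCount' rows (assumed column order)."""
    segments = []
    for line in text.strip().splitlines():
        chrom, start, end, count = [field.strip() for field in line.split("|")]
        segments.append(Segment(chrom, int(start), int(end), int(count)))
    return segments


def windows_in_region(segments: List[Segment], chromosome: str, start: int, end: int) -> int:
    """Sum depthWindowCount over segments lying entirely within [start, end]."""
    return sum(
        s.depth_window_count
        for s in segments
        if s.chromosome == chromosome and s.start >= start and s.end <= end
    )


# Rows copied from the table posted earlier in this thread.
ROWS = """
chr10 | 12409001 | 18552000 | 5883
chr10 | 18552001 | 18556589 | 4
chr10 | 18556590 | 18559475 | 2
chr10 | 18559476 | 18571856 | 10
chr10 | 18571857 | 18574000 | 2
chr10 | 18574001 | 19708000 | 1060
chr10 | 19708001 | 19709000 | 1
"""

segments = parse_segments(ROWS)
amplified_windows = windows_in_region(segments, "chr10", 18556590, 18571856)
print(amplified_windows)
```

Whether a given window count is enough support is a judgment call; the sketch only reproduces the count quoted above.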

For 2C) it looks like in the AA picture there is a separately amplified section connected (7:57919731-57935910) which is not in the LINX picture. It seems that AA was able to find an INV junction on one side that linked the 2 regions, which was missed by GRIDSS? Perhaps you can draw the whole chromosome in LINX? Dealing with this partial information is by far the most difficult part of building an ecDNA classifier, I think.

DarioS commented 3 years ago

You keep asking the same questions as I am about to write them! Some more questions about your questions:

2(a) Why is EGFR missing from LINX's plot but included in AmpliconArchitect's? Chromosome 7's gene track is blank.

2(b) and 2(c) Why does the structural variant end in two breakends? In the documentation example, ecDNAs form a closed loop.

image

Mine never look like this but also end at breakends, not forming a closed loop.

You'll be amused to know that I asked about a chromothripsis classifier in Issue 91 three weeks ago, coincidentally.

p-priestley commented 3 years ago

Dario,

2A) EGFR is not sufficiently amplified in this case to be called as an amplification driver. You could force it to be added by setting the -gene EGFR option in the visualiser.

2B) & 2C) Ideally we can make a closed loop to call ecDNA, but this is not always possible due to single breakends / missed SV calls. So if we can find ecDNA candidates that can form a predominantly closed loop, without sufficient foldbacks / other breakends to explain the MAX junction copy number, then we call it as ecDNA. The current logic is explained here:

https://github.com/hartwigmedical/hmftools/blob/master/sv-linx/README.md#special-considerations-for-extrachromosomal-dna-ecdna

alhafidzhamdan commented 3 years ago

Hi @p-priestley, Here's a more complete picture of 2C. There is an adjacent amplified segment. LINX does not close the loop here however.

DO10900T chr7 109

Al

p-priestley commented 3 years ago

Since the other high copy number region is adjacent to the centromere it makes it hard to tell as we lose resolution in the centromeric regions, but it certainly raises the chances of this being linear amplification.

Regarding the single breakends that were called in AA but not LINX, it is possible this is due to tumor contamination of the normal, which may lead to the variant being filtered as germline in GRIPSS. We noticed this in another sample, and have just changed our MAX_NORMAL_SUPPORT hard filter in GRIPSS for our upcoming release. We used to hard filter anything with more than 3 supporting reads in the normal, but now we will only filter it if the relative support compared to the tumor also exceeds 3%. Not sure if that affects this case.
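To make the difference between the two filter rules concrete, here is a rough sketch. It is not the GRIPSS implementation: the function names are hypothetical, and it assumes "relative support" means normal supporting reads divided by tumor supporting reads, which the comment above does not spell out.

```python
def filtered_by_old_rule(normal_support: int, max_reads: int = 3) -> bool:
    """Old behaviour as described: filter anything with > 3 supporting reads in the normal."""
    return normal_support > max_reads


def filtered_by_new_rule(
    normal_support: int,
    tumor_support: int,
    max_reads: int = 3,
    max_relative_support: float = 0.03,
) -> bool:
    """New behaviour as described: additionally require relative support above 3%.

    Assumption: relative support = normal_support / tumor_support.
    """
    if normal_support <= max_reads:
        return False
    if tumor_support == 0:
        # No tumor support at all: treat the normal support as fully relative.
        return True
    return normal_support / tumor_support > max_relative_support


# A highly amplified junction with a few contaminating reads in the normal:
# 5 normal reads vs 400 tumor reads is 1.25% relative support.
print(filtered_by_old_rule(5))
print(filtered_by_new_rule(5, 400))
```

Under these assumptions, a junction with 5 normal reads and 400 tumor reads is filtered by the old rule but kept by the new one, which is the contamination scenario described above.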

alhafidzhamdan commented 3 years ago

Comparing ShatterSeek output with LINX, LINX's key advantage in visualising interchromosomal interactions makes it easier to validate candidate chromothriptic regions called by ShatterSeek.

I shall await your GRIPSS update. Many thanks! A