hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

ecDNA links #374

Bo-UT closed this issue 1 year ago

Bo-UT commented 1 year ago

Hello,

I have an ecDNA link and want to get the derivative chromosome. I assume the SVs are connected in the order 1986-1982-1980-1958-1963-1969-1971, but the sequences appear to be: chr7 94214606:92164825 - 94142005:92203995 - 94091392:92205026, ... The derivative chromosome does not look correct. I am very confused by lowerBreakendIsStart and upperBreakendIsStart. Could you give some guidance on how to read the connection order of these SVs?

[Two screenshots attached (2023-03-10)]

Thanks

Bo

p-priestley commented 1 year ago

I highly recommend that you use the visualisations in LINX to draw this cluster (see https://github.com/hartwigmedical/hmftools/blob/master/linx/README_VIS.md)

Each line in the svLink table represents one link in a chain, joining two breakends. Since each SV has a start and an end breakend, we need to record both which SV is linked in the chain and which end of that SV. The 'IsStart' fields indicate whether it is the start or the end of each SV that is joined in the link. So in the first line above, the end of 1986 is chained to the end of 1982.
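To make the 'IsStart' reading concrete, here is a minimal sketch that turns one link line into plain text ("end of SV X linked to start of SV Y"). It assumes a simplified SV record with one start and one end breakend, and a link record whose field names mirror lowerBreakendIsStart/upperBreakendIsStart. The other names (`SV`, `Link`, `lower_sv_id`, `describe_link`) and the SV IDs and positions are hypothetical, not the LINX file layout or real data from this cluster.

```python
from typing import NamedTuple

class SV(NamedTuple):
    """Hypothetical minimal SV record: one start breakend and one end breakend."""
    sv_id: int
    chrom_start: str
    pos_start: int
    chrom_end: str
    pos_end: int

class Link(NamedTuple):
    """Hypothetical link line; only the two IsStart flags mirror the thread's fields."""
    lower_sv_id: int
    lower_breakend_is_start: bool
    upper_sv_id: int
    upper_breakend_is_start: bool

def breakend(sv: SV, is_start: bool) -> str:
    """Format the start or end breakend of an SV as chrom:pos."""
    chrom, pos = (sv.chrom_start, sv.pos_start) if is_start else (sv.chrom_end, sv.pos_end)
    return f"{chrom}:{pos}"

def describe_link(link: Link, svs: dict[int, SV]) -> str:
    """Spell out which end of which SV is joined on each side of the link."""
    lo_side = "start" if link.lower_breakend_is_start else "end"
    up_side = "start" if link.upper_breakend_is_start else "end"
    lo = breakend(svs[link.lower_sv_id], link.lower_breakend_is_start)
    up = breakend(svs[link.upper_sv_id], link.upper_breakend_is_start)
    return (f"{lo_side} of SV {link.lower_sv_id} ({lo}) linked to "
            f"{up_side} of SV {link.upper_sv_id} ({up})")

# Toy data (hypothetical IDs and positions)
svs = {10: SV(10, "chr7", 1000, "chr7", 5000), 11: SV(11, "chr7", 1200, "chr7", 4000)}
print(describe_link(Link(11, False, 10, False), svs))
```

With both flags False, the line reads "end of SV 11 (chr7:4000) linked to end of SV 10 (chr7:5000)", the same kind of statement as "the end of 1986 is chained to the end of 1982". The chain listing in the reply treats each such link as the stretch of sequence between its two breakends.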

The full chain in this example goes like this:

????-94,214,606
92,164,285-92,203,995
94,142,005-94,091,392
92,205,026-92,536,647
92,609,079-92,949,479
93,035,563-93,098,980
93,496,620-93,496,920
93,312,576-????

It is an incomplete circle, but LINX has marked it as ecDNA based on the documented logic (https://github.com/hartwigmedical/hmftools/tree/master/linx#special-considerations-for-extrachromosomal-dna-ecdna) and assumes there is a missed variant which may have completed the chain.

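The chain listed above alternates between SV junctions (jumping from one breakend of an SV to its other breakend) and linked segments (from that breakend to the partner breakend named in the link line). A minimal sketch of that walk is below, assuming each breakend appears in at most one link and each SV is traversed once. Real LINX chains can traverse an SV more than once, which this sketch does not handle. All names (`SV`, `Link`, `walk_chain`) and the toy data are hypothetical. The `closed` flag only reports whether the links alone close the circle; as noted above, LINX may still call ecDNA on an incomplete chain.

```python
from typing import NamedTuple

class SV(NamedTuple):
    """Hypothetical minimal SV record."""
    sv_id: int
    chrom_start: str
    pos_start: int
    chrom_end: str
    pos_end: int

class Link(NamedTuple):
    """Hypothetical link line: which end of which SV is joined on each side."""
    lower_sv_id: int
    lower_breakend_is_start: bool
    upper_sv_id: int
    upper_breakend_is_start: bool

Breakend = tuple[int, bool]  # (sv_id, is_start)

def position(sv: SV, is_start: bool) -> tuple[str, int]:
    return (sv.chrom_start, sv.pos_start) if is_start else (sv.chrom_end, sv.pos_end)

def walk_chain(svs: dict[int, SV], links: list[Link]):
    """Return (sv_order, segments, closed) for one chain.

    segments are (chromosome, from_pos, to_pos) in traversal order; closed is
    True when the walk returns to its starting breakend (a complete circle).
    """
    # Assumption: each breakend is in at most one link, so a plain dict works.
    partner: dict[Breakend, Breakend] = {}
    for ln in links:
        lo = (ln.lower_sv_id, ln.lower_breakend_is_start)
        up = (ln.upper_sv_id, ln.upper_breakend_is_start)
        partner[lo] = up
        partner[up] = lo

    breakends = [(sv_id, s) for sv_id in svs for s in (True, False)]
    open_ends = [be for be in breakends if be not in partner]
    # Start at an unlinked breakend if there is one, otherwise anywhere on the loop.
    start = open_ends[0] if open_ends else breakends[0]

    sv_order, segments, closed = [], [], False
    current = start
    for _ in range(len(svs)):  # guard against malformed input looping forever
        sv_id, is_start = current
        sv_order.append(sv_id)
        exit_be = (sv_id, not is_start)  # cross the SV junction to its other breakend
        nxt = partner.get(exit_be)
        if nxt is None:  # unlinked breakend: the chain is open here
            break
        chrom, from_pos = position(svs[sv_id], not is_start)
        _, to_pos = position(svs[nxt[0]], nxt[1])
        segments.append((chrom, from_pos, to_pos))
        if nxt == start:
            closed = True
            break
        current = nxt
    return sv_order, segments, closed

# Toy chain (hypothetical positions)
svs = {
    1: SV(1, "chr7", 1000, "chr7", 5000),
    2: SV(2, "chr7", 1200, "chr7", 4000),
    3: SV(3, "chr7", 3000, "chr7", 4500),
}
links = [Link(3, False, 1, False), Link(3, True, 2, False)]
print(walk_chain(svs, links))
```

On the toy data the SV order comes out as 1, 3, 2 with segments 5000-4500 and 3000-4000, and the chain is open at both ends, much like the "????" ends in the listing above. Adding a link between the start of SV 1 and the start of SV 2 closes the circle.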
Bo-UT commented 1 year ago

Thanks. I plotted this cluster (2605), which contains the ecDNA. It's really hard to read how the SVs are connected. Is there a BED file output from LINX that shows the connections? Thanks

[Image: ecDNA circos plot of cluster 2605]

p-priestley commented 1 year ago

Thanks for the visualisation. The picture is obviously very complicated, but that is because there are thousands of variants here. In this case the cluster touches every chromosome in the genome, which I think is unlikely to arise from a single event, so I do think there may be a combination of over-clustering and perhaps some false-positive (FP) SV calls here.

The high copy number is (just) visible on chr7. If you want a closer look, I would suggest checking the PURPLE somatic copy number output to see exactly where these high copy number regions begin and end. There is currently no way in LINX to visualise only the predicted ecDNA part of a cluster.
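A minimal sketch of that check is below. It merges adjacent copy number segments on one chromosome that reach a chosen copy number threshold. It assumes a tab-separated table with `chromosome`, `start`, `end` and `copyNumber` columns, sorted by start position. Those column names, the file name you would read it from (e.g. a hypothetical `SAMPLE.purple.cnv.somatic.tsv`) and the threshold are assumptions to verify against the PURPLE documentation. The inline table is toy data, not output from this sample.

```python
import csv
import io

def high_cn_regions(tsv_text: str, chromosome: str, min_cn: float) -> list[tuple[int, int]]:
    """Merge adjacent segments on one chromosome whose copy number is >= min_cn.

    Assumes columns chromosome/start/end/copyNumber and rows sorted by start.
    """
    regions: list[list[int]] = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["chromosome"] != chromosome or float(row["copyNumber"]) < min_cn:
            continue
        start, end = int(row["start"]), int(row["end"])
        if regions and start <= regions[-1][1] + 1:
            regions[-1][1] = max(regions[-1][1], end)  # extend the current run
        else:
            regions.append([start, end])  # start a new high copy number run
    return [(s, e) for s, e in regions]

# Toy table (hypothetical values)
example = "\n".join([
    "chromosome\tstart\tend\tcopyNumber",
    "chr7\t1\t1000\t2.0",
    "chr7\t1001\t2000\t25.3",
    "chr7\t2001\t3000\t31.0",
    "chr7\t3001\t4000\t2.1",
    "chr7\t4001\t5000\t18.0",
    "chr8\t1\t1000\t40.0",
])
print(high_cn_regions(example, "chr7", 10.0))
```

On the toy table this reports two high copy number runs on chr7, 1001-3000 and 4001-5000. Bounds found this way could then be compared with the positions in the LINX chain.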