hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

PROMISCUOUS_ENHANCER_TARGET requires 5'- donor segment to be within a gene? #353

Closed: toddajohnson closed this 1 year ago

toddajohnson commented 1 year ago

Hi Peter and Charles,

I just reran my HCC dataset with the new Linx 1.23 release, and none of the SVs upstream of TERT were included in the Linx fusion output, either with or without the PROMISCUOUS_ENHANCER_TARGET label. These SVs passed GRIPSS filtering, and I checked in IGV that they are present in the tumor sample but not in the reference. I BLAT-ed the clipped sequences from the GRIDSS assembly BAM, and the donor enhancer elements all appear to be intergenic. I assume Linx's definition of a fusion is limited to fused genes? Is there any way to remove that restriction for this specific case?

Best wishes and Happy New Year!

p-priestley commented 1 year ago

Did you update your known_fusion_data.csv file?

In particular you need the following additional lines:

37:
PROMISCUOUS_ENHANCER_TARGET,,C19MC,,,,,,NA,THREE_PRIME_RANGE=-1;19;54133254;54173254
PROMISCUOUS_ENHANCER_TARGET,,TERT,,,,,,NA,THREE_PRIME_RANGE=1;5;1245105;1295105

38:
PROMISCUOUS_ENHANCER_TARGET,,C19MC,,,,,,NA,THREE_PRIME_RANGE=-1;chr19;53630000;53670000
PROMISCUOUS_ENHANCER_TARGET,,TERT,,,,,,NA,THREE_PRIME_RANGE=1;chr5;1244990;1294990
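To check which breakends a given entry would cover, the THREE_PRIME_RANGE value can be split into its fields. The sketch below is a hypothetical Python helper, not part of Linx. It assumes the value has the form `<±1>;<chromosome>;<start>;<end>`, as in the lines above, and treats both bounds as inclusive. The thread does not say what the leading 1/-1 means, so the sketch keeps it as an opaque `sign` field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicRange:
    """One THREE_PRIME_RANGE value from a known_fusion_data.csv row.

    Assumed layout: "<sign>;<chromosome>;<start>;<end>". The meaning of the
    leading +1/-1 is not documented here, so it is stored without interpretation.
    """
    sign: int
    chromosome: str
    start: int
    end: int

    def contains(self, chromosome: str, position: int) -> bool:
        # Inclusive bounds are an assumption of this sketch, not Linx behaviour.
        return chromosome == self.chromosome and self.start <= position <= self.end


def parse_three_prime_range(row: str) -> GenomicRange:
    """Parse the THREE_PRIME_RANGE value held in the last CSV column of a row."""
    last_field = row.strip().split(",")[-1]
    key, _, value = last_field.partition("=")
    if key != "THREE_PRIME_RANGE":
        raise ValueError(f"no THREE_PRIME_RANGE in last column: {last_field!r}")
    sign, chromosome, start, end = value.split(";")
    start_i, end_i = int(start), int(end)
    if start_i > end_i:
        raise ValueError("range start must not exceed range end")
    return GenomicRange(int(sign), chromosome, start_i, end_i)


if __name__ == "__main__":
    original = parse_three_prime_range(
        "PROMISCUOUS_ENHANCER_TARGET,,TERT,,,,,,NA,THREE_PRIME_RANGE=1;chr5;1244990;1294990"
    )
    # Hypothetical breakend inside the 20 kb upstream window discussed in this thread.
    print(original.contains("chr5", 1_300_000))  # False
```

Note that the chromosome field must match the reference naming exactly ("5" for GRCh37, "chr5" for GRCh38), which is why the 37 and 38 entries differ.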

toddajohnson commented 1 year ago

Thank you, Peter! I had thought to check the linked resources, but the latest pipeline resources release is still 5.31.

toddajohnson commented 1 year ago

I think the TERT coordinates need to be modified. For PROMISCUOUS_ENHANCER_TARGET to include a breakend upstream of the first TERT codon within the distance described in the literature, the range should cover at least chr5:1294990-1314990, if not a little farther. With the following entry, Linx added a TERT fusion line to the output for my test example, which it did not do with the entries posted above:
PROMISCUOUS_ENHANCER_TARGET,,TERT,,,,,,NA,THREE_PRIME_RANGE=1;chr5;1294990;1314990

p-priestley commented 1 year ago

There is a balance, of course, between false positives (FP) and false negatives (FN). Most of the TERT rearrangements we have seen that we are confident are pathogenic are actually quite close to the promoter. I chose 50 kb upstream based mainly on this paper: https://www.nature.com/articles/s41467-019-13885-w. Could you provide a link suggesting that 200 kb or more is required?

toddajohnson commented 1 year ago

I was looking at the same paper as a reference, and 50 kb upstream would be reasonable. However, because TERT is on the negative strand of chr5, the GRCh38 coordinates you posted above (THREE_PRIME_RANGE=1;chr5;1244990;1294990) run from ~8.7 kb downstream of TERT's stop codon (1,253,728 bp) to the last base of the 5'-UTR (1,294,990 bp). They therefore include only a single base of the upstream regulatory region.
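The strand issue can be made concrete with a minimal Python sketch. `upstream_window` is a hypothetical helper, not a Linx function; it assumes the window should start at the gene's 5' end (the last 5'-UTR base in genome coordinates, 1,294,990 for TERT in GRCh38 per the comment above) and extend `distance` bp upstream, including the boundary base itself.

```python
from __future__ import annotations


def upstream_window(five_prime_end: int, gene_strand: int, distance: int) -> tuple[int, int]:
    """Return (start, end) covering `distance` bp upstream of a gene's 5' end.

    Hypothetical helper for illustration. gene_strand is +1 or -1; the 5' end
    base itself is included, matching the ranges quoted in this thread.
    """
    if gene_strand == 1:
        # Plus-strand gene: upstream lies at lower coordinates.
        return five_prime_end - distance, five_prime_end
    if gene_strand == -1:
        # Minus-strand gene (e.g. TERT): upstream lies at higher coordinates.
        return five_prime_end, five_prime_end + distance
    raise ValueError("gene_strand must be +1 or -1")


if __name__ == "__main__":
    tert_5p_end_grch38 = 1_294_990  # last base of TERT's 5'-UTR, from the comment above
    tert_stop_grch38 = 1_253_728    # TERT stop codon, from the comment above

    wrong = upstream_window(tert_5p_end_grch38, 1, 50_000)   # treats TERT as plus strand
    right = upstream_window(tert_5p_end_grch38, -1, 50_000)  # TERT's actual strand
    print("plus-strand rule:", wrong)   # plus-strand rule: (1244990, 1294990)
    print("minus-strand rule:", right)  # minus-strand rule: (1294990, 1344990)
    print("bp beyond stop codon:", tert_stop_grch38 - wrong[0])  # bp beyond stop codon: 8738
```

Applying the plus-strand rule to TERT reproduces the original GRCh38 range exactly, while the minus-strand rule yields a window that starts at the 5'-UTR and extends 50 kb away from the gene body.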

p-priestley commented 1 year ago

I see the problem now: we extended the range 50 kb in the wrong direction. We will update to the following coordinates:

37: THREE_PRIME_RANGE=1;5;1295105;1345105
38: THREE_PRIME_RANGE=1;chr5;1294990;1344990

Thanks for pointing this out.