share idea of fusion "biomarker_mapping"

1seokyoo commented 3 years ago

Hi sigven,

The "fusion" evidence feature is not included in the PCGR version we are using (v0.9.1), so we developed our own script to provide fusion evidence.

Could you tell me how you plan to handle biomarker mapping for fusions in the next release?

We are also sharing our mapping method below and would appreciate your advice.

We provide evidence at three levels (base, exon, gene). For each of the forward and reverse genes, we classify the evidence level as follows:

- base: the breakpoint exactly matches that of the evidence
- exon: the breakpoint is in the same exon as that of the evidence
- gene: the breakpoint is in the same gene as that of the evidence

We then take the lower of the two levels (in the order base > exon > gene) as the final evidence level and report the fusion evidence at that level. Unlike for SNVs/indels, evidence is still reported when the base level does not match, as long as the exon or gene level matches.
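The scheme described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual script: the `Breakpoint` fields, the function names, and the assumption that exon numbers come from one agreed reference transcript are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Ordered from most to least specific (base > exon > gene).
LEVELS = ["base", "exon", "gene"]


@dataclass(frozen=True)
class Breakpoint:
    gene: str
    exon: int      # exon number in a chosen reference transcript (assumption)
    position: int  # genomic coordinate of the breakpoint


def partner_level(observed: Breakpoint, evidence: Breakpoint) -> Optional[str]:
    """Most specific level at which one fusion partner matches the evidence."""
    if observed.gene != evidence.gene:
        return None
    if observed.position == evidence.position:
        return "base"
    if observed.exon == evidence.exon:
        return "exon"
    return "gene"


def fusion_level(obs_fwd: Breakpoint, obs_rev: Breakpoint,
                 ev_fwd: Breakpoint, ev_rev: Breakpoint) -> Optional[str]:
    """Final level: the lower (less specific) of the two partner levels."""
    fwd = partner_level(obs_fwd, ev_fwd)
    rev = partner_level(obs_rev, ev_rev)
    if fwd is None or rev is None:
        return None  # at least one partner gene does not match at all
    # The larger index in LEVELS is the less specific level.
    return max(fwd, rev, key=LEVELS.index)
```

For example, if the forward gene matches at base level but the reverse gene only matches at exon level, `fusion_level` returns `"exon"`.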

So please advise.

Thanks, Wonseok.

sigven commented 3 years ago

Hi @1seokyoo,

Thanks a lot for reaching out with this. I was actually thinking of reaching out to the user community on how to implement this, and I am also discussing this with some of my colleagues here in Oslo. Let me get back to you in a few weeks, so we can discuss further how to set the input requirements for RNA fusions and how the mapping to biomarkers can be optimally achieved. I suspect that the resolution (i.e. fusion breakpoints etc) available in CIViC is not always very good, which can make the mapping somewhat imprecise.

Thanks again for your comment, highly appreciated.

kind regards, Sigve

cc @senzhaocode @danielvo

senzhaocode commented 3 years ago

Hi @1seokyoo,

I agree with Sigve (@sigven) that the fusion breakpoint coordinates available in CIViC are not at a good resolution. If you are thinking about including fusion breakpoints in the "biomarker_mapping" process, the situation is considerably more complicated than your assumptions above. For example, in RNA-seq data the breakpoint coordinates of most fusion transcripts generally coincide with exon boundaries because of the splicing mechanism (of course, there are many cases where breakpoints fall in intronic regions, which suggests new splice sites and probably an unknown function for the fusion transcript). However, when you look at the fusion transcripts nominated by different callers, the genomic breakpoint coordinates sometimes deviate from the exon boundary by 1-5 bp, and users have to make some adjustments to align them to the exon boundary. The best way to adjust breakpoint coordinates is to use a GTF or BED file from a standard gene/transcript annotation resource (e.g. Ensembl or GENCODE).
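As a rough sketch of the adjustment step, assuming the exon intervals for the relevant transcript have already been parsed from a GTF or BED file into `(start, end)` pairs: the function name and the default tolerance of 5 bp are illustrative choices, not part of any existing tool.

```python
from typing import Iterable, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive as in GTF


def snap_to_exon_boundary(pos: int, exons: Iterable[Exon],
                          tolerance: int = 5) -> Optional[int]:
    """Return the nearest exon boundary if it lies within `tolerance` bp.

    Returns None when no boundary is close enough, so the caller can
    keep the original coordinate and treat the breakpoint separately.
    """
    boundaries = [b for start, end in exons for b in (start, end)]
    if not boundaries:
        return None
    nearest = min(boundaries, key=lambda b: abs(b - pos))
    return nearest if abs(nearest - pos) <= tolerance else None
```

A caller-reported breakpoint at 203 with exons `[(100, 200), (300, 400)]` would be adjusted to 200, while one at 250 would be left unadjusted.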

I understand your three evidence levels (base, exon, gene) for interpreting fusion transcript breakpoints. I suggest that it may make more sense to use categories such as (at exon boundary, within exon, within intron), with the priority order at exon boundary > within exon > within intron.
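A minimal sketch of these three categories, assuming the breakpoint has already been adjusted to the annotation and that `exons` holds the `(start, end)` intervals of one transcript of the gene. The category labels and function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive

# Highest priority first.
CATEGORY_PRIORITY = ["at_exon_boundary", "within_exon", "within_intron"]


def classify_breakpoint(pos: int, exons: List[Exon]) -> Optional[str]:
    """Classify a breakpoint relative to the exons of one transcript."""
    if not exons:
        return None
    if any(pos in (start, end) for start, end in exons):
        return "at_exon_boundary"
    if any(start < pos < end for start, end in exons):
        return "within_exon"
    gene_start = min(start for start, _ in exons)
    gene_end = max(end for _, end in exons)
    if gene_start < pos < gene_end:
        return "within_intron"
    return None  # outside the transcript


def best_category(categories: Iterable[Optional[str]]) -> Optional[str]:
    """Pick the highest-priority category among several candidates."""
    found = [c for c in categories if c is not None]
    return min(found, key=CATEGORY_PRIORITY.index) if found else None
```

`best_category` is useful when the same breakpoint is classified against several transcripts of a gene and only the highest-priority result should be kept.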

If you would like a simplified version of the "biomarker_mapping" process (NOT including fusion breakpoints), the gene features of the fusion partners (e.g. oncogene or tumor suppressor) are of key importance for interpreting the outcome. In short, there are two main outcomes of fusion transcripts: 1) Activation of oncogenes or of kinase domains in oncogenes (in particular when the oncogene is the downstream partner). The fusion transcript could be a potential drug target for treatment. 2) Disruption of tumor suppressor genes. In this case, the fusion transcript is "biomarker" evidence.
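The simplified, breakpoint-free version might look something like the sketch below. The gene sets would have to come from a curated resource; the function name, the return labels, and the choice to check oncogenes before tumor suppressors are assumptions made for illustration only.

```python
from typing import Set


def interpret_fusion(upstream: str, downstream: str,
                     oncogenes: Set[str],
                     tumor_suppressors: Set[str]) -> str:
    """Coarse interpretation of a fusion from the roles of its partners."""
    # Outcome 1: an oncogene is involved, most typically as the
    # downstream partner, so the fusion may be a drug target.
    if downstream in oncogenes or upstream in oncogenes:
        return "potential_drug_target"
    # Outcome 2: a tumor suppressor is disrupted by the fusion.
    if upstream in tumor_suppressors or downstream in tumor_suppressors:
        return "biomarker"
    return "unclassified"
```

With placeholder sets `{"ONC1"}` and `{"TSG1"}`, a fusion with `ONC1` as the downstream partner is labelled `"potential_drug_target"`, and a fusion that only involves `TSG1` is labelled `"biomarker"`.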

Best

Sen

sigven / pcgr

share idea of fusion "biomarker_mapping" #133