nghiavtr / FuSeq


Missing Fusion #3

Closed mflevine closed 5 years ago

mflevine commented 5 years ago

Hi,

I am very impressed with the speed of your tool! I am testing FuSeq to see if it will work for our pipeline. I was disappointed to see that it did not call a highly supported CITED2-MGA fusion which was called by FusionCatcher, SOAPfuse, and STAR-Fusion using the default parameters. I tried looking into the RData files, and it looks like the fusion is supported by 46 mapped reads and 169 split reads. Can you help me look into why this was filtered?

nghiavtr commented 5 years ago

Hi mflevine,

Thank you for your interest in FuSeq. Each fusion detection method has filters that can be sensitive to parameter settings, and the default parameter settings of FuSeq may not suit your data. If you run FuSeq with keepRData=TRUE, could you send me the output *.RData files of a sample? I will investigate the CITED2-MGA fusion in your sample.

Best, Nghia

mflevine commented 5 years ago

I appreciate that! What is the best way to send it to you? The files are very big!

nghiavtr commented 5 years ago

Hi,

I think the files can be uploaded to Google Drive or Box Sync; then please email me the download link (TrungNghia.Vu@ki.se). I will take some time to have a look, thanks!

Best, Nghia

mflevine commented 5 years ago

I sent the files using ShareBox. Let me know if this does not work for you.

Best, Max

nghiavtr commented 5 years ago

Hi Max,

Thank you for the files; I have downloaded them successfully. I will respond to you as soon as possible.

Best, Nghia

nghiavtr commented 5 years ago

Hi Max,

I have taken a look at the fusion gene in your data. As you said, FuSeq discovered the gene fusion in both the mapped-read and split-read pipelines, even after some filters. However, the fusion does not satisfy other conditions in the post-processing steps.

If you also send me the folder containing the fusion equivalence classes, I will be able to check further and could then prepare some scripts that would be helpful for your sample.

The code I used to check the information I mentioned is below.

Best, Nghia

load("FuSeq_process.RData")
gene1 = "ENSG00000164442"  # CITED2
gene2 = "ENSG00000174197"  # MGA
myfus = paste(gene1, "-", gene2, sep = "")

## check the split-read (SR) pipeline
# fusion candidates remaining after the simple filters
myFusion = FuSeq.SR$myFusionFinal
x = myFusion[myFusion$name12 %in% myfus, ]
x[1, ]
# the junction satisfies the GC-AG condition
x[1, ]$GCEnd
x[1, ]$AGStart
# but the breakpoint is inside an exon
x[1, ]$ssExEnd

## check the mapped-read (MR) pipeline
# fusion candidates remaining after the simple filters
myFusion = FuSeq.MR$myFusionFinal
myFusion[myFusion$name21 %in% myfus, ]
# get more information from the post-processing results
myFusion = FuSeq.MR.postPro$junctBr.refine$myFusionFinal
x = myFusion[myFusion$name21 %in% myfus, ]
x
# the problem: the estimated median length of the fusion transcript is too long (1186.5)
x$ftxMedianLen
# fragment length information of the RNA-seq sample
FuSeq.MR$fragmentInfo
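
The lookup pattern in the script above can be tried without the large RData files. The following is only a minimal sketch on a toy data frame: `find_fusion` and the `supportCount` column are hypothetical names introduced for illustration, not part of FuSeq, and it assumes that `name12` and `name21` hold the gene pair in the two possible orders, which the script suggests but does not state.

```r
# Toy stand-in for a FuSeq candidate table. Only the name12/name21 column
# names come from the script above; all values are made up for illustration.
toy <- data.frame(
  name12 = c("ENSG00000164442-ENSG00000174197", "ENSG00000000001-ENSG00000000002"),
  name21 = c("ENSG00000174197-ENSG00000164442", "ENSG00000000002-ENSG00000000001"),
  supportCount = c(169, 3),  # hypothetical column
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of FuSeq): return the rows for a gene pair,
# matching the "gene1-gene2" key against both name12 and name21.
find_fusion <- function(fusions, gene1, gene2) {
  key <- paste(gene1, gene2, sep = "-")
  hit <- fusions$name12 %in% key | fusions$name21 %in% key
  fusions[hit, , drop = FALSE]
}

hits <- find_fusion(toy, "ENSG00000164442", "ENSG00000174197")
print(hits)
```

Checking both columns means the caller does not need to know in which orientation a given pipeline stored the pair.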

mflevine commented 5 years ago

Hi Nghia,

Thank you so much for the in-depth look! This appears to be quite an unusual fusion. We are performing ensemble calling, so I think it is better for us to be more sensitive than precise. I will send along the rest of the data. Attached is the figure from the SOAPfuse output.

Best, Max

[Figure: CITED2_chr6_139694229_MGA_chr15_42044005]

nghiavtr commented 5 years ago

Hi Max,

Thank you for your data. The figure from SOAPfuse confirms our finding that the breakpoint in gene MGA is inside an exon, which is why the fusion was filtered out by FuSeq.

To deal with this issue, I added a parameter "exonBoundary" to control how junction breaks inside exons are handled. The default is "exonBoundary=TRUE", but for your case you should set "exonBoundary=FALSE" in params.txt. This new feature is available in FuSeq version 1.1.1 (https://github.com/nghiavtr/FuSeq/releases/tag/v1.1.1).

I have tested FuSeq v1.1.1 with the data you sent me: it discovered 31 fusions with "exonBoundary=FALSE" versus 11 fusions with "exonBoundary=TRUE". The CITED2-MGA fusion gene is at the top of the list. I hope the new version of FuSeq with the "exonBoundary" parameter will be helpful to your research project.

Best, Nghia
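
For running this configuration across many samples, the parameter change can be scripted. The sketch below is not part of FuSeq: `set_param` is a hypothetical helper, and it assumes params.txt consists of one key=value entry per line, as the quoted "exonBoundary=FALSE" suggests. It is demonstrated on a temporary file with made-up contents rather than on a real params.txt.

```r
# Hypothetical helper (not part of FuSeq): set one key=value entry in a
# parameter file, appending the entry if the key is not present.
# Assumes one key=value entry per line and a key without regex metacharacters.
set_param <- function(path, key, value) {
  lines <- readLines(path)
  entry <- paste0(key, "=", value)
  is_key <- grepl(paste0("^\\s*", key, "\\s*="), lines)
  if (any(is_key)) {
    lines[is_key] <- entry
  } else {
    lines <- c(lines, entry)
  }
  writeLines(lines, path)
  invisible(lines)
}

# Demonstration on a temporary file with made-up contents.
params <- tempfile(fileext = ".txt")
writeLines(c("keepRData=TRUE", "exonBoundary=TRUE"), params)
set_param(params, "exonBoundary", "FALSE")
print(readLines(params))
```

Editing a copy of params.txt per run keeps the default "exonBoundary=TRUE" configuration available for comparison.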

mflevine commented 5 years ago

Thank you! This should work great with ensemble calling. I will be sure to test out this configuration across my cohort. Again, we are really impressed by the speed of FuSeq, great job.

Best, Max