Closed kgaonkar6 closed 4 years ago
Assembly is not used to predict the transcribed strand (given after the slash). The transcribed strand is derived from either the strand of the supporting reads (if a stranded library was used) or from splice patterns of the supporting reads (if a non-stranded library was used).
Assembly is used to determine the transcript sequence around the breakpoints. Basically, Arriba performs a pileup of all the reads supporting a fusion and takes the most frequent base for each position.
Does this answer your question?
Thank you that does help.
So to clarify in the draw_fusions.R the script reads the "after the slash" strand information to gather/re-arrange the gene structure (and overlapping domains) to plot the fusion? That visualization looks amazing so I'm trying to understand it better.
draw_fusions.R
could in theory use the strand information after the slash, but it does not need to, because Arriba arranges gene1 and gene2 to reflect the 5' and 3' end of the fusion. Transcription always starts in gene1 and ends in gene2 (with few exceptions where it's completely unclear). draw_fusions.R
orients the fusion gene such that transcription goes from left to right. It does not need to evaluate the transcribed strand, because this information is implicit in the order of gene1 and gene2.
That visualization looks amazing so I'm trying to understand it better.
Thanks. Feel free to ask if you would like to customize some aspects. What may take an hour for you may be a matter of minutes for me. I can either send you a custom patch or - if the request is likely useful to many users - add a new feature.
Oh great! I'm actually trying to visualize STAR-fusion calls the same way arriba visualizes them to have similar plots from both callers which is why I had specific questions as to how the draw_fusion.R script works. https://gist.github.com/kgaonkar6/d2efc3660d31c20217a9071c6244e70a is what I have now to format the files correctly which seems to work to create the similar plots. But if you can send it as a patch to the draw_fusion.R script itself that would be really appreciated. I guess others might find it useful as well..?
Hello @kgaonkar6
I'm trying to use your script to draw Fusions from StarFusion. It seem to work for some fusions but not all. Did you encounter this kind of behavior ?
Best,
Hi @Ephedria , I was able to plot all the fusions from my StarFusion file. However, I did notice if the gene was not found in the gtf file provided it would print out "Error XYZ exon coordinate not found in gtf" on the pdf page. Also you might want to check the format/colnames of Starfusion, mine was output from STAR-Fusion 1.5.0
Did you get any errors?
Hello Krutika, I didn't get an error like you, mine stopped the process right away :
Warning message: In colnames(fusions)[colnames(fusions) %in% c("X.gene1", "strand1.gene.fusion.", : number of items to replace is not a multiple of replacement length Loading annotation Loading protein domains Drawing fusion #1: CALR:TANC2 Drawing fusion #2: SENP6:MYO6 Drawing fusion #3: IKZF2:ERBB4 Error in if ((codingLength1 == 0 || antisenseTranscription1) && (codingLength2 == : missing value where TRUE/FALSE needed Calls: drawProteinDomains Execution halted
I tried to look at the line that cause the bug, I didn't found anything relevant... My output is from a FusionInspector with the --examine_coding_effect which is the same as StarFusion.
I didn't thought of this possibility since your script worked fine. Could you send me an exemple of your StarFusion output so I may try to understand what cause the bug ?
Best,
Sorry for the late reply, here you go: testStarFusion.txt
Thanks for the link, I will try to found out the problem.
I also apologize for the lack of responses from my side. I have been really busy lately.
In colnames(fusions)[colnames(fusions) %in% c("X.gene1", "strand1.gene.fusion.", :
This is likely the source of the error. I suspect, the column names of your file are not correct. You should check that they are identical to the ones that are generated by Arriba.
I plan on making the draw_fusions.R
script compatible with STAR-Fusion and could send you the script. But I probably will not come around to doing this in the next 1-2 weeks.
Hello Sorry I forgot to answer ... The header doesn't exactly look like Arriba's but it work on some fusions and not on others fusions. I don't find out why ... I will love to see draw_fusions.R working with StarFusion. I hope this is not too much work for you ...
Best,
I already have a prototype of draw_fusions.R that is compatible with STAR-Fusion. I'll push it to GitHub next week probably. Still need to do some testing. Stay tuned.
Ho cool I will be eager to test it !
Hi Krutika,
I just uploaded an enhanced version of draw_fusions.R to the development branch of the repository that should also be compatible with STAR-Fusion output (both with and without FusionInspector extra columns). You can download it from here:
https://raw.githubusercontent.com/suhrig/arriba/develop/draw_fusions.R
Feel free to give it a try if you find the time.
Regards, Sebastian
Thanks for the hard work.
I was testing it with an input of FusionInspector but I didn't seem to work ... I had this error :
Loading annotation Warning message: In parseGtfAttribute("exon_number", exons) : Failed to parse 'exon_number' attribute of 558 GTF record(s). Loading protein domains
Can you send me the command used to create your output of StarFusion ? I will try to stick to it.
Best, Ephe
Warning message: In parseGtfAttribute("exon_number", exons) :
This error is related to the GTF file and has nothing to do with the fusion file. What GTF are you using? Can you send me a link or upload it somewhere?
Moreover, this is only a warning and in view of the fact that only 558 records are affected (of probably tens of thousands), this should not even be a concern. Is a PDF file generated as output?
Hum quite strange, because I try with an ouput from Arriba and it worked well. I didn't change the gtf file. I use the : GRCh38_gencode_v31_CTAT_lib_Oct012019.plug-n-play/ctat_genome_lib_build_dir/ref_annot.gtf
It does create a PDF file, but it's empty ...
The error about parsing the GTF file should appear regardless of whether you use Arriba or STAR-Fusion. Can you confirm? Please send me the full call to draw_fusions.R
for both Arriba and STAR-Fusion.
When I use CTAT annotation, the GTF parsing error is shown with output from both Arriba and STAR-Fusion. So at least the error is definitely not related to the fusions file. And in both cases a non-empty PDF file is produced. I'm suspecting an issue with the content of your STAR-Fusion file that the script cannot handle. Can you share the content? Or at least the header line and one fusion prediction? Thanks.
I did verified, you were right, I do have the same error when trying with an Arriba output...
Here is the output with Arriba output : Loading annotation Warning message: In parseGtfAttribute("exon_number", exons) : Failed to parse 'exon_number' attribute of 558 GTF record(s). Loading protein domains Drawing fusion #1: RNASEH2B-AS1:RNASEH2B Drawing fusion #2: AC004943.2:AC116667.2 ....
And here is the output with a FusionInspector output :
Loading annotation Warning message: In parseGtfAttribute("exon_number", exons) : Failed to parse 'exon_number' attribute of 558 GTF record(s). Loading protein domains /mnt/go-docker/cmd.sh: line 8: 36 Killed /scratch2/tmp/mneou/Scripts/draw_fusions.R --fusions="/scratch2/tmp/mneou/FusionInspector/MAP634_WTS/MAP634_WTS.FusionInspector.fusions.abridged.tsv.coding_effect" --alignments="/scratch2/tmp/mneou/Arriba/MAP634_WTS/STAR/MAP634_WTSAligned.sortedByCoord.out.bam" --output="/scratch2/tmp/mneou/FusionInspector/MAP634_WTS/MAP634_WTS_fusions_CR.pdf" --annotation=/data/annotations/Human/GRCh38/index/star-fusion/v1.8/GRCh38_gencode_v31_CTAT_lib_Oct012019.plug-n-play/ctat_genome_lib_build_dir/ref_annot.gtf --cytobands=/database/cytobands_hg38_GRCh38_2018-02-23.tsv --proteinDomains=/database/protein_domains_hg38_GRCh38_2019-07-05.gff3
What annotation are you using ? I will try to use the same one, if it can prevent this warning ?
Sure here it is : https://github.com/Ephedria/FusionInspector/blob/master/Sample1_FusionInspector.tsv
Thanks again for the help.
Best
/mnt/go-docker/cmd.sh: line 8: 36 Killed
This is the real issue. The script got killed by an external process. The most likely explanation is that it ran out of memory. Typically, the script needs < 2GB. Does the script run inside a job with an artificial memory limit? Another explanation is that your job has a time limit and was killed after reaching that limit. Either way, this is not a bug in the script, but an operating system issue.
What annotation are you using ? I will try to use the same one, if it can prevent this warning ?
I use GENCODE >= v28. The warning is harmless, though. You can continue using your GTF file if you want. All this means is that about 1% of the genes will lack exon labels when they are drawn, i.e., there is no number inside the exon. So it's only cosmetic.
Hoo my bad, you were right I did ran out of memory ... I did again by expending the memory allocatated it works fine.
I will try to test on different samples, but I pretty sure it will turn out well. Thanks again for this tool and the help.
Best,
Hi Krutika,
Arriba version 2.0.0 has just been released. It natively supports creating visualizations of fusion predictions made by STAR-Fusion. So this issue is finally resolved. Let know if you still have questions. I am closing this thread for now.
Regards, Sebastian
Hello,
I am curious about the strand1 and strand2 columns in arriba output. I read in the documentation that the strand before "/" is the strand of the gene from the gtf and the second strand (that is after "/") is from the assembled fusion.
However I'm not sure how arriba assembles the fusion supporting reads to get the strand information from the STAR BAM file. Can you please explain?
Thanks!