suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Arriba drawing: Issue with draw_fusions.R ; Calls: drawProteinDomains -> mergeSimilarDomains #132

Closed AmyNjaaye closed 3 years ago

AmyNjaaye commented 3 years ago

Hello, I'm Aminata.

I'm currently using Arriba to draw fusions that come from a merge of several outputs (star_fusion, fusioncatcher, and arriba_calling) and are validated by fusion_inspector. Lately I have been facing an error at the drawing step. Below you'll find the command line used and the error obtained. Could you please help me get past this blocking step?

draw_fusions.R --fusions=/debian/sample.FusionInspector.fusions.abridged.tsv.coding_effect --alignments=/debian/Aligned.sortedByCoord.out.bam --output=/debian/sample_arriba_fusion.pdf --annotation=/debian/ref_annot.gtf.gz --cytobands=/database/cytobands_hg19_hs37d5_GRCh37_2018-02-23.tsv --proteinDomains=/database/protein_domains_hg19_hs37d5_GRCh37_2019-07-05.gff3 2>&1 | tee -a /debian/samplle_arriba_drawing.log

Drawing fusion #255: PTBP1:TMEM259
Drawing fusion #256: RAB14:EGFL7
Drawing fusion #257: RAB34:PCMT1
Error in if (!any((abs(merged$start - domains[domain, "start"]) + abs(merged$end - :
  missing value where TRUE/FALSE needed
Calls: drawProteinDomains -> mergeSimilarDomains
Execution halted

Best regards!

Version used: v2.1.0
Ref: hg19/GRCh37

suhrig commented 3 years ago

Hi Aminata,

Can you try the current development version of the script, to be found here:

https://raw.githubusercontent.com/suhrig/arriba/develop/draw_fusions.R

I recently fixed a bug on this very line, which may be the same issue you are having.

Regards, Sebastian

AmyNjaaye commented 3 years ago

Thank you Sebastian,

I just launched the analysis using the development version. I'll keep you posted about the results.

Regards, Aminata

AmyNjaaye commented 3 years ago

Hi Sebastian,

This version did fix the bug. I'll probably patch the original version I'm using to integrate this fix. Thank you very much. I would also appreciate a short explanation of the problem, biologically speaking, and of the modifications you made to fix this bug.

Best regards! Aminata

suhrig commented 3 years ago

Hi Aminata,

I suspect that this little change in the following commit will fix your issue (the patch you need to apply):

https://github.com/suhrig/arriba/commit/a0a2e4114ad207e9d74c5ad36f170d4f6b090906

Regarding an explanation: on rare occasions, a division by zero could occur when drawing the protein domains, namely when a protein overlapped an exon by just one base and the fusion junction. In the old script, the protein domain size was calculated incorrectly (underestimated by 1 base). This makes little difference in most cases, since protein domains are typically hundreds of bases long, but where the true size is 1 base it yields a size of 0 and thus a division by zero. The fix was to add +1 so that the size is calculated properly. There is no biological reason behind it.
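To make the off-by-one concrete, here is a minimal sketch in Python. It is not the code from draw_fusions.R: the function names `domain_size` and `relative_offset` are hypothetical, and the way the size enters the comparison is an assumption based only on the explanation above and the truncated error line. It shows that `end - start` gives 0 for a one-base domain whose start and end coordinates are both part of the domain, while `end - start + 1` gives 1. In R, 0/0 evaluates to NaN, comparing NaN with a number yields NA, and `if (NA)` stops with "missing value where TRUE/FALSE needed", which matches the reported error.

```python
import math


def domain_size(start: int, end: int, add_one: bool = True) -> int:
    """Size of a domain whose start and end positions both belong to it.

    add_one=False mimics the miscalculation described above (one base too short).
    """
    return end - start + 1 if add_one else end - start


def relative_offset(a: tuple, b: tuple, size: int) -> float:
    """Summed start/end distance between two domains, relative to a size.

    Hypothetical stand-in for the comparison in the error message.
    A size of 0 is mapped to NaN or infinity to mimic R's 0/0 and x/0.
    """
    offset = abs(a[0] - b[0]) + abs(a[1] - b[1])
    if size == 0:
        return math.nan if offset == 0 else math.inf
    return offset / size


# A domain that covers exactly one base.
one_base = (100, 100)

old_size = domain_size(*one_base, add_one=False)  # 0, one base too short
new_size = domain_size(*one_base)                 # 1, the actual size

old_result = relative_offset(one_base, one_base, old_size)  # NaN
new_result = relative_offset(one_base, one_base, new_size)  # 0.0
```

For a domain of several hundred bases, both size calculations give nearly the same ratio, which would explain why the problem showed up in only a few fusions.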

Regards, Sebastian

AmyNjaaye commented 3 years ago

Hello Sebastian,

I was able to integrate the patch and everything is working now. Thank you for the support as well as the explanations.

Best regards, Aminata