ndaniel / fusioncatcher

Finder of Somatic Fusion Genes in RNA-seq data
GNU General Public License v3.0

splitting of single-end reads into paired-end reads: how is it done? #187

Closed: emariella closed this issue 3 years ago

emariella commented 3 years ago

Hi Daniel,

I have just started to use FusionCatcher to analyze some single-end RNA-seq data derived from tumor samples, and so far the results are in line with my expectations. I know that single-end reads longer than 130 bp are automatically split into paired-end reads (my reads are 150 bp long), but I cannot find a description of this step in either the manual or the preprint. Could you provide some details about how the splitting is done? It would help me interpret and visualize the results.

Thanks in advance!!

Elisa

ndaniel commented 3 years ago

Hello Elisa,

basically, one single-end read is split into several (possibly many) paired-end reads using fragment_fastq.py. The longer the single-end read, the more paired-end reads are generated. A side effect is that a candidate fusion junction might be reported as supported by several paired-end reads that actually all come from the same single-end read.

Here is a graphical representation:

sedr    ---------------------------------------------------------------------------------------
per1    -------------------------..-------------------------
per2    -------------------------..................-------------------------
per3    -------------------------.....................................-------------------------

Here sedr (the single-end read) is split into three paired-end reads: per1, per2, and per3.
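The layout in the diagram can be sketched in a few lines of Python. This is a hypothetical illustration, not the code of fragment_fastq.py: the thread does not give the real mate length or spacing, so `mate_len`, `step`, the reverse-complementing of mate 2 (assumed forward/reverse orientation, as most aligners expect for paired-end input) and the `__fragN` ID suffix are all assumptions. Every pair shares the same left mate, the right mate slides toward the 3' end, and the last right mate always ends at the end of the read. The helper `distinct_origin_reads` (also hypothetical) shows how the side effect described above could be undone when counting support for a junction.

```python
from typing import List, Tuple

# Complement table for A/C/G/T/N; any other character passes through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

Mate = Tuple[str, str]  # (sequence, quality string)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_single_end(
    read_id: str,
    seq: str,
    qual: str,
    mate_len: int = 25,  # assumed, taken from the diagram's 25-character mates
    step: int = 20,      # assumed spacing between successive right mates
    rc_mate2: bool = True,
) -> List[Tuple[str, Mate, Mate]]:
    """Hypothetically split one single-end read into pseudo paired-end reads.

    All pairs share the first `mate_len` bases as mate 1; mate 2 starts right
    after mate 1 and moves `step` bases at a time toward the 3' end, with the
    final mate 2 anchored at the end of the read.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must have the same length")
    if len(seq) < 2 * mate_len:
        return []  # too short for two non-overlapping mates

    last_start = len(seq) - mate_len
    starts = list(range(mate_len, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # always cover the 3' end of the read

    left: Mate = (seq[:mate_len], qual[:mate_len])
    pairs = []
    for i, start in enumerate(starts, 1):
        r_seq = seq[start:start + mate_len]
        r_qual = qual[start:start + mate_len]
        if rc_mate2:
            # Assumed FR orientation: report mate 2 on the opposite strand.
            r_seq, r_qual = reverse_complement(r_seq), r_qual[::-1]
        # Hypothetical naming scheme that keeps a link to the original read.
        pairs.append((f"{read_id}__frag{i}", left, (r_seq, r_qual)))
    return pairs


def distinct_origin_reads(pair_ids: List[str]) -> int:
    """Count the original single-end reads behind a list of supporting pair IDs."""
    return len({pid.rsplit("__frag", 1)[0] for pid in pair_ids})
```

With these assumed defaults an 87-base read (the width of sedr above) yields three pairs, like per1 to per3, while a 150-base read yields six, which matches the point that longer reads produce more pairs. If all of those pairs span the same junction, `distinct_origin_reads` still reports a single supporting read.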

nihilee commented 2 years ago

@ndaniel Dear developer, I am curious why the raw reads are broken into fragments before the subsequent alignment step. Does this help in finding fusions? I would expect this step to increase the running time considerably.