zerodel / sailfish-cir

a pipeline for quantification of circular RNA.

Multiple values and length for the same circular RNA #4

Open shreygandhi1990 opened 6 years ago

shreygandhi1990 commented 6 years ago

Hi, I am running sailfish-cir on output generated by CIRI2. I noticed that some of the circular RNA transcripts quantified by the tool appear more than once, each time with a different length and different values. What does this signify? Is there any meaning to this?

Eg:

Name                    Length  EffectiveLength  TPM      NumReads
10:116517317|116549175  581     394.416          2.27141  18.9804
10:116517317|116549175  185     42.0209          5.61631  5
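To see how many circRNAs are affected, the duplicated names can be listed with a short script. This is only a minimal sketch: it assumes the quantification table is tab-separated with a single header row naming the columns shown above, and `find_duplicate_names` is a hypothetical helper, not part of sailfish-cir.

```python
import csv
from collections import defaultdict


def find_duplicate_names(lines):
    """Group quantification rows by Name; return only names seen more than once.

    Assumption: `lines` is an iterable of tab-separated text lines whose
    first line is the header (Name, Length, EffectiveLength, TPM, NumReads).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    rows_by_name = defaultdict(list)
    for row in reader:
        rows_by_name[row["Name"]].append(row)
    return {name: rows for name, rows in rows_by_name.items() if len(rows) > 1}


# The two rows reported above, written as tab-separated lines.
example = [
    "Name\tLength\tEffectiveLength\tTPM\tNumReads",
    "10:116517317|116549175\t581\t394.416\t2.27141\t18.9804",
    "10:116517317|116549175\t185\t42.0209\t5.61631\t5",
]

duplicates = find_duplicate_names(example)
for name, rows in duplicates.items():
    # Print each duplicated circRNA with the lengths of its entries.
    print(name, [row["Length"] for row in rows])
```

For a real run, an open file handle on the quantification output can be passed in place of `example`.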

mirax87 commented 4 years ago

I observed the same problem. It seems to occur when a gene on the opposite strand overlaps the gene hosting the circRNA. In my cases, the opposite-strand genes are always fully contained within an intron of the host gene.

From going through the code, it looks like the exons within circRNA regions are first extracted regardless of strand, and the exon set is then reconstructed from the filtered circRNA exons. The duplication occurs during this reconstruction if a circRNA region contains or overlaps a gene on the opposite strand.

Currently, I am considering blacklisting those "internal" genes and discarding them from the input GTF.
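A rough sketch of what such a blacklist filter could look like is below. It is not sailfish-cir code, and all function names are hypothetical. It makes two simplifying assumptions: every GTF record carries a `gene_id "..."` attribute, and a gene counts as "internal" if its whole span lies strictly inside the span of a gene on the opposite strand of the same chromosome. It does not check that the inner gene sits inside an intron, nor that the outer gene actually hosts a circRNA.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_spans(gtf_lines):
    """Return {gene_id: (chrom, strand, start, end)} over all records of each gene."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        start, end = int(fields[3]), int(fields[4])
        if gene_id in spans:
            chrom, strand, old_start, old_end = spans[gene_id]
            spans[gene_id] = (chrom, strand, min(old_start, start), max(old_end, end))
        else:
            spans[gene_id] = (fields[0], fields[6], start, end)
    return spans


def internal_opposite_strand_genes(spans):
    """Genes strictly contained in a gene on the opposite strand (quadratic scan)."""
    blacklist = set()
    for gene_id, (chrom, strand, start, end) in spans.items():
        for other_id, (o_chrom, o_strand, o_start, o_end) in spans.items():
            if other_id == gene_id or o_chrom != chrom:
                continue
            if {strand, o_strand} != {"+", "-"}:
                continue
            contained = o_start <= start and end <= o_end
            if contained and (o_end - o_start) > (end - start):
                blacklist.add(gene_id)
                break
    return blacklist


def filter_gtf(gtf_lines, blacklist):
    """Keep every line whose gene_id is not blacklisted (comments are kept too)."""
    kept = []
    for line in gtf_lines:
        match = GENE_ID_RE.search(line)
        if match is None or match.group(1) not in blacklist:
            kept.append(line)
    return kept


# Toy annotation: a "+" host gene with two exons and a "-" gene between them.
gtf = [
    '10\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "host"; transcript_id "host.1";',
    '10\tsrc\texon\t800\t1000\t.\t+\t.\tgene_id "host"; transcript_id "host.1";',
    '10\tsrc\texon\t400\t500\t.\t-\t.\tgene_id "inner"; transcript_id "inner.1";',
]
blacklist = internal_opposite_strand_genes(gene_spans(gtf))
kept = filter_gtf(gtf, blacklist)
print(sorted(blacklist), len(kept))
```

The pairwise comparison is quadratic in the number of genes, so for a full annotation it would probably need to be replaced with a per-chromosome sorted sweep or an interval index.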