circRNA_finder output - Githubissues

orzechoj / circRNA_finder

Pipeline for finding circular RNAs from RNA-seq data, based on STAR. Used in (Westholm et al, Cell Reports, 2014).

MIT License


circRNA_finder output #1

Open adomingues opened 9 years ago

adomingues commented 9 years ago

First of all, thank you for making these scripts available. This is not really an issue; it is more a request for clarification of the tool's output. I am writing here so that others with the same problem might find it.

I just ran the tool, and I am now going over the output before starting in-depth analysis of the results. I have 3 questions regarding the output:

1. In the *filteredJunctions.bed files, does the 5th column represent the number of reads supporting that circular RNA junction?
2. Also in those files, most (if not all) circular RNAs are represented twice, once on the plus and once on the minus strand. Is this by design, due to the difficulty of assigning strand information to these?
3. Going over the source code of postProcessStarAlignment.pl, the comments indicate that it should output "indexed bed files with all chimeric reads"; however, I could not see any indexed .bed file in the results. There are, however, sorted and indexed bam files, *.Chimeric.out.sorted.bam. Are these the files that actually contain the chimeric reads, and is "bed" just a typo?

I tried to answer these questions by going over the code, but Perl is not my native language :) Cheers.

orzechoj commented 9 years ago

Hi,

Thanks for using these scripts. Hope these answers help:

1. Yes, the fifth column is the number of reads spanning the circular RNA junction.
2. These scripts were hard-coded for the data I was studying, which is paired-end data where the second read in the pair was on the same strand as the transcript (and the first read in the pair was on the opposite strand). I have not tested what happens with different data, but in the data sets I was looking at I only got reads spanning the circular RNA junction from one strand.
3. Yes, this was a typo; it should be bam files. I have updated the comment.

/Jakub

adomingues commented 9 years ago

Hi Jakub,

Thank you for the reply, it was very helpful.

Regarding 2: this is strange, because I am also mapping PE data (from Jeck et al 2013) using the script runStar.pl. To give you an idea of the results, this is what I get after running postProcessStarAlignment.pl:

    head results/RNAse_R.s_filteredJunctions.bed
    chr9    113734353    113735838    3s     1400    -
    chr9    113734353    113735838    4s     1352    +
    chr11   33307959     33309057     7s     573     +
    chr11   33307959     33309057     8s     548     -
    chr16   85667520     85667738     9s     480     +
    chr16   85667520     85667738     10s    424     -
    chr20   30954187     30956926     11s    399     +
    chr20   30954187     30956926     12s    385     -
    chr9    4286038      4286523      13s    323     -
    chr9    113734353    113773970    14s    319     -
    chr9    113734353    113773970    14s    319     -

As you can see, several (but not all) circRNAs appear to be present on both strands, albeit with a different number of supporting reads. Maybe I am misreading the results, or this is simply dataset-specific. Anyway, thanks again for the help.

António
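To check how many junctions are reported on both strands, the rows can be grouped by coordinates. The following is a minimal Python sketch, not part of circRNA_finder: the function names are made up for illustration, and it assumes the six-column layout shown above (chromosome, start, end, name, read count, strand), with column 5 being the read count as confirmed earlier in this thread.

```python
from collections import defaultdict


def parse_bed_line(line):
    """Split one filteredJunctions.bed row into typed fields.

    Assumes six whitespace-separated columns:
    chrom, start, end, name, read count, strand.
    """
    chrom, start, end, name, score, strand = line.split()[:6]
    return chrom, int(start), int(end), name, int(score), strand


def junctions_by_strand(lines):
    """Map (chrom, start, end) -> {strand: read count}."""
    table = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, _name, score, strand = parse_bed_line(line)
        table[(chrom, start, end)][strand] = score
    return table


def both_strands(table):
    """Keep only junctions that were reported on '+' and on '-'."""
    return {key: counts for key, counts in table.items() if len(counts) == 2}


# A few rows copied from the output quoted above.
rows = """\
chr9 113734353 113735838 3s 1400 -
chr9 113734353 113735838 4s 1352 +
chr11 33307959 33309057 7s 573 +
chr11 33307959 33309057 8s 548 -
chr9 4286038 4286523 13s 323 -
""".splitlines()

table = junctions_by_strand(rows)
for (chrom, start, end), counts in both_strands(table).items():
    print(chrom, start, end, counts)
```

For a real file, pass an open file handle instead of `rows`, for example `junctions_by_strand(open(path))`, where `path` points to a *filteredJunctions.bed file.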

BirongZhang commented 2 years ago

Hi Jakub,

Thanks for developing such a useful tool and sharing it with the public!

I have a small question about the output. The general notation for a circular RNA is as follows: chr:start-end:strand:gene name

But I noticed that there is no gene name in the output of circRNA_finder. Can I match the circRNA_finder output against a circular RNA database (such as circRNA_hg38_database) according to "chr:start-end:strand", so that I can get results containing the gene name?

Any advice would be highly appreciated! Thanks!

Kind regards, Birong

orzechoj commented 2 years ago

Hi Birong,

Finding overlapping genes is not part of circRNA_finder, but it's not too hard to implement yourself. For the 2014 paper I wrote a custom R script (using the GenomicRanges package) to map the circular RNAs to mRNA transcripts. Basically, I took both the start and the end coordinates and looked for overlaps with coding regions, UTRs, introns, etc. You can also look for overlaps with circle junctions.

I have been thinking about adding such functionality to circRNA_finder, but right now I don't have time to work on this.

best, Jakub
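The overlap idea described above can be sketched in plain Python. This is not the R/GenomicRanges script used for the 2014 paper, only an illustration of the same principle. The gene records, gene names, and function names below are invented placeholders, and the intervals are assumed to be half-open and on a common coordinate system.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def annotate(circ, genes, same_strand=True):
    """Return the sorted names of genes overlapping one circRNA.

    circ  : (chrom, start, end, strand)
    genes : iterable of (chrom, start, end, strand, name)
    Set same_strand=False when the strand of the circRNA is uncertain.
    """
    chrom, start, end, strand = circ
    hits = []
    for g_chrom, g_start, g_end, g_strand, name in genes:
        if g_chrom != chrom:
            continue
        if same_strand and g_strand != strand:
            continue
        if overlaps(start, end, g_start, g_end):
            hits.append(name)
    return sorted(hits)


# Placeholder annotation; in practice this would be read from a GTF/BED file.
genes = [
    ("chr1", 1000, 5000, "+", "GENE_A"),
    ("chr1", 4500, 9000, "-", "GENE_B"),
    ("chr2", 1000, 5000, "+", "GENE_C"),
]

# Placeholder circRNA coordinates.
circs = [
    ("chr1", 1200, 1800, "+"),
    ("chr1", 6000, 7000, "+"),
    ("chr1", 4600, 4900, "-"),
]

for circ in circs:
    print(circ, annotate(circ, genes))
```

The linear scan is fine for a few thousand junctions; for genome-wide annotation, an interval-tree or sorted-sweep approach (which is what packages such as GenomicRanges provide) scales better.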

BirongZhang commented 2 years ago

Hi Jakub,

Thanks for your kind help! I will give it a try.

Regards, Birong

orzechoj commented 2 years ago

I just looked at http://yang-laboratory.com/circpedia/statistics, and at least for this interface it seems that the coordinates are enough and you don't need the gene names. But I haven't tried this myself.

/Jakub
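For a coordinate-based lookup, each BED row can be turned into a "chr:start-end:strand" key. A minimal Python sketch follows; the function name and the `one_based_start` option are assumptions for illustration. BED files conventionally use 0-based starts while some databases use 1-based starts, so whether a shift is needed for a given database should be checked against a few known circRNAs before matching in bulk.

```python
def bed_to_key(line, one_based_start=False):
    """Build a 'chr:start-end:strand' key from one six-column BED row.

    one_based_start=True adds 1 to the start, for databases that use
    1-based coordinates (an assumption to verify for each database).
    """
    chrom, start, end, _name, _score, strand = line.split()[:6]
    start = int(start) + (1 if one_based_start else 0)
    return f"{chrom}:{start}-{int(end)}:{strand}"


row = "chr9 4286038 4286523 13s 323 -"
print(bed_to_key(row))
print(bed_to_key(row, one_based_start=True))
```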