This fixes a major bug in using featureCounts to count exons. GTF files frequently have the same exon listed multiple times in different transcripts, and current behaviour has two problems:
- exons are assigned a count of 0, because by default featureCounts leaves unassigned any read that overlaps more than one feature (here, the same exon listed in different places)
- the resulting output has multiple lines for the same exon, which breaks downstream iRAP steps
This PR passes -O to featureCounts so that reads can be assigned to multiple exon entries, then sorts and de-duplicates the output during processing so that a single entry per exon is retained.
The side-effect is that all featureCounts outputs are now sorted by feature or meta-feature identifier. I can't see why this would be a problem, but @nunofonseca would know better than me on that.
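
For reviewers, the sort and de-duplication idea can be sketched roughly as below. This is only an illustration, not the code in this PR: the helper name `dedupe_counts` is hypothetical, and it assumes a simplified, headerless, tab-separated table with the feature identifier in the first column and the count in the last. It also assumes that duplicate rows for one exon carry the same count once -O is used, so keeping the largest value is just a defensive choice.

```python
import csv
import io


def dedupe_counts(lines):
    """Return one (feature_id, count) pair per feature, sorted by feature id.

    Hypothetical helper: assumes a simplified tab-separated table with the
    feature id in the first column and the count in the last column.
    """
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        feature_id, count = row[0], int(row[-1])
        # Keep the largest count seen, so a stray 0 never masks a real count.
        counts[feature_id] = max(count, counts.get(feature_id, 0))
    # Sorting by identifier mirrors the side effect described above.
    return sorted(counts.items())


example = "exon2\t5\nexon1\t3\nexon2\t5\n"
result = dedupe_counts(io.StringIO(example))
print(result)  # [('exon1', 3), ('exon2', 5)]
```

The duplicated exon2 row collapses to one entry, and the rows come back ordered by identifier rather than in input order.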