Missing genes in allGenes.results.txt

sagarutturkar commented 6 years ago

I followed QoRTs "Example Walkthrough" to run JunctionSeq and got the results. However, I noticed the number of aggregate genes in GFF vs allGenesresults file are different.

GFF file:
grep -c "aggregate_gene" withNovel.forJunctionSeq.gff = 54,720
Results file:
cut -f 2 allGenes.results.txt | sort | uniq | wc -l = 52447

The numbers does not match. Is there any default filtering applied in the writeCompleteResults method? This matters to me because a specific gene of interest which found to be DE (following a custom HtSeq-DESeq2 pipeline) was excluded from the final results derived from JunctionSeq.

hartleys commented 5 years ago

Can you send the exact commands run using JunctionSeq, and the log output it printed to the console?

Did you use the "use.multigene.aggregates" parameter? The default is FALSE.

sagarutturkar commented 5 years ago

Thanks. I was able to get the information for all genes after using the parameter use.multigene.aggregates.

MWSchmid commented 5 years ago

Hi Steve

I recently processed some data with reads of 36 bp size and around 33% of all features were multigene aggregates (RNA-seq atlas for different mouse tissues). The problem is that a gene is classified entirely as multigene aggregate even if only one small exon is shared with another gene. Having also the information of the splice junction, I think that it's quite a loss to remove the entire gene because of a single exon.

I think that it would be better to label only the features that are really affected by multiple genes as multigene aggregates (instead of all features of a gene that overlaps at a small spot with another gene). Or to set a different default for use.multigene.aggregates (or at least make it more clear in the JunctionSeq user guide, at the moment it's just used in one code example but without further explanation).

Nice tools otherwise.

Best regards,

Marc

Besides - with the same annotation but different data (read length of 125 bp and different cell type) around 7.5 % are multigene aggregates. Is this difference due to the read length or due to the different cell/tissue type?

scseekers commented 3 years ago

@MWSchmid

Yes in DEXSeq the exons shared between two genes are removed rather than discarding the entire gene. However, the Junctionseq seems to throw away such genes which are kinda unfair.

hartleys / JunctionSeq

Missing genes in allGenes.results.txt #37