snandiDS / prokseq-v2.0

MIT License

Alignment/count file only contains rRNA and tRNA gene #3

Open 6pwalker opened 3 years ago

6pwalker commented 3 years ago

The countfiles produced only contained a set of 55 genes, all of which seem to be tRNA and rRNA genes. I tried aligning to genome files from GenBank and RefSeq, the results were the same with both alignments.

snandiDS commented 3 years ago

Hi, thanks for the comment. ProkSeq uses featureCounts to count reads. The annotation file has no effect on the alignment; it is needed only for read counting. If your annotation file contains only tRNA and rRNA entries, the count file will show reads for those genes only. I suspect there is a formatting issue in your input file. If the problem is not solved, you can mail your input file to firoj.mahmud@umu.se so we can look into the cause. Good luck.
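One quick way to see what the counting step has to work with is to tally the feature-type column (column 3) of the annotation file. By default featureCounts counts rows whose feature type is `exon`, so if only the tRNA and rRNA records carry that type, only those genes will appear in the count file. The following is a minimal sketch, not part of ProkSeq; the function name `count_feature_types` and the example records are made up for illustration.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (the feature type) of GTF/GFF records."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # GTF and GFF records have 9 tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records, only to show the shape of the input.
example = [
    "##gff-version 3",
    'chr1\tRefSeq\tgene\t1\t100\t.\t+\t.\tgene_id "g1";',
    'chr1\tRefSeq\tCDS\t1\t100\t.\t+\t0\tgene_id "g1";',
    'chr1\tRefSeq\texon\t200\t270\t.\t+\t.\tgene_id "t1";',
]
print(count_feature_types(example))
```

With a real file, pass the open file handle instead of the list, for example `with open("annotation.gtf") as fh: print(count_feature_types(fh))`, where `annotation.gtf` stands in for your own path.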

6pwalker commented 2 years ago

Hi, thank you so much for your reply, and I am sorry for my own delay. Yes, you are correct; I did not understand the issue before. I realized that the GTF files (both RefSeq and GenBank) obtained from NCBI do not contain the transcript features that featureCounts looks for.

I was able to download the GFF3 file from Ensembl (which contained "exon" features), but when I run the GFF3-to-GTF script, the "exon" features are no longer present; the output keeps "gene" instead. I tried editing the param file so that featureCounts looks for the "gene" feature type, but I have since run into a GeneBodyCoverage issue despite having compatible FASTA, BED, and GTF files.
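For reference, the feature-type change described above corresponds to featureCounts' `-t` option (feature type to count) together with `-g` (attribute used to group features), `-a` (annotation file), and `-o` (output file). The layout of ProkSeq's param file is not shown in this thread, so the sketch below only builds the equivalent direct featureCounts command; the helper name `featurecounts_args` and all file names are hypothetical.

```python
def featurecounts_args(bam, gtf, out, feature_type="exon", attribute="gene_id"):
    """Build a featureCounts argument list for the given feature type."""
    return [
        "featureCounts",
        "-t", feature_type,  # feature type (column 3) to count
        "-g", attribute,     # attribute used to group features into genes
        "-a", gtf,           # annotation file
        "-o", out,           # output count table
        bam,                 # aligned reads
    ]


# Hypothetical file names; count "gene" rows instead of the default "exon".
cmd = featurecounts_args(
    "sample.bam", "annotation.gtf", "counts.txt", feature_type="gene"
)
print(" ".join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where featureCounts is installed.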

I do appreciate this resource you created, and thank you for maintaining such an informative GitHub page. Thank you.

