Closed AnnaSOFI closed 4 years ago
Last time I checked, the NCBI GFF3 files were buggy: they always used the transcript IDs as gene IDs, so different transcripts of the same gene ended up with different gene IDs. htseq-count then thinks that every read overlaps multiple genes and discards it. Can you check whether this issue is still present in your file?
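One way to run that check is sketched below, assuming the exon lines carry both a transcript-level attribute and a gene-level attribute. The attribute names `Parent` and `gene`, the function names, and the two example lines are illustrative assumptions, not taken from any real NCBI file; substitute whatever appears in column 9 of your own GFF3.

```python
from collections import defaultdict


def parse_gff3_attributes(column9):
    """Split the ninth GFF3 column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def ids_per_gene(lines, feature_type="exon", id_attr="Parent", gene_attr="gene"):
    """Map each gene_attr value to the set of id_attr values seen on its features.

    id_attr stands for whatever you pass to --idattr; gene_attr is an
    attribute you trust to name the gene. Both defaults are assumptions.
    """
    groups = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_gff3_attributes(cols[8])
        if id_attr in attrs and gene_attr in attrs:
            groups[attrs[gene_attr]].add(attrs[id_attr])
    return groups


# Made-up example: two exons of one gene that point to different parents.
example = [
    "##gff-version 3\n",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=rna0;gene=ABC\n",
    "chr1\tsrc\texon\t150\t300\t.\t+\t.\tID=e2;Parent=rna1;gene=ABC\n",
]
groups = ids_per_gene(example)
# Genes whose features carry more than one distinct ID value.
split_genes = {gene: ids for gene, ids in groups.items() if len(ids) > 1}
print(split_genes)
```

For a real file, pass the open file handle instead of the example list. If most genes show up in `split_genes`, the attribute chosen for `--idattr` is probably a transcript-level ID rather than a gene-level one.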
Also, to check the easy stuff first: Have you made sure that the chromosome names match between your SAM file and your GFF file?
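A minimal sketch of that name check, using only the plain-text layout of the two formats (reference names in the `@SQ` header lines and in column 3 of a SAM file, sequence names in column 1 of a GFF/GTF file). The function names and the tiny example inputs are made up for illustration.

```python
def sam_reference_names(lines):
    """Collect reference names from @SQ header lines and alignment records."""
    names = set()
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
        elif not line.startswith("@"):
            cols = line.split("\t")
            # Column 3 is RNAME; "*" marks an unaligned read.
            if len(cols) > 2 and cols[2] != "*":
                names.add(cols[2])
    return names


def gff_sequence_names(lines):
    """Collect the sequence names from the first column of a GFF/GTF file."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


# Made-up example: the SAM header says "chr1", the annotation says "1".
sam_lines = ["@HD\tVN:1.6\n", "@SQ\tSN:chr1\tLN:1000\n"]
gff_lines = ["##gff-version 3\n", "1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1\n"]

sam_names = sam_reference_names(sam_lines)
gff_names = gff_sequence_names(gff_lines)
shared = sam_names & gff_names
print("shared:", sorted(shared))
print("only in SAM:", sorted(sam_names - gff_names))
print("only in GFF:", sorted(gff_names - sam_names))
```

If `shared` comes out empty for your real files, no read can ever be assigned to a feature, which would explain counts that are all 0 even though the run finishes without error.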
No answer in more than a year, closing.
Hi,
I'm currently analyzing RNA-seq data and have been trying to use htseq-count for read counting. My problem is that the feature counts are always 0, even though the program runs through without error. The only combination that works is a SAM file of human ENSEMBL alignments with the ENSEMBL human GTF file. If I run a SAM file produced by aligning to the NCBI human genome together with the NCBI GFF3 file, I get nothing. I adjusted the default settings (changing --idattr and --type accordingly) so that the NCBI GFF3 file can be used instead of the default ENSEMBL GTF file. In addition, the feature counts are again 0 when I use the ENSEMBL GTF file with a SAM file whose reads were aligned to FASTA from other sources (for example the RFAM database or a collection of bacterial genomes).
I need to count my reads from alignments to the RFAM database and to a collection of bacterial genomes, but I'm a bit lost on how to do that or how to modify the settings, since even the "easy" combination of an NCBI human alignment SAM file plus the NCBI GFF3 file is not yielding any results.
How could I fix this? Isn't htseq-count supposed to work with any GTF file and with SAM files aligned to any FASTA, as long as the settings have been adjusted? I'm a beginner in Python and RNA-seq, so writing my own script for this is a bit beyond my reach, and I really think there is something I have completely missed in the basic usage.
BRs,
Anna