simon-anders / htseq

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.
https://htseq.readthedocs.io/en/release_0.11.1/
GNU General Public License v3.0

can't figure out why htseq does not run on my gff3 #69

Closed MoutonAlice closed 5 years ago

MoutonAlice commented 5 years ago

Hi, I am sorry to write here, but I have been struggling with a file for ten days (on and off). I have done a lot of Google searching and so on, and I still can't figure out a way to move forward. I was asked to do some RNA-seq mapping for differential expression. I used STAR to map and it worked perfectly, but it did not give me counts as it usually does. Checking in the Google group, the developer said it might be due to the format of the annotation file and recommended converting the GFF3 to GTF (with gffread). Whatever I try, it does not work, though. If I use the GTF, it says that gene_id does not exist, but it is present in the GTF:

NW_015503911.1 Gnomon exon 29352 29708 . - . transcript_id "gene0"; gene_id "gene0"; gene_name "LOC107544938";
NW_015503911.1 Gnomon exon 188224 188316 . + . transcript_id "gene1"; gene_id "gene1"; gene_name "LOC107545001";

If I use the GFF3, I changed the option to -idattr=ID or -idattr=Parent, but I get the following errors:

Error occured when processing GFF file (line 11 of file /bat/GCF_001595765.1_Mnat.v1_genomic.gff): Feature id1 does not contain a 'dattr=ID' attribute
Error occured when processing GFF file (line 11 of file /bat/GCF_001595765.1_Mnat.v1_genomic.gff): Feature id1 does not contain a 'dattr=Parent' attribute

Here is my GFF:

NW_015503911.1 RefSeq region 1 32128745 . + . ID=id0;Dbxref=taxon:291302;Name=Unknown;chromosome=Unknown;collection-date=Oct-2012;country=South Africa;dev-stage=adult;gbkey=Src;genome=genomic;isolate=MN2012-01;mol_type=genomic DNA;sex=male;tissue-type=skeletal muscle
NW_015503911.1 Gnomon pseudogene 29352 29708 . - . ID=gene0;Dbxref=GeneID:107544938;Name=LOC107544938;gbkey=Gene;gene=LOC107544938;gene_biotype=pseudogene;pseudo=true
NW_015503911.1 Gnomon exon 29352 29708 . - . ID=id1;Parent=gene0;Dbxref=GeneID:107544938;gbkey=exon;gene=LOC107544938;model_evidence=Supportin

Do you have any suggestions? It is surely a format problem, but I can't figure out where it is or how to resolve it!
Best, Alice

iosonofabio commented 5 years ago

Hi Alice,

Please provide the full error log as well as a Dropbox or Google Drive link to the GTF, so I can take a look.

Thank you

simon-anders commented 5 years ago

"Feature id1 does not contain a 'dattr=ID' attribute"

This looks like you wrote "-idattr" instead of "--idattr". It's two minus signs!
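
The error text is consistent with how Unix-style option parsers usually treat a single dash: `-idattr=ID` is read as the short option `-i` (the short form of `--idattr`) with the attached value `dattr=ID`, which is why the message complains about a 'dattr=ID' attribute. Below is a minimal sketch of that behaviour using Python's `argparse`. The parser and the helper `parse_idattr` are hypothetical stand-ins that only mirror the `-i`/`--idattr` option pair; they are not htseq-count's actual argument handling.

```python
import argparse


def parse_idattr(argv):
    """Return the value a stand-in parser assigns to the id attribute."""
    # Hypothetical parser: it only mirrors the -i/--idattr option pair
    # and is not htseq-count's real command-line code.
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--idattr", default="gene_id")
    return parser.parse_args(argv).idattr


# Single dash: parsed as "-i" followed by the attached value "dattr=ID".
print(parse_idattr(["-idattr=ID"]))

# Double dash: the long option receives the intended value "ID".
print(parse_idattr(["--idattr=ID"]))
```

With the double dash (or the short form `-i ID`), the attribute name reaches the program unchanged.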

MoutonAlice commented 5 years ago

Dear Simon and Fabio, Many thanks for your replies. I looked at my command for hours and always thought I had two minus signs. I had just one, as mentioned in Simon's post. I am deeply sorry for this rookie mistake and for wasting your time! Best, Alice