ncbi / TPMCalculator

TPMCalculator quantifies mRNA abundance directly from the alignments by parsing BAM files

Key ID for gene name was not found on GTF line #78

Closed Liyong-Zhang closed 2 years ago

Liyong-Zhang commented 2 years ago


I am using tpmcalculator (version 0.0.4) in a conda env with a .gff3 annotation file. After running it a few times with different -k parameters (first the default, then ID, then Parent), I got the same error message each time: "Key gene_id/ID/Parent for gene name was not found on GTF line. Error processing GTF line at Chromosome level:"

The parameters are as follows:

tpmcalculator \
    -g data/annotation/Cs_genes_v2_annot.gff3 \
    -d $input1_dir \
    -b Aligned.sortedByCoord.out.bam \
    -p \
    -k "Parent"

The first few lines of my .gff3 annotation file are:

Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000;Note=methyl-CPG-binding domain 9

Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000

Chr1 AAFC_NRC mRNA 1 6504 . - . ID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000;Note=methyl-CPG-binding domain 9

Chr1 AAFC_NRC five_prime_UTR 6380 6504 . - . ID=Csa01g001000.1.utr5p1;Parent=Csa01g001000.1

Chr1 AAFC_NRC exon 5865 6504 . - . ID=Csa01g001000.1.exon1;Parent=Csa01g001000.1

Does TPMCalculator not work with .gff3 annotation files? Or did I get some setting wrong when running the program?

Thank you in advance.

r78v10a07 commented 2 years ago

Hi, TPMCalculator cannot read GFF files. You need to convert the GFF to GTF first. Have a look at this thread on how to convert GFF to GTF.
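
For anyone wondering why the -k option makes no difference here: the ninth column is written differently in the two formats. GFF3 uses key=value pairs separated by semicolons, while GTF uses key "value"; pairs. The Python sketch below only illustrates that difference on one of the exon lines from the question. The function names are made up for this example, and it is not a full converter: a real GTF also needs gene_id and transcript_id attributes resolved through the Parent hierarchy, which is what a dedicated conversion tool does.

```python
def parse_gff3_attributes(column: str) -> dict:
    """Parse a GFF3 attribute column of the form key=value;key=value."""
    attrs = {}
    for field in column.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, sep, value = field.partition("=")
        if sep:  # skip malformed fields that have no '='
            attrs[key] = value
    return attrs


def to_gtf_attributes(attrs: dict) -> str:
    """Render attributes in GTF style: key "value"; key "value";"""
    return " ".join(f'{key} "{value}";' for key, value in attrs.items())


# Attribute column of the exon line from the question above.
gff3_column = "ID=Csa01g001000.1.exon1;Parent=Csa01g001000.1"
attrs = parse_gff3_attributes(gff3_column)
gtf_column = to_gtf_attributes(attrs)
print(gtf_column)
```

A parser that looks for GTF-style pairs would presumably find no key at all in the GFF3 column, whichever name is passed to -k.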

Liyong-Zhang commented 2 years ago


Sounds good. I will convert it to a GTF file first before running tpmcalculator. Thanks.

sagarutturkar commented 1 year ago

I have a similar issue. I downloaded the GFF file from here and then converted it to GTF format using the AGAT tool: --gff C_auris_B11221_features.gff -o C_auris_B11221.gtf

A snapshot of the GTF file is attached. I get the following error:

Reading GTF file ...
Key gene for gene name was not found on GTF line.
Error processing GTF line at Chromosome level:

PGLS01000002_C_auris_B11221     CGD     exon    1330    2913    .       +       .       ID "CJI97_001076-T-E1"; Parent "CJI97_001076-T"; gene_id CJI97_001076; transcript_id "CJI97_001076-T"

See attached: test.gtf.txt

Do you have any suggestions to make this work?
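
One way to narrow this down is to inspect the attribute column of the failing line directly. The Python sketch below (the function name is hypothetical) lists which keys are present and whether each value is quoted. On the line quoted above, the attributes are ID, Parent, gene_id and transcript_id, so there is no attribute literally named "gene", which is the key the error message reports. The gene_id value is also the only one without quotes. Whether TPMCalculator requires quoted values is an assumption that has not been verified here.

```python
def inspect_gtf_attributes(column: str) -> dict:
    """Map each GTF attribute key to a (value, was_quoted) tuple.

    Simplification: splits on ';', so values that themselves contain
    a semicolon are not handled.
    """
    result = {}
    for field in column.split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, raw = field.partition(" ")
        raw = raw.strip()
        quoted = len(raw) >= 2 and raw.startswith('"') and raw.endswith('"')
        result[key] = (raw[1:-1] if quoted else raw, quoted)
    return result


# The failing line from the error message, rebuilt with tab separators.
line = "\t".join([
    "PGLS01000002_C_auris_B11221", "CGD", "exon", "1330", "2913",
    ".", "+", ".",
    'ID "CJI97_001076-T-E1"; Parent "CJI97_001076-T"; '
    'gene_id CJI97_001076; transcript_id "CJI97_001076-T"',
])
attrs = inspect_gtf_attributes(line.split("\t")[8])

for key in ("gene", "gene_id", "transcript_id"):
    if key not in attrs:
        print(f"{key}: missing")
    else:
        value, quoted = attrs[key]
        print(f"{key}: {value} ({'quoted' if quoted else 'unquoted'})")
```

If the key passed with -k was gene, trying gene_id instead would be the first thing to check; if the error persists, the unquoted gene_id value is the next suspect.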
