zhxiaokang / RASflow

RNA-Seq analysis workflow
MIT License

Gtf file format error #26

Closed yashsondhi closed 2 years ago

yashsondhi commented 3 years ago

Hi Zhang, I am having issues with the GTF file format. I assume this is something I could fix by changing the index? I have attached the output of the cluster run, but I am not sure where I should edit this parameter.

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
An example of attributes included in your GTF annotation is 'transcript_id "evm.model.chr1.34";'
The program has to terminate.

First few lines of the gtf file

GWHABGR00000001 EVM transcript 15372 30018 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1"
GWHABGR00000001 EVM exon 15372 15520 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 16212 16351 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 17501 17758 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 18192 18405 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 18529 18690 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 20641 20838 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 22769 22861 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";
GWHABGR00000001 EVM exon 23546 23685 . - . transcript_id "evm.model.chr1.1"; gene_id "evm.TU.chr1.1";

Attachment: serial_test_9441360.log
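
For anyone hitting the same error, one way to narrow it down is to check whether each record really has 9 tab-separated columns with gene_id inside the 9th. The following is a minimal diagnostic sketch in Python, not part of RASflow or featureCounts, and it does not claim to reproduce featureCounts' own parsing. The names `check_line` and `summarize` are hypothetical, and the sketch assumes the usual GTF layout (tab-separated columns, attributes written as `key "value";`).

```python
import re
from collections import Counter

# Matches one GTF attribute such as: gene_id "evm.TU.chr1.1"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(column9):
    """Return the key/value pairs found in a GTF attribute column."""
    return dict(ATTR_RE.findall(column9))


def check_line(line):
    """Classify one GTF record (hypothetical helper).

    Returns 'ok' when the record has 9 tab-separated columns and the 9th
    contains gene_id, 'skip' for blank or comment lines, and otherwise a
    short description of the problem.
    """
    if not line.strip() or line.startswith("#"):
        return "skip"
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return f"expected 9 tab-separated columns, found {len(columns)}"
    if "gene_id" not in parse_attributes(columns[8]):
        return "no gene_id in column 9"
    return "ok"


def summarize(lines):
    """Count how many records fall into each category."""
    return Counter(check_line(line) for line in lines)
```

Running `summarize(open("old.gtf"))` on the annotation should show whether the problem is the column delimiter (spaces instead of tabs, or an extra tab between attributes) or a genuinely missing gene_id.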

zhxiaokang commented 3 years ago

Hi, as the error message explains, featureCounts expects the 9th field to be "gene_id", but in your GTF file it's "transcript_id". Since you're counting genes, I suggest simply removing the transcript part. The following command should do the job:

cut -d" " -f1-8,11-12 old.gtf > new.gtf
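
The `cut` command above splits on single spaces, so it relies on the file being space-delimited exactly as pasted. As a hedged alternative, here is a minimal Python sketch that keeps only the gene_id attribute and writes tab-separated output. It is an illustration under stated assumptions, not the method RASflow uses: `keep_gene_id_only` and `rewrite_gtf` are hypothetical names, and the sketch assumes the first 8 columns contain no internal whitespace.

```python
import re

# Captures the value of the gene_id attribute, e.g. evm.TU.chr1.1
GENE_ID_RE = re.compile(r'gene_id\s+"([^"]*)"')


def keep_gene_id_only(line):
    """Rewrite one GTF record so column 9 holds only gene_id (hypothetical helper).

    Blank and comment lines are returned unchanged.
    """
    if not line.strip() or line.startswith("#"):
        return line
    # Split on runs of whitespace (spaces or tabs). Assumption: the first
    # 8 columns contain no whitespace, so the 9th part is the attributes.
    parts = line.rstrip("\n").split(None, 8)
    if len(parts) < 9:
        raise ValueError(f"fewer than 9 columns: {line!r}")
    match = GENE_ID_RE.search(parts[8])
    if match is None:
        raise ValueError(f"no gene_id attribute: {line!r}")
    attributes = f'gene_id "{match.group(1)}";'
    return "\t".join(parts[:8] + [attributes]) + "\n"


def rewrite_gtf(src_path, dst_path):
    """Apply keep_gene_id_only to every line of src_path, writing dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(keep_gene_id_only(line))
```

Calling `rewrite_gtf("old.gtf", "new.gtf")` would produce the same kind of file as the `cut` command, with the difference that the output columns are tab-separated and records missing gene_id raise an error instead of passing through silently.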

yashsondhi commented 3 years ago

Hi, thanks for the prompt response. I tried going through the featureCounts documentation, but I wasn't sure whether I should modify the GTF file. Cheers
