Closed javiercnav closed 5 months ago
I haven't seen this error before but I'll take a look at it so I can get it resolved before the next update. Can you upload your GFF file?
In the meantime, can you try running this to see if you get the same error?
source activate VEBA-binning-eukaryotic_env
BAM="T111_all_s.bam T121_all_s.bam T141_all_s.bam T311_all_s.bam T321_all_s.bam T331_all_s.bam T411_all_s.bam T421_all_s.bam T431_all_s.bam"
featureCounts --verbose -G mendoG_euks.fasta -a binning_euks/mendo_euks/tmp/gene_models.eukaryotic.gff -o binning_euks/mendo_euks/intermediate/4__featurecounts/featurecounts.orfs.tsv -F GTF --tmpDir binning_euks/mendo_euks/tmp/featurecounts -T 30 -g gene_id -t CDS -p --countReadPairs ${BAM}
I added the verbose flag so maybe that will indicate which record is throwing the error.
Thanks a lot for the quick response! I run the command you provided and got the following:
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id' An example of attributes included in your GTF annotation is 'ID=k127_9079703_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.446;conf=98.06;score=17.06;cscore=14.96;sscore=2.10;rscore=-1.19;uscore=-0.92;tscore=4.21;contig_id=k127_9079703;gene_id=k127_9079703_1;gene_biotype=protein_coding;;organelle=mitochondrion;'.
The gff file can be downloaded from the following link: https://www.dropbox.com/scl/fi/svhuk6ha1lilbfkl84dyl/gene_models.eukaryotic.gff.txt?rlkey=4cokljnhn01kp69wicmstg4tk&dl=1
I tested a second dataset, also for euks, and got the same error... I am wondering if there is a wrongly set variable in my installation in the environment for euks...
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id'
I used the same dataset and same bam files to bin prokaryotes and viruses and all seems fine there.
Hmm...ok let's try this.
source activate VEBA-binning-eukaryotic_env
CDS_GFF="gene_models.eukaryotic.CDS.gff"
cat binning_euks/mendo_euks/tmp/gene_models.eukaryotic.gff | grep "MetaEuk" | grep "CDS" > ${CDS_GFF}
BAM="T111_all_s.bam T121_all_s.bam T141_all_s.bam T311_all_s.bam T321_all_s.bam T331_all_s.bam T411_all_s.bam T421_all_s.bam T431_all_s.bam"
featureCounts --verbose -G mendoG_euks.fasta -a ${CDS_GFF} -o binning_euks/mendo_euks/intermediate/4__featurecounts/featurecounts.orfs.tsv -F GTF --tmpDir binning_euks/mendo_euks/tmp/featurecounts -T 30 -g gene_id -t CDS -p --countReadPairs ${BAM}
If that doesn't work then I have a couple of options:
Try this command if you do:
CMD="source activate VEBA && veba --module binning-eukaryotic --params=\"-f ${FASTA} -b ${BAM} -n ${ID} -p ${N_JOBS} -m 1500 -a metabat2\""
Here's the featurecounts step:
# Program:featureCounts v2.0.3; Command:"/expanse/projects/jcl110/miniconda3/envs/VEBA-binning-eukaryotic_env/bin/featureCounts" "-G" "veba_output/binning/prokaryotic/S1/output/unbinned.fasta" "-a" "veba_output/binning/eukaryotic/S1/tmp/gene_models.eukaryotic.gff" "-o" "veba_output/binning/eukaryotic/S1/intermediate/4__featurecounts/featurecounts.orfs.tsv" "-F" "GTF" "--tmpDir" "veba_output/binning/eukaryotic/S1/tmp/featurecounts" "-T" "1" "-g" "gene_id" "-t" "CDS" "-p" "--countReadPairs" "veba_output/assembly/S1/output/mapped.sorted.bam"
S1_unbinned_for_eukaryotic.fasta.gz S1_1.fastq.gz S1_2.fastq.gz S1_mapped.sorted.bam
Hello there; The VEBA version is 1.2.0 The code you shared for my dataset did the job. Feature counts were generated after subsetting CDS & Metaeuk. Now, I will try to figure out what happened with the formatting of my gff files. I also ran the binning_eukaryotic using the files you provided, and the run finished without any issues. Thanks a lot!
No problem! Glad we got this resolved and thanks again for bringing this issue to my attention. At least I know that filtering for just the CDS solves the issue. I'll keep my eye out for this for future versions. Will be an easy fix to just grep CDS and MetaEuk into a tmp GFF and use that if necessary.
Describe the bug: ERROR: Subread failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id' An example of attributes included in your GTF annotation is 'ID=k127_9079703_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.446;conf=98.06;score=17.06;cscore=14.96;sscore=2.10;rscore=-1.19;uscore=-0.92;tscore=4.21;contig_id=k127_9079703;gene_id=k127_9079703_1;gene_biotype=protein_coding;
See the file gene_models.eukaryotic.gff
Now see the log file for subread indicating that -g should be 'gene_id'
Versions
Command used to produce error: 4featurecounts cat binning_euks/mendo_euks/intermediate/3busco/filtered/genomes/*.gff > binning_euks/mendo_euks/tmp/gene_models.eukaryotic.gff && mkdir -p binning_euks/mendo_euks/tmp/featurecounts && ( /projects/navarro_lab/envs/VEBA-binning-eukaryotic_env/bin/featureCounts -G mendoG_euks.fasta -a binning_euks/mendo_euks/tmp/gene_models.eukaryotic.gff -o binning_euks/mendo_euks/intermediate/4__featurecounts/featurecounts.orfs.tsv -F GTF --tmpDir binning_euks/mendo_euks/tmp/featurecounts -T 30 -g gene_id -t CDS -p --countReadPairs T111_all_s.bam T121_all_s.bam T141_all_s.bam T311_all_s.bam T321_all_s.bam T331_all_s.bam T411_all_s.bam T421_all_s.bam T431_all_s.bam ) && gzip -f binning_euks/mendo_euks/intermediate/4__featurecounts/featurecounts.orfs.tsv && rm -rf binning_euks/mendo_euks/tmp/gene_models.eukaryotic.gff
Please provide the following files: 4__featurecounts.returncode.txt 4__featurecounts.o.txt 4__featurecounts.e.txt
The input files are too large to upload.
Thanks a lot in advance!