Closed: pavlo888 closed this issue 3 years ago.
This is an error from featureCounts, which seems unable to parse the GFF file you provided. Please check your GFF file and the parameter "ATTRIBUTE" in config_main.yaml to make sure that they agree with each other.
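One quick way to check that the annotation file and `ATTRIBUTE` agree is to list the attribute keys found in the 9th column. The following is a minimal Python sketch, not part of RASflow: the function names are hypothetical, and it assumes a tab-separated GFF3 (`key=value`) or GTF (`key "value"`) file with no semicolons inside attribute values.

```python
import re

# A GFF3-style attribute starts with "key=" (no whitespace before the "=").
GFF3_KEY = re.compile(r"^([^\s=]+)=")


def attribute_keys(col9: str) -> set:
    """Return the attribute keys in one 9th-column string.

    Handles GTF-style 'key "value";' and GFF3-style 'key=value;' pairs.
    Assumption: values contain no semicolons.
    """
    keys = set()
    for token in col9.split(";"):
        token = token.strip()
        if not token:
            continue
        m = GFF3_KEY.match(token)
        if m:
            keys.add(m.group(1))        # GFF3: ID=...
        else:
            keys.add(token.split()[0])  # GTF: gene_id "..."
    return keys


def count_lines_with_attribute(path: str, attribute: str) -> tuple:
    """Return (feature lines, feature lines carrying `attribute`)."""
    total = with_attr = 0
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9:
                continue  # e.g. sequence lines after a ##FASTA directive
            total += 1
            if attribute in attribute_keys(cols[8]):
                with_attr += 1
    return total, with_attr
```

For example, `count_lines_with_attribute("data/example/ref/annotation/027-annot.gff", "ID")` returns a pair of counts; if the second number is 0, the configured ATTRIBUTE does not occur in the file at all.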
Hi @zhxiaokang,
I am using a GFF file taken directly from the Prokka output, and ATTRIBUTE is set to "ID" instead of "gene_id". Nevertheless, I still get an error.
Should I edit the GFF file or convert it to another format?
Cheers, Pablo
I'm not sure whether this is the issue, but the error says "no features were loaded in format GTF", while you're actually using a GFF file. Maybe try a GTF file instead?
And would you mind sharing the GFF file you're using? Or at least show some lines that include the attribute set as "ATTRIBUTE".
But Prokka doesn't provide a GTF file. Also, isn't GTF an older variant of GFF (based on GFF version 2, I think)?
I will share the file and the YAML lines.
This is what the GFF file looks like (these are screenshots, since I cannot upload a GFF file here):
And these are the corresponding lines of the config YAML file:
```yaml
# genome and annotation files
GENOME: data/example/ref/genome/027-annot.fna
ANNOTATION: data/example/ref/annotation/027-annot.gff
ATTRIBUTE: ID # the attribute used in annotation file. It's usually "gene_id", but double check that since it may also be "gene", "ID"...
```
Hi, sorry for the bad news, but it seems that both featureCounts and htseq-count are designed for the GTF format: https://help.galaxyproject.org/t/problems-with-attributes-in-featurecounts-gff3-input-instead-of-gtf/3046/2
I tried the example data with a GFF3 file in both tools, and they reported errors similar to yours: they couldn't find the correct attribute. You may try to convert the GFF3 file into a GTF file first. Here are some tools for the conversion: https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gff_to_gtf.md
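The main difference the tools trip over is the 9th column: GFF3 writes `key=value;` pairs, while GTF expects `key "value";` pairs including a `gene_id`. A dedicated converter such as those listed above is the safer route; purely to illustrate the idea, here is a minimal Python sketch (hypothetical function names). It assumes the GFF3 attribute chosen as `gene_key` (here `locus_tag`, an assumption to adjust) should become `gene_id`, keeps every value as a single quoted string, and leaves percent-escapes untouched.

```python
def gff3_attrs_to_gtf(col9: str, gene_key: str = "locus_tag") -> str:
    """Rewrite a GFF3 9th column (key=value;...) as GTF (key "value"; ...).

    `gene_key` (hypothetical default) names the attribute copied into gene_id.
    Comma-separated GFF3 multi-values stay inside ONE quoted string.
    """
    pairs = []
    for token in col9.strip().split(";"):
        if "=" not in token:
            continue  # skip empty tokens, e.g. after a trailing ';'
        key, value = token.split("=", 1)
        # Replace double quotes so they cannot terminate the GTF value early.
        pairs.append((key.strip(), value.replace('"', "'")))
    attrs = dict(pairs)
    if gene_key not in attrs:
        raise KeyError(f"attribute {gene_key!r} not found in: {col9!r}")
    out = [f'gene_id "{attrs[gene_key]}"'] + [f'{k} "{v}"' for k, v in pairs]
    return "; ".join(out) + ";"


def gff3_line_to_gtf(line: str, gene_key: str = "locus_tag") -> str:
    """Convert one 9-column GFF3 feature line; columns 1-8 are unchanged."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    cols[8] = gff3_attrs_to_gtf(cols[8], gene_key)
    return "\t".join(cols)
```

Note that this sketch does not touch column 3, so the feature type passed to featureCounts with `-t` must still match the types present in the file.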
Hope this helps.
Hi @zhxiaokang
I am still having issues, even after converting the GFF file into a GTF file with AGAT. My GTF file looks like this:
This is the error I obtain:
```
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
An example of attributes included in your GTF annotation is 'gene_id "nbis-gene-95"; transcript_id "CDACBCJP_00095_gene"; ID "nbis-exon-95"; Parent "CDACBCJP_00095_gene"; inference "ab initio prediction:Prodigal:002006" "similar to AA sequence:ISfinder:ISHaha5"; locus_tag "CDACBCJP_00095"; product "IS110 family transposase ISHaha5"; protein_id "gnl|X|CDACBCJP_00095";'
The program has to terminate.
```
```
Error in rule featureCount:
    jobid: 0
    output: output-pva/pva027/genome/countFile/20-0357_count.tsv, output-pva/pva027/genome/countFile/20-0357_count.tsv.summary

RuleException:
CalledProcessError in line 109 of /home/sam/Downloads/ilse/RASflow/workflow/align_count_genome.rules:
Command ' set -euo pipefail; featureCounts -p -T 4 -t exon -g gene_id -a data/example/ref/annotation/027-annot.gtf -o output-pva/pva027/genome/countFile/20-0357_count.tsv data/output/pva027/genome/bamFileSort/20-0357.sort.bam && tail -n +3 output-pva/pva027/genome/countFile/20-0357_count.tsv | cut -f1,7 > temp.20-0357 && mv temp.20-0357 output-pva/pva027/genome/countFile/20-0357_count.tsv ' returned non-zero exit status 255.
  File "/home/sam/Downloads/ilse/RASflow/workflow/align_count_genome.rules", line 109, in __rule_featureCount
  File "/home/sam/anaconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
```
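In the command in this log, featureCounts is followed by `tail -n +3 ... | cut -f1,7`, which reduces the featureCounts table to a two-column gene/count file: featureCounts writes a `# Program:...` comment line and a header line first, and each data row holds Geneid, Chr, Start, End, Strand, Length, then one count column per BAM file. As a rough illustration only (hypothetical function name, not RASflow code), the same post-processing in Python:

```python
def simplify_featurecounts(text: str) -> str:
    """Mimic `tail -n +3 | cut -f1,7` on a featureCounts output table."""
    out = []
    for line in text.splitlines()[2:]:      # tail -n +3: drop the first two lines
        cols = line.split("\t")
        # cut -f1,7: Geneid plus the count column of the first BAM file
        out.append("\t".join([cols[0]] + cols[6:7]))
    return "\n".join(out) + ("\n" if out else "")
```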
The GTF file looks good to me, at least the part in the screenshot. Could you share the whole file? I suspect there is an issue somewhere else in the file (outside the part shown in the screenshot).
Do you have an e-mail address I can send the file to? I cannot upload it here on GitHub.
zhxiaokang@gmail.com
Hi, after testing with your GTF file, I found that featureCounts throws this error when the "inference" attribute contains two quoted values. In the ERROR message you posted above, it contains both "ab initio prediction:Prodigal:002006" and "similar to AA sequence:ISfinder:ISHaha5", and there are more such cases in your GTF file.
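The pattern described above (one attribute key followed by more than one quoted value) can be located across the whole GTF file before running featureCounts. A minimal Python sketch with hypothetical function names; it assumes double-quoted GTF values that contain no `"` characters themselves:

```python
import re

# One GTF attribute: a key followed by one or more double-quoted values.
ATTR = re.compile(r'([^\s";]+)((?:\s+"[^"]*")+)\s*;?')
QUOTED = re.compile(r'"[^"]*"')


def multi_valued_attributes(col9: str) -> list:
    """Return the keys whose value is more than one quoted string."""
    return [m.group(1) for m in ATTR.finditer(col9)
            if len(QUOTED.findall(m.group(2))) > 1]


def report_file(path: str) -> list:
    """Return (line number, offending keys) for every affected feature line."""
    hits = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            cols = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(cols) != 9:
                continue
            keys = multi_valued_attributes(cols[8])
            if keys:
                hits.append((lineno, keys))
    return hits
```

Applied to the attribute string from the error message, `multi_valued_attributes` flags only `inference`.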
I'm trying to report this issue in their Google group, but I'm still waiting to be admitted.
For the time being, you can work around the issue by keeping only the gene_id part of the GTF file with this command:
```
cut -d';' -f1 027-annot.gtf > 027-annot_only_gene_id.gtf
```
Then use 027-annot_only_gene_id.gtf instead. I have tested this strategy (keeping only the gene_id part) on the example data in RASflow, and it produced almost the same counts as the original GTF file.
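The `cut` command above relies on gene_id being the first attribute on every line, which holds for the AGAT output shown here. A slightly more defensive variant of the same workaround, as a Python sketch (hypothetical function names, not part of RASflow), keeps only the gene_id attribute wherever it appears and fails loudly if a feature line has none:

```python
import re

# gene_id at the start of column 9 or right after a ';'
GENE_ID = re.compile(r'(?:^|;)\s*gene_id\s+"([^"]*)"')


def keep_only_gene_id(line: str) -> str:
    """Return the GTF line with column 9 reduced to 'gene_id "...";'.

    Comment lines and lines without 9 tab-separated columns are
    passed through unchanged.
    """
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9:
        return line.rstrip("\n")
    m = GENE_ID.search(cols[8])
    if m is None:
        raise ValueError(f"no gene_id attribute in: {cols[8]!r}")
    cols[8] = f'gene_id "{m.group(1)}";'
    return "\t".join(cols)


def filter_file(src: str, dst: str) -> None:
    """Write a copy of `src` whose attributes are reduced to gene_id."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(keep_only_gene_id(line) + "\n")
```

For instance, `filter_file("027-annot.gtf", "027-annot_only_gene_id.gtf")` produces the same kind of file as the `cut` command.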
hi @zhxiaokang
Thank you for providing the command for fixing the GTF file. It worked perfectly!
However, there seems to be yet another error, this time in the DEA and visualization steps:
```
Error in glmFit.default(y = y$counts, design = design, dispersion = dispersion, :
  dispersion must be numeric
Calls: DEA ... glmFit -> glmFit.DGEList -> glmFit -> glmFit.default
In addition: Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group, :
  No residual df: setting dispersion to NA
Execution halted
[Wed Jan 20 09:27:00 2021]
Error in rule DEA:
    jobid: 1
    output: output-pva/pva027/genome/dea/countGroup/Untreated_gene_norm.tsv, output-pva/pva027/genome/dea/countGroup/Calcium_gene_norm.tsv, output-pva/pva027/genome/dea/DEA/dea_Untreated_Calcium.tsv, output-pva/pva027/genome/dea/DEA/deg_Untreated_Calcium.tsv

RuleException:
CalledProcessError in line 38 of /home/sam/Downloads/ilse/RASflow/workflow/dea_genome.rules:
Command ' set -euo pipefail; Rscript scripts/dea_genome.R ' returned non-zero exit status 1.
  File "/home/sam/Downloads/ilse/RASflow/workflow/dea_genome.rules", line 38, in __rule_DEA
  File "/home/sam/anaconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job DEA since they might be corrupted:
output-pva/pva027/genome/dea/countGroup/Untreated_gene_norm.tsv, output-pva/pva027/genome/dea/countGroup/Calcium_gene_norm.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/sam/Downloads/ilse/RASflow/.snakemake/log/2021-01-20T092655.885343.snakemake.log
DEA is done!
Start visualization of DEA results!
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       end
    1       plot
    2

[Wed Jan 20 09:27:00 2021]
rule plot:
    input: output-pva/pva027/genome/dea/countGroup, output-pva/pva027/genome/dea/DEA
    output: output-pva/pva027/genome/dea/visualization/volcano_plot_Untreated_Calcium.pdf, output-pva/pva027/genome/dea/visualization/heatmap_Untreated_Calcium.pdf
    jobid: 1

Loading required package: plotscale
hash-3.0.1 provided by Decision Patterns

Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
    colnames, colSums, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
    rowSums, sapply, setdiff, sort, table, tapply, union, unique,
    unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:hash’:

    values, values<-

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Attaching package: ‘AnnotationDbi’

The following objects are masked from ‘package:hash’:

    keys, keys<-

Loading required package: ggplot2
Loading required package: ggrepel
Error in file(file, "rt") : cannot open the connection
Calls: plot.volcano.heatmap -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'output-pva/pva027/genome/dea/DEA/dea_Untreated_Calcium.tsv': No such file or directory
Execution halted
[Wed Jan 20 09:27:06 2021]
Error in rule plot:
    jobid: 1
    output: output-pva/pva027/genome/dea/visualization/volcano_plot_Untreated_Calcium.pdf, output-pva/pva027/genome/dea/visualization/heatmap_Untreated_Calcium.pdf

RuleException:
CalledProcessError in line 53 of /home/sam/Downloads/ilse/RASflow/workflow/visualize.rules:
Command ' set -euo pipefail; Rscript scripts/visualize.R output-pva/pva027/genome/dea/countGroup output-pva/pva027/genome/dea/DEA output-pva/pva027/genome/dea/visualization ' returned non-zero exit status 1.
  File "/home/sam/Downloads/ilse/RASflow/workflow/visualize.rules", line 53, in __rule_plot
  File "/home/sam/anaconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/sam/Downloads/ilse/RASflow/.snakemake/log/2021-01-20T092700.946097.snakemake.log
Visualization is done!
RASflow is done!
```
Could you tell me what to do to fix it?
Cheers, Pablo
Hi Pablo, glad to hear that it works. The new issue actually happens in the DEA step, or more precisely in edgeR: searching for the error message led me to https://github.com/nanoporetech/pipeline-transcriptome-de/issues/6, where the cause is that there are not enough replicates. (The visualization error is only a consequence: the plot step cannot find dea_Untreated_Calcium.tsv because the DEA step failed.) So in your case, how many replicates do you have in each group?
Hi @zhxiaokang
There are two conditions and I have just one replicate for each condition, so only two sets of reads. Does this mean I cannot go any further with the visualization?
Cheers, Pablo
With only one replicate per condition, you can't actually do differential expression analysis (DEA), since that is not enough to make statistical sense.
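The edgeR warning in the log ("No residual df: setting dispersion to NA") follows from simple arithmetic. Assuming the DEA script fits a design with one coefficient per condition, the residual degrees of freedom are the number of samples $n$ minus the number of design coefficients $p$:

```math
\mathrm{df}_{\mathrm{residual}} = n - p = 2 - 2 = 0
```

With zero residual degrees of freedom there is no within-group variation from which to estimate the dispersion, so it is set to NA and `glmFit` then stops with "dispersion must be numeric". With two replicates per condition, the same design would give $4 - 2 = 2$ residual degrees of freedom.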
I was afraid that was the case. Thanks a lot for your help!!!
Hi @zhxiaokang
It seems I am having a problem with the BAM step in my workflow.
I get the following output:
```
[Wed Jan 6 23:51:27 2021]
rule featureCount:
    input: data/output/pva027/genome/bamFileSort/20-0357.sort.bam, data/example/ref/annotation/027-annot.gff
    output: output-pva/pva027/genome/countFile/20-0357_count.tsv, output-pva/pva027/genome/countFile/20-0357_count.tsv.summary
    jobid: 2
    wildcards: sample=20-0357

Job counts:
    count   jobs
    1       featureCount
    1

\\============================================================================//

//================================= Running ==================================\\
||
|| Load annotation file 027-annot.gff ...
||
ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option.. The porgram has to terminate.

[Wed Jan 6 23:51:28 2021]
Error in rule featureCount:
    jobid: 0
    output: output-pva/pva027/genome/countFile/20-0357_count.tsv, output-pva/pva027/genome/countFile/20-0357_count.tsv.summary

RuleException:
CalledProcessError in line 109 of /home/sam/Downloads/RASflow/workflow/align_count_genome.rules:
Command ' set -euo pipefail; featureCounts -p -T 8 -t exon -g ID -a data/example/ref/annotation/027-annot.gff -o output-pva/pva027/genome/countFile/20-0357_count.tsv data/output/pva027/genome/bamFileSort/20-0357.sort.bam && tail -n +3 output-pva/pva027/genome/countFile/20-0357_count.tsv | cut -f1,7 > temp.20-0357 && mv temp.20-0357 output-pva/pva027/genome/countFile/20-0357_count.tsv ' returned non-zero exit status 255.
  File "/home/sam/Downloads/RASflow/workflow/align_count_genome.rules", line 109, in __rule_featureCount
  File "/home/sam/anaconda3/envs/rasflow/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
```
Do you have any idea how I could fix it?
Cheers, Pablo
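The featureCounts error above points at two options: `-F` (annotation format) and `-t` (feature type). The command in the log runs with `-t exon`, so if the annotation contains no `exon` lines (prokaryotic annotations often describe genes as `gene`/`CDS` features instead; check your own file), no features are loaded. A small Python sketch (hypothetical helper, not part of RASflow) to list which feature types column 3 actually contains:

```python
from collections import Counter


def feature_type_counts(path: str) -> Counter:
    """Count the feature types (column 3) in a GFF/GTF file.

    Comment/directive lines and lines without 9 tab-separated columns
    (e.g. embedded FASTA sequence) are skipped.
    """
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9:
                counts[cols[2]] += 1
    return counts
```

For example, `feature_type_counts("data/example/ref/annotation/027-annot.gff").most_common()` shows which type names exist, and one of those would be the value to pass to featureCounts with `-t`.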