nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Funannotate predict fails at stage CodingQuarry (only for some genomes) #914

Open yvetboele opened 1 year ago

yvetboele commented 1 year ago

Good morning,

When running funannotate predict (1.8.13), I get an error at the CodingQuarry stage. The exact same commands and pipeline worked for two isolates and failed for two others, so it is not an installation issue.

Used command: funannotate predict -i /home/yvet/annotations/P93/preprocessing/P93_cleaned_sorted_masked.fasta -o P93_annotation -s "Pseudocercospora eumusae" --isolate P93 --cpus 8 --protein_evidence /home/yvet/annotations/evidence_fastqdump/CIRAD86_proteins/GCF_000340215.1_Mycfi2_protein.fasta $FUNANNOTATE_DB/uniprot_sprot.fasta

The error looks as follows (see full output below): CMD ERROR: CodingQuarry -p 6 -f /home/yvet/annotations/P92/P92_annotation/predict_misc/genome.softmasked.fa -t /home/yvet/annotations/P92/P92_annotation/predict_misc/stringtie.gff3

Running CodingQuarry separately (outside of the funannotate pipeline) gives me an error I also can't really work with: "Segmentation fault (core dumped)"
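To make the standalone test repeatable, the reported command can be wrapped so that a crash is identified by signal name instead of just a shell message. This is a minimal sketch, not part of funannotate or CodingQuarry: the helper names `codingquarry_command`, `describe_exit` and `run_codingquarry` are hypothetical, and only the `-p`, `-f` and `-t` flags are taken from the CMD ERROR line above.

```python
import signal
import subprocess


def codingquarry_command(genome_fasta, transcripts_gff3, cpus=8):
    """Build the same argv that funannotate printed in its CMD ERROR line."""
    return ["CodingQuarry", "-p", str(cpus), "-f", genome_fasta, "-t", transcripts_gff3]


def describe_exit(returncode):
    """Translate a process return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
    elif returncode > 128:
        # Shell convention: 128 + signal number (139 for a segmentation fault).
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        return f"killed by signal {signum}"


def run_codingquarry(genome_fasta, transcripts_gff3, cpus=8, workdir="."):
    """Run CodingQuarry in workdir (assumed to be on PATH) and describe how it ended."""
    cmd = codingquarry_command(genome_fasta, transcripts_gff3, cpus)
    result = subprocess.run(cmd, cwd=workdir)
    return describe_exit(result.returncode)
```

Calling `run_codingquarry(...)` with the `genome.softmasked.fa` and `stringtie.gff3` paths from `predict_misc` should report `killed by SIGSEGV` if the crash is the segmentation fault described here, which is useful to quote when asking the CodingQuarry developers.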

Any idea what might be the reason? I trained using RNAseq of the species itself and closely related species, and as protein evidence I provide proteins from a closely related species and the general uniprot db.

Thanks for your time! Yvet

FULL OUTPUT:

funannotate predict -i /home/yvet/annotations/P93/preprocessing/P93_cleaned_sorted_masked.fasta -o P93_annotation -s "Pseudocercospora eumusae" --isolate P93 --cpus 8 --protein_evidence /home/yvet/annotations/evidence_fastqdump/CIRAD86_proteins/GCF_000340215.1_Mycfi2_protein.fasta $FUNANNOTATE_DB/uniprot_sprot.fasta

[May 09 10:04 AM]: OS: Ubuntu 22.04, 32 cores, ~ 264 GB RAM. Python: 3.8.15
[May 09 10:04 AM]: Running funannotate v1.8.13
[May 09 10:04 AM]: Found training files, will re-use these files:
  --rna_bam P93_annotation/training/funannotate_train.coordSorted.bam
  --pasa_gff P93_annotation/training/funannotate_train.pasa.gff3
  --stringtie P93_annotation/training/funannotate_train.stringtie.gtf
  --transcript_alignments P93_annotation/training/funannotate_train.transcripts.gff3
[May 09 10:04 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program        Training-Method
  augustus       pasa
  codingquarry   rna-bam
  genemark       selftraining
  glimmerhmm     pasa
  snap           pasa
[May 09 10:04 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[May 09 10:05 AM]: Genome loaded: 28 scaffolds; 46,775,304 bp; 4.30% repeats masked
[May 09 10:05 AM]: Parsed 879 transcript alignments from: P93_annotation/training/funannotate_train.transcripts.gff3
[May 09 10:05 AM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[May 09 10:05 AM]: Extracting hints from RNA-seq BAM file using bam2hints
[May 09 10:05 AM]: Mapping 569,434 proteins to genome using diamond and exonerate
[May 09 10:10 AM]: Found 308,673 preliminary alignments with diamond in 0:03:51 --> generated FASTA files for exonerate in 0:01:06
[May 09 11:32 AM]: Exonerate finished in 1:21:24: found 8,169 alignments
[May 09 11:32 AM]: Running GeneMark-ES on assembly
[May 09 12:03 PM]: 13,065 predictions from GeneMark
[May 09 12:03 PM]: Filtering PASA data for suitable training set
[May 09 12:03 PM]: 313 of 325 models pass training parameters
[May 09 12:03 PM]: Training Augustus using PASA gene models
[May 09 12:03 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   81.8%         47.9%
  exons         23.6%         20.5%
  genes         1.6%          1.9%
[May 09 12:03 PM]: Accuracy seems low, you can try to improve by passing the --optimize_augustus option.
[May 09 12:03 PM]: Running Augustus gene prediction using pseudocercospora_eumusae_p93 parameters
[May 09 12:15 PM]: 4,841 predictions from Augustus
[May 09 12:15 PM]: Pulling out high quality Augustus predictions
[May 09 12:15 PM]: Found 1,582 high quality predictions from Augustus (>90% exon evidence)
[May 09 12:15 PM]: Running CodingQuarry prediction using stringtie alignments
[May 09 12:15 PM]: CMD ERROR: CodingQuarry -p 8 -f /home/yvet/annotations/P93/P93_annotation/predict_misc/genome.softmasked.fa -t /home/yvet/annotations/P93/P93_annotation/predict_misc/stringtie.gff3
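When several isolates are run in a batch, it can help to pull the failing command out of each log automatically so it can be re-run by hand. The sketch below is an illustration only: `failed_commands` is a hypothetical helper, and the timestamp pattern is inferred from the log lines shown here, not from any documented funannotate log format.

```python
import re
import shlex

# Timestamp prefix as it appears in the log above, e.g. "[May 09 12:15 PM]: ".
TIMESTAMP = re.compile(r"\[[A-Z][a-z]{2} \d{2} \d{2}:\d{2} [AP]M\]: ")


def failed_commands(log_text):
    """Return the argv list of every command the log reports as 'CMD ERROR'."""
    commands = []
    # Splitting on the timestamp works whether entries are on separate
    # lines or run together on one line.
    for entry in TIMESTAMP.split(log_text):
        entry = entry.strip()
        if entry.startswith("CMD ERROR:"):
            commands.append(shlex.split(entry[len("CMD ERROR:"):]))
    return commands
```

Each returned list can be passed directly to `subprocess.run` to reproduce the failure outside the pipeline.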

nextgenusfs commented 1 year ago

Seems more like a question for CodingQuarry developers @JamesHane

yweii commented 1 year ago

I am also facing the same problem. Can you tell me if the issue has been resolved?

yvetboele commented 1 year ago

No, I haven't resolved it.

JamesHane commented 1 year ago

Hi, I supervised the project in which CodingQuarry was developed, so I didn't write the code, but I will try to help. I compiled CodingQuarry on an up-to-date system today and also got segmentation faults consistently, then tested a version compiled back in 2016, which is running now without issues. I don't know exactly what the issue is, but I would guess that compiler or dependency updates caused it. Strange that it worked for you earlier for some isolates. In the short term, if you email me at James.Hane@curtin.edu.au, I can send you my CodingQuarry binary, and hopefully that will solve your problem.

JamesHane commented 11 months ago

On a related note: if you are testing CodingQuarry on the same data as funannotate after a failed funannotate run, you should remove the ParameterFiles directory it created during the initial run; otherwise CodingQuarry may also fail because the directory already exists.
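The cleanup step could be scripted along these lines. This is only a sketch: `remove_stale_parameter_files` is a hypothetical helper, and it assumes the leftover ParameterFiles directory sits directly inside the directory where CodingQuarry was last run, which you should confirm for your own setup before deleting anything.

```python
import shutil
from pathlib import Path


def remove_stale_parameter_files(workdir="."):
    """Delete a leftover ParameterFiles directory in workdir.

    Returns True if a directory was removed, False if none was found.
    The location is an assumption; check where your run created it.
    """
    stale = Path(workdir) / "ParameterFiles"
    if stale.is_dir():
        shutil.rmtree(stale)
        return True
    return False
```

Run it against the working directory of the failed run before starting CodingQuarry again; a return value of False simply means there was nothing to clean up.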