nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Training Augustus using BUSCO, exon_probs.pbl not found #291

Closed by LironShv 5 years ago

LironShv commented 5 years ago

Hello

I am running into some issues with funannotate predict. I am using a conda installation with manually installed GeneMark and Augustus (and I am new to bioinformatics).

funannotate check --show-versions

-------------------------------------------------------
Checking dependencies for funannotate v1.5.3-21ad095
-------------------------------------------------------
You are running Python v 2.7.15. Now checking python packages...
biopython: 1.73
goatools: 0.8.12
matplotlib: 2.2.3
natsort: 6.0.0
numpy: 1.16.3
pandas: 0.24.2
psutil: 5.6.2
requests: 2.21.0
scikit-learn: 0.20.3
scipy: 1.2.1
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.41
DBD::SQLite: 1.62
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.852
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking external dependencies...
CodingQuarry: 2.0
RepeatMasker: RepeatMasker version development-$Id: RepeatMasker,v 1.332 2017/04/17 19:01:11 rhubley Exp $
RepeatModeler: RepeatModeler 1.0.11
Trinity: Trinity version: v2.1.1
augustus: 3.3.2
bamtools: bamtools 2.4.1
bedtools: bedtools v2.28.0
blat: BLAT v36
diamond: diamond 0.9.24
emapper.py: /home/genomes/software/eggnog-mapper/bin/diamond /home/genomes/software/eggnog-mapper
ete3: 3.1.1
exonerate: exonerate 2.4.0
fasta: no way to determine
gmap: 2017-11-15
gmes_petap.pl: 4.38
hisat2: 2.1.0
hmmscan: HMMER 3.2.1 (June 2018)
hmmsearch: HMMER 3.2.1 (June 2018)
java: 11.0.1
kallisto: 0.45.1
mafft: v7.407 (2018/Jul/23)
makeblastdb: makeblastdb 2.6.0+
minimap2: 2.17-r941
nucmer: 3.1
pslCDnaFilter: no way to determine
rmblastn: rmblastn 2.6.0+
samtools: samtools 1.9
stringtie: 1.3.6
tRNAscan-SE: 2.0.3 (April 2019)
tbl2asn: unknown, likely 25.3
tblastn: tblastn 2.6.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
All 32 external dependencies are installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/genomes/software/funannotate_DB/
$PASAHOME=/home/conda/envs/funannotate/opt/pasa-2.3.3
$TRINITYHOME=/home/conda/envs/funannotate/opt/trinity-2.1.1/
$EVM_HOME=/home/conda/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/team/edouard/install/augustus/config/
$GENEMARK_PATH=/home/genomes/software/genemark_x/
$BAMTOOLS_PATH=/home/conda/bin/bamtools

First, I ran into an issue with my Augustus version, so I installed Augustus manually.

This was my first error output:


[12:33 PM]: OS: linux2, 48 cores, ~ 264 GB RAM. Python: 2.7.15
[12:33 PM]: Running funannotate v1.5.3-21ad095
[12:33 PM]: ERROR: AUGUSTUS (3.3) is not installed properly and this version not work with BUSCO, this is a problem with Augustus compliatation, you may need to compile manually on linux2.
[12:33 PM]: AUGUSTUS (3.3) detected, version seems to be compatible with BRAKER and BUSCO
[12:33 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[12:34 PM]: Genome loaded: 1,074 scaffolds; 154,561,933 bp; 52.56% repeats masked
[12:36 PM]: Aligning transcript evidence to genome with minimap2
[02:12 PM]: Found 51,143,626 alignments, wrote GFF3 and Augustus hints to file
[02:13 PM]: Mapping proteins to genome using Diamond blastx/Exonerate
[02:13 PM]: Using 547,601 proteins as queries
[02:13 PM]: Running Diamond pre-filter search
[02:50 PM]: Found 1,305,987 preliminary alignments
[02:04 PM]: Exonerate finished: found 4,038 alignments
[02:35 PM]: Running GeneMark-ES on assembly
[04:26 PM]: Converting GeneMark GTF file to GFF3
[04:26 PM]: Found 36,527 gene models
[04:26 PM]: Running BUSCO to find conserved gene models for training Augustus
[04:26 PM]: Multi-threading in tblastn v2.6.0 is unstable, running in single threaded mode for BUSCO
[04:41 PM]: BUSCO training of Augusus failed, check busco logs, exiting

After installing Augustus from source, I reran funannotate predict:

funannotate predict \
-i $RESOURCES/iso-1.masurca.final.genome.scf.fasta.masked \
--species "Phytophthora" \
--isolate iso-1 \
--transcript_evidence $RNA_PATH/13_AGTCAA_L001_R1_paired.fasta $RNA_PATH/13_AGTCAA_L001_R2_paired.fasta $RNA_PATH/13_AGTCAA_L002_R1_paired.fasta $RNA_PATH/13_AGTCAA_L002_R2_paired.fasta $RNA_PATH/13_AGTCAA_L003_R1_paired.fasta $RNA_PATH/13_AGTCAA_L003_R2_paired.fasta $RNA_PATH/13_AGTCAA_L004_R1_paired.fasta $RNA_PATH/13_AGTCAA_L004_R2_paired.fasta \
--protein_evidence /home/funannotate/conda/try_one/predict/predict_misc/proteins.combined.fa \
--protein_alignments /home/funannotate/conda/try_one/predict/predict_misc/protein_alignments.gff3 \
--transcript_alignments /home/funannotate/conda/try_one/predict/predict_misc/transcript_alignments.gff3 \
-o /home/funannotate/predict

I get the following error output:

[05:45 PM]: OS: linux2, 48 cores, ~ 264 GB RAM. Python: 2.7.15
[05:45 PM]: Running funannotate v1.5.3-21ad095
[05:45 PM]: AUGUSTUS (3.3.2) detected, version seems to be compatible with BRAKER and BUSCO
[05:45 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[05:46 PM]: Genome loaded: 1,074 scaffolds; 154,561,933 bp; 52.56% repeats masked
[05:47 PM]: Existing transcript alignments found: /home/funannotate/predict/predict_misc/transcript_alignments.gff3
[05:47 PM]: Loading protein alignments /home/funannotate/conda/try_one/predict/predict_misc/protein_alignments.gff3
[05:47 PM]: Running GeneMark-ES on assembly
[07:39 PM]: Converting GeneMark GTF file to GFF3
[07:39 PM]: Found 36,582 gene models
[07:39 PM]: Running BUSCO to find conserved gene models for training Augustus
[07:39 PM]: Multi-threading in tblastn v2.6.0 is unstable, running in single threaded mode for BUSCO
[08:52 PM]: 257 valid BUSCO predictions found, now formatting for EVM
[08:52 PM]: Setting up EVM partitions
[08:56 PM]: Generating EVM command list
[08:56 PM]: Running EVM commands with 1 CPUs
[09:09 PM]: Combining partitioned EVM outputs
[09:09 PM]: Converting EVM output to GFF3
[09:11 PM]: Collecting all EVM results
[09:11 PM]: 256 total gene models from EVM
[09:11 PM]: Checking BUSCO protein models for accuracy
[09:12 PM]: 254 gene models validated, using for training Augustus
[09:12 PM]: Training Augustus using BUSCO gene models

augustus: ERROR ExonModel: Couldn't open file /home/conda/envs/funannotate/config/species/phytophthora_infestans_iso-1/phytophthora_infestans_iso-1_exon_probs.pbl

Traceback (most recent call last):
  File "/home/genomes/software/funannotate/bin/funannotate-predict.py", line 1032, in <module>
    lib.trainAugustus(AUGUSTUS_BASE, aug_species, trainingset, MaskGenome, args.out, args.cpus, numTrainingSet, args.optimize_augustus)
  File "/home/genomes/software/funannotate/lib/library.py", line 6007, in trainAugustus
    train_results = getTrainResults(os.path.join(outdir, 'predict_misc', 'augustus.initial.training.txt'))
  File "/home/genomes/software/funannotate/lib/library.py", line 5842, in getTrainResults
    return (float(values1[1]), float(values1[2]), float(values2[6]), float(values2[7]), float(values3[6]), float(values3[7]))
UnboundLocalError: local variable 'values1' referenced before assignment
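The UnboundLocalError at the end of this traceback looks like a secondary symptom: values1 is only used in the return statement, so it was presumably never assigned because the Augustus run that should have produced the report aborted first. Below is a minimal Python sketch of that failure pattern and a guard against it. It is not the funannotate source. The function name parse_training_report, the pipe-delimited layout, and the "nucleotide level" row prefix are all assumptions made for illustration.

```python
def parse_training_report(text):
    """Return two floats from the 'nucleotide level' row, or None if it is absent.

    Assumption: the report is pipe-delimited and contains a row that starts
    with 'nucleotide level'. The real funannotate parser may differ.
    """
    values = None  # initialise up front so a missing row cannot raise UnboundLocalError
    for line in text.splitlines():
        if line.startswith("nucleotide level"):
            values = [field.strip() for field in line.split("|")]
    if values is None:
        # The expected row never appeared, e.g. because Augustus exited early.
        return None
    return float(values[1]), float(values[2])


# Hypothetical report contents, for illustration only.
ok_report = "nucleotide level | 0.91 | 0.84 |\n"
failed_report = "augustus: ERROR\n\tExonModel: Couldn't open file\n"

print(parse_training_report(ok_report))      # (0.91, 0.84)
print(parse_training_report(failed_report))  # None
```

Returning None (or raising a descriptive error) makes it clear that the report is empty, which points back to the Augustus error rather than to the parser.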

The funannotate-predict.log also shows an error. I traced it back to Hash::Merge not being properly installed and changed the path to perl in all the .pl files of GeneMark, but the error persists, although the step still finishes:

[05/17/19 17:45:40]: /home/genomes/software/funannotate/bin/funannotate-predict.py -i /home/funannotate/resources_copies/iso-1.masurca.final.genome.scf.fasta.masked --species Phytophthora infestans --isolate iso-1 --transcript_evidence /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L001_R1_paired.fasta /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L001_R2_paired.fasta /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L002_R1_paired.fasta /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L002_R2_paired.fasta /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L003_R1_paired.fasta /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L003_R2_paired.fasta /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L004_R1_paired.fasta /home/genomes/data/raw_data/rna-seq/iso-1-benti-root/13.myc_zspores/trimmomatic-2/fasta_format/13_AGTCAA_L004_R2_paired.fasta --protein_evidence /home/funannotate/conda/try_one/predict/predict_misc/proteins.combined.fa --protein_alignments /home/funannotate/conda/try_one/predict/predict_misc/protein_alignments.gff3 --transcript_alignments /home/funannotate/conda/try_one/predict/predict_misc/transcript_alignments.gff3 -o /home/funannotate/predict

[05/17/19 17:45:40]: OS: linux2, 48 cores, ~ 264 GB RAM. Python: 2.7.15
[05/17/19 17:45:40]: Running funannotate v1.5.3-21ad095
[05/17/19 17:45:42]: AUGUSTUS (3.3.2) detected, version seems to be compatible with BRAKER and BUSCO
[05/17/19 17:45:43]: Loading genome assembly and parsing soft-masked repetitive sequences
[05/17/19 17:46:42]: Genome loaded: 1,074 scaffolds; 154,561,933 bp; 52.56% repeats masked
[05/17/19 17:47:57]: Existing transcript alignments found: /home/funannotate/predict/predict_misc/transcript_alignments.gff3
[05/17/19 17:47:57]: Loading protein alignments /home/funannotate/conda/try_one/predict/predict_misc/protein_alignments.gff3
[05/17/19 17:47:59]: Running GeneMark-ES on assembly
[05/17/19 17:47:59]: /home/genomes/software/genemark_x/gmes_petap.pl --ES --max_intron 3000 --soft_mask 2000 --cores 2 --sequence /home/funannotate/predict/predict_misc/genome.softmasked.fa --fungus
05/17/19 19:39:25: Converting GeneMark GTF file to GFF
[05/17/19 19:39:27]: perl /home/conda/envs/funannotate/opt/evidencemodeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl /home/funannotate/predict/predict_misc/genemark.gff
[05/17/19 19:39:34]: Found 36,582 gene models
[05/17/19 19:39:34]: Running BUSCO to find conserved gene models for training Augustus
[05/17/19 19:39:36]: Multi-threading in tblastn v2.6.0 is unstable, running in single threaded mode for BUSCO
[05/17/19 20:52:19]: 257 valid BUSCO predictions found, now formatting for EVM
[05/17/19 20:52:34]: /home/genomes/software/funannotate/util/fix_busco_naming.py /home/funannotate/predict/predict_misc/busco_augustus.tmp /home/funannotate/predict/predict_misc/busco/run_phytophthora_infestans_iso-1/full_table_phytophthora_infestans_iso-1.tsv /home/funannotate/predict/predict_misc/busco_augustus.gff3
[05/17/19 20:52:34]: bedtools intersect -a /home/funannotate/predict/predict_misc/genemark.evm.gff3 -b /home/funannotate/predict/predict_misc/buscos.bed
[05/17/19 20:52:34]: bedtools intersect -a /home/funannotate/predict/predict_misc/transcript_alignments.gff3 -b /home/funannotate/predict/predict_misc/buscos.bed
[05/17/19 20:52:35]: bedtools intersect -a /home/funannotate/predict/predict_misc/protein_alignments.gff3 -b /home/funannotate/predict/predict_misc/buscos.bed
[05/17/19 21:11:05]: 256 total gene models from EVM
[05/17/19 21:11:05]: Checking BUSCO protein models for accuracy
[05/17/19 21:11:05]: /home/conda/envs/funannotate/opt/evidencemodeler-1.1.1/EvmUtils/gff3_file_to_proteins.pl /home/funannotate/predict/predict_misc/busco.evm.gff3 /home/funannotate/predict/predict_misc/genome.softmasked.fa
[05/17/19 21:12:23]: 254 gene models validated, using for training Augustus
[05/17/19 21:12:23]: Training Augustus using BUSCO gene models
[05/17/19 21:12:23]: gff2gbSmallDNA.pl /home/funannotate/predict/predict_misc/busco.final.gff3 /home/funannotate/predict/predict_misc/genome.softmasked.fa 600 /home/funannotate/predict/predict_misc/busco.training.gb

Busco log:

** Start a BUSCO 2.0 analysis, current time: 05/17/2019 19:39:36 **
The lineage dataset is: dikarya_odb9 (eukaryota)
Mode is: genome
Maximum number of regions limited to: 3
To reproduce this run: python /home/genomes/software/funannotate/util/funannotate-BUSCO2.py -i /home/funannotate/predict/predict_misc/genome.softmasked.fa -o phytophthora_infestans_iso-1 -l /home/genomes/software/funannotate_DB/dikarya/ -m genome -c 2 -sp anidulans
Check dependencies...
Check input file...
Temp directory is ./tmp/

** Phase 1 of 2, initial predictions **
** Step 1/3, current time: 05/17/2019 19:39:38 **
Create blast database...
[makeblastdb] Building a new DB, current time: 05/17/2019 19:39:38
[makeblastdb] New DB name: /home/funannotate/busco/tmp/phytophthora_infestans_iso-1_3395653472
[makeblastdb] New DB title: /home/funannotate/predict/predict_misc/genome.softmasked.fa
[makeblastdb] Sequence type: Nucleotide
[makeblastdb] Keep MBits: T
[makeblastdb] Maximum file size: 1000000000B
[makeblastdb] Adding sequences from FASTA; added 1074 sequences in 1.57337 seconds.
Running tblastn, writing output to /home/funannotate/busco/run_phytophthora_infestans_iso-1/blast_output/tblastn_phytophthora_infestans_iso-1.tsv...
** Step 2/3, current time: 05/17/2019 19:49:41 **
Getting coordinates for candidate regions...
Pre-Augustus scaffold extraction...
Running Augustus prediction using anidulans as species:
[augustus] Please find all logs related to Augustus here: /home/funannotate/busco/run_phytophthora_infestans_iso-1/augustus_output/augustus.log
05/17/2019 19:49:53 => 0% of predictions performed (2026 to be done)
05/17/2019 19:56:14 => 10% of predictions performed (223/2026 candidate regions)
05/17/2019 20:01:56 => 20% of predictions performed (426/2026 candidate regions)
05/17/2019 20:07:58 => 30% of predictions performed (629/2026 candidate regions)
05/17/2019 20:13:58 => 40% of predictions performed (831/2026 candidate regions)
05/17/2019 20:20:02 => 50% of predictions performed (1034/2026 candidate regions)
05/17/2019 20:25:10 => 60% of predictions performed (1236/2026 candidate regions)
05/17/2019 20:30:17 => 70% of predictions performed (1439/2026 candidate regions)
05/17/2019 20:35:08 => 80% of predictions performed (1642/2026 candidate regions)
05/17/2019 20:40:29 => 90% of predictions performed (1844/2026 candidate regions)
05/17/2019 20:45:45 => 100% of predictions performed
Extracting predicted proteins...
** Step 3/3, current time: 05/17/2019 20:48:37 **
Running HMMER to confirm orthology of predicted proteins:
05/17/2019 20:48:37 => 0% of predictions performed (2022 to be done)
05/17/2019 20:48:59 => 10% of predictions performed (224/2022 candidate proteins)
05/17/2019 20:49:09 => 20% of predictions performed (425/2022 candidate proteins)
05/17/2019 20:49:20 => 30% of predictions performed (628/2022 candidate proteins)
05/17/2019 20:49:37 => 40% of predictions performed (831/2022 candidate proteins)
05/17/2019 20:49:46 => 50% of predictions performed (1032/2022 candidate proteins)
05/17/2019 20:49:58 => 60% of predictions performed (1235/2022 candidate proteins)
05/17/2019 20:50:07 => 70% of predictions performed (1437/2022 candidate proteins)
05/17/2019 20:50:24 => 80% of predictions performed (1639/2022 candidate proteins)
05/17/2019 20:50:41 => 90% of predictions performed (1841/2022 candidate proteins)
05/17/2019 20:50:51 => 100% of predictions performed
Results:
C:51.8%[S:19.6%,D:32.2%],F:11.8%,M:36.4%,n:1312
680 Complete BUSCOs (C)
257 Complete and single-copy BUSCOs (S)
423 Complete and duplicated BUSCOs (D)
155 Fragmented BUSCOs (F)
477 Missing BUSCOs (M)
1312 Total BUSCO groups searched

** Phase 2 of 2, predictions using species specific training **
** Step 1/3, current time: 05/17/2019 20:50:53 **
Extracting missing and fragmented buscos from the ancestral_variants file...
WARNING The busco id(s) ['EOG09264XM2', 'EOG09261JCQ', 'EOG09263A5D', 'EOG09261C0G', 'EOG09260S5R', 'EOG09265SHM', 'EOG09264LH2', 'EOG09262XRU', 'EOG09261OLD', ""' +100 genes EOG0926248P', 'EOG09264XOC', 'EOG09260BL6', 'EOG09260N53'] were not found in the ancestral_variants file
Running tblastn, writing output to /home/funannotate/busco/run_phytophthora_infestans_iso-1/blast_output/tblastn_phytophthora_infestans_iso-1_missing_and_frag_rerun.tsv...
[tblastn] Warning: [tblastn] Query is Empty!
Getting coordinates for candidate regions...
** Step 2/3, current time: 05/17/2019 20:50:54 **
Training Augustus using Single-Copy Complete BUSCOs:
05/17/2019 20:50:54 => Converting predicted genes to short genbank files...
05/17/2019 20:51:55 => All files converted to short genbank files, now running the training scripts...
Pre-Augustus scaffold extraction...
Re-running Augustus with the new metaparameters, number of target BUSCOs: 632
05/17/2019 20:52:08 => 0% of predictions performed (0 to be done)
05/17/2019 20:52:08 => 100% of predictions performed
Extracting predicted proteins...
** Step 3/3, current time: 05/17/2019 20:52:08 **
Running HMMER to confirm orthology of predicted proteins:
05/17/2019 20:52:08 => 0% of predictions performed (0 to be done)
05/17/2019 20:52:08 => 100% of predictions performed
Results:
C:51.8%[S:19.6%,D:32.2%],F:11.8%,M:36.4%,n:1312
680 Complete BUSCOs (C)
257 Complete and single-copy BUSCOs (S)
423 Complete and duplicated BUSCOs (D)
155 Fragmented BUSCOs (F)
477 Missing BUSCOs (M)
1312 Total BUSCO groups searched

BUSCO analysis done with WARNING(s).
Total running time: 4362.72406006 seconds
Results written in /home/funannotate/busco/run_phytophthora_infestans_iso-1/

** Start a BUSCO 2.0 analysis, current time: 05/17/2019 21:11:15 **
The lineage dataset is: dikarya_odb9 (eukaryota)
Mode is: proteins
To reproduce this run: python /home/genomes/software/funannotate/util/funannotate-BUSCO2.py -i /home/funannotate/predict/predict_misc/busco.evm.proteins.fa -o phytophthora_infestans_iso-1 -l /home/genomes/software/funannotate_DB/dikarya/ -m proteins -c 2 -sp anidulans
Check dependencies...
Check input file...
Temp directory is ./tmp/
Running HMMER on the proteins:
05/17/2019 21:11:15 => 0% of predictions performed (1312 to be done)
05/17/2019 21:11:25 => 10% of predictions performed (145/1312 candidate proteins)
05/17/2019 21:11:30 => 20% of predictions performed (276/1312 candidate proteins)
05/17/2019 21:11:36 => 30% of predictions performed (407/1312 candidate proteins)
05/17/2019 21:11:46 => 40% of predictions performed (538/1312 candidate proteins)
05/17/2019 21:11:56 => 50% of predictions performed (670/1312 candidate proteins)
05/17/2019 21:12:00 => 60% of predictions performed (801/1312 candidate proteins)
05/17/2019 21:12:04 => 70% of predictions performed (933/1312 candidate proteins)
05/17/2019 21:12:12 => 80% of predictions performed (1063/1312 candidate proteins)
05/17/2019 21:12:18 => 90% of predictions performed (1195/1312 candidate proteins)
05/17/2019 21:12:22 => 100% of predictions performed
Results:
C:19.4%[S:19.4%,D:0.0%],F:0.1%,M:80.5%,n:1312
254 Complete BUSCOs (C)
254 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
1 Fragmented BUSCOs (F)
1057 Missing BUSCOs (M)
1312 Total BUSCO groups searched

BUSCO analysis done.
Total running time: 68.1360561848 seconds
Results written in /home/funannotate/predict/predict_misc/busco_proteins/run_phytophthora_infestans_iso-1/

These are the files I have in predict_misc:

augustus.initial.training.txt busco/ busco_augustus.gff3 busco_augustus.tmp busco.evm.gff3 busco.evm.proteins.fa busco.final.gff3 busco_genemark.gff3 busco_predictions.gff3 busco_proteins/ busco_proteins.gff3 buscos.bed busco.training.gb busco.training.gb.test busco.training.gb.train busco_transcripts.gff3 busco_weights.txt EVM_busco/ genemark/ genemark.evm.gff3 genemark.evm.gff3.bak genemark.gff genemark.temp.gff genome.softmasked.fa gmhmm.mod hints.ALL.gff hints.all.sort.tmp hints.all.tmp hints.P.gff protein_alignments.gff3 repeatmasker.bed scaffold.sort.order.txt scaffold.sort.rename.txt transcript_alignments.gff3

Augustus.log

Warning: Block unknown_A is not significant enough, removed from profile.
Warning: Block unknown_B is not significant enough, removed from profile.
""
Warning: Block unknown_J is not significant enough, removed from profile.
Warning: Block unknown_K is not significant enough, removed from profile.

Will create parameters for a EUKARYOTIC species!
creating directory /home/conda/envs/funannotate/config/species/BUSCO_phytophthora_infestans_iso-1_3395653472/ ...
creating /home/conda/envs/funannotate/config/species/BUSCO_phytophthora_infestans_iso-1_3395653472/BUSCO_phytophthora_infestans_iso-1_3395653472_parameters.cfg ...
creating /home/conda/envs/funannotate/config/species/BUSCO_phytophthora_infestans_iso-1_3395653472/BUSCO_phytophthora_infestans_iso-1_3395653472_weightmatrix.txt ...
creating /home/conda/envs/funannotate/config/species/BUSCO_phytophthora_infestans_iso-1_3395653472/BUSCO_phytophthora_infestans_iso-1_3395653472_metapars.cfg ...
The necessary files for training BUSCO_phytophthora_infestans_iso-1_3395653472 have been created.
Now, either run etraining or optimize_parameters.pl with --species=BUSCO_phytophthora_infestans_iso-1_3395653472.
etraining quickly estimates the parameters from a file with training genes.
optimize_augustus.pl alternates running etraining and augustus to find optimal metaparameters.

Segmentation fault (core dumped)

I have been trying for a while to find out what is going wrong. Do you have any suggestions?

Best wishes,

Liron

nextgenusfs commented 5 years ago

Hi Liron,

So there are quite a few things here; let's try to get them working one by one. First is Augustus/BUSCO. Can you confirm that the $AUGUSTUS_CONFIG_PATH variable points to the properly installed version of Augustus? You want to avoid the system using a failed version of your training parameters. Those are stored in $AUGUSTUS_CONFIG_PATH/species/species_name, so to remove a failed set of parameters, remove the corresponding folder. You can see what is in there with the funannotate species command.
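The check suggested above can be sketched in Python. In the output earlier in this thread, $AUGUSTUS_CONFIG_PATH is /home/team/edouard/install/augustus/config/, while the Augustus error refers to /home/conda/envs/funannotate/config/species/, so two different config directories appear to be in play. The helper below is hypothetical (it is not part of funannotate or Augustus). It assumes a trained species folder named X contains a file X_exon_probs.pbl, which is what the error message's path suggests. Folders shipped with Augustus may legitimately use other layouts, so treat the result as a list of candidates to inspect, not to delete blindly.

```python
import os


def incomplete_species(config_path, required_suffix="_exon_probs.pbl"):
    """List species folders under config_path/species lacking <name><required_suffix>.

    Assumption: a fully trained species folder <name> holds <name>_exon_probs.pbl.
    """
    species_dir = os.path.join(config_path, "species")
    missing = []
    for name in sorted(os.listdir(species_dir)):
        folder = os.path.join(species_dir, name)
        expected = os.path.join(folder, name + required_suffix)
        if os.path.isdir(folder) and not os.path.isfile(expected):
            missing.append(name)
    return missing


if __name__ == "__main__":
    config = os.environ.get("AUGUSTUS_CONFIG_PATH")
    if config and os.path.isdir(os.path.join(config, "species")):
        print("config path:", config)
        print("species folders without exon_probs.pbl:", incomplete_species(config))
    else:
        print("AUGUSTUS_CONFIG_PATH is unset or has no species/ folder")
```

Running it once with the path from $AUGUSTUS_CONFIG_PATH and once with the path named in the error message shows which directory actually holds the half-written parameters.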

I'm surprised that the Augustus conda version didn't work on your system, on Linux it is usually fine.

The second problem might be tblastn. It looks to have failed during your BUSCO run, which resulted in only a few BUSCO models being found; realistically you should be finding more like ~1000. Multithreading in tblastn is still broken as far as I know. I've tried to write the code to adapt and default to 1 CPU for this step, but I'm not sure that worked in your case. The easiest solution is to remove tblastn from the PATH and install an older version, likely all the way back to 2.2.31.
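To compare the installed tblastn against the older release mentioned above, a small sketch follows. It parses the version string reported by the dependency check earlier in this thread ("tblastn 2.6.0+"). The helper name parse_version is hypothetical, and the sketch only compares version numbers. It does not test whether multithreading actually works.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'tblastn 2.6.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


SUGGESTED = (2, 2, 31)  # older release suggested in the comment above

reported = parse_version("tblastn 2.6.0+")  # string from the dependency check output
print(reported, "newer than suggested:", reported > SUGGESTED)
# (2, 6, 0) newer than suggested: True
```

Tuples compare element by element, so (2, 6, 0) > (2, 2, 31) is evaluated on the minor version and gives the expected answer without string comparison pitfalls.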

Take advantage of the funannotate test module, which will test your system install. You can run funannotate test -t busco -c 12 to test the busco training method using 12 cpus.

After you get the Augustus install fixed and have validated that you pass the tests in funannotate test, look at the docs to see how to use RNA-seq data. You should not be passing RNA-seq reads directly to funannotate predict. Instead, you need to run them through the funannotate train script to generate a transcriptome assembly, etc. Then you run funannotate predict using those results.

LironShv commented 5 years ago

Hi,

Thanks for your quick reply and your advice. I will have another try. I much appreciate your pipeline and work!

cheers!

LironShv commented 5 years ago

Hi there,

Just letting you know the cause of my error, as it might be useful for other people, although it was quite obvious in the end.

I hadn't created symbolic links from the manually installed executables/scripts in augustus/scripts and augustus/auxprogs to conda/env/funannotate/bin/. As a result, funannotate was combining parts of the manually installed Augustus with parts of my previous conda install.
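For anyone who wants to script that fix, here is a minimal Python sketch of the symlinking step. The function name link_executables and the placeholder paths in the usage comment are hypothetical. The sketch only looks at the top level of each source directory, and it replaces any existing entry of the same name in the target bin directory, so review the list of names before running it on a real environment.

```python
import os


def link_executables(source_dirs, bin_dir):
    """Symlink each executable file in source_dirs into bin_dir.

    Existing files or links with the same name in bin_dir are replaced,
    so the environment no longer mixes two installs of the same tool.
    Returns the list of names that were linked.
    """
    linked = []
    for source in source_dirs:
        for name in sorted(os.listdir(source)):
            target = os.path.join(source, name)
            # Skip subdirectories and files without an executable bit.
            if not (os.path.isfile(target) and os.access(target, os.X_OK)):
                continue
            link = os.path.join(bin_dir, name)
            if os.path.lexists(link):
                os.remove(link)  # drop the stale copy or an old link
            os.symlink(os.path.abspath(target), link)
            linked.append(name)
    return linked


# Example call with placeholder paths (adjust to your own install):
# link_executables(
#     ["/path/to/augustus/scripts", "/path/to/augustus/auxprogs"],
#     "/path/to/conda/envs/funannotate/bin",
# )
```

Afterwards, checking where each tool resolves on the PATH should show a single Augustus install being used throughout.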

Best wishes,

L