nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate predict fails with test and with real data: "Not enough gene models ... to train Augustus" #742

Open IanDMedeiros opened 2 years ago

IanDMedeiros commented 2 years ago

Are you using the latest release? funannotate v1.8.11.

Describe the bug: funannotate test is failing at the predict step with the error "Not enough gene models 175 to train Augustus (200 required), exiting". This appears to be the same error as #552. I am also getting similar errors with real data. The end of the #552 discussion suggested that the error might be related to GeneMark, but I am having trouble setting up GeneMark-ES, so wouldn't the program just run without it?

What command did you issue? funannotate test -t all --cpus 10

What probably isn't the problem, based on what I have tried so far:

- Bad Augustus installation. I was getting an Augustus error even earlier in funannotate test, so I replaced the Augustus that was installed by mamba with one (v. 3.3.3) already available on our system.
- AUGUSTUS_CONFIG_PATH permissions. Ran chmod 777 $AUGUSTUS_CONFIG_PATH/species and the error did not go away.
- Multithreading. Tried with --cpus 1 and 10; same error.

Logfiles

funannotate-predict.log

`[06/29/22 21:19:42]: /hpc/home/idm7/miniconda3/envs/annotate/bin/funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 10 --species Awesome busco
[06/29/22 21:19:42]: OS: CentOS Stream 8, 46 cores, ~ 230 GB RAM. Python: 3.8.12
[06/29/22 21:19:42]: Running funannotate v1.8.11
[06/29/22 21:19:42]: GeneMark path: /hpc/group/bio1/ian/envs/funannotate/gmes_petap
[06/29/22 21:19:42]: Full path to gmes_petap.pl: /hpc/group/bio1/ian/envs/funannotate/gmes_petap/gmes_petap.pl
[06/29/22 21:19:42]: GeneMark appears to be functional? False
[06/29/22 21:19:43]: {'augustus': 1, 'hiq': 2, 'genemark': 0, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[06/29/22 21:19:43]: Skipping CodingQuarry as no --rna_bam passed
[06/29/22 21:19:43]: {'augustus': 'busco', 'snap': 'busco', 'glimmerhmm': 'busco'}
[06/29/22 21:19:43]: Parsed training data, run ab-initio gene predictors as follows:
[06/29/22 21:19:44]: {'augustus': 1, 'hiq': 2, 'genemark': 0, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[06/29/22 21:19:45]: Loading genome assembly and parsing soft-masked repetitive sequences
[06/29/22 21:19:45]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked

[06/29/22 21:20:12]: Running BUSCO to find conserved gene models for training ab-initio predictors
[06/29/22 21:20:12]: /hpc/home/idm7/miniconda3/envs/annotate/bin/python /hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/genome.softmasked.fa -m genome --lineage /hpc/group/bio1/ian/envs/funannotate_db/dikarya -o awesome_busco -c 10 --species anidulans -f --local_augustus /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/ab_initio_parameters/augustus
[06/29/22 21:25:12]: 175 valid BUSCO predictions found, validating protein sequences
[06/29/22 21:26:04]: 175 BUSCO predictions validated
[06/29/22 21:26:04]: Not enough gene models 175 to train Augustus (200 required), exiting

busco.log

INFO ** Start a BUSCO 2.0 analysis, current time: 06/29/2022 21:20:12 **
INFO The lineage dataset is: dikarya_odb9 (eukaryota)
INFO Mode is: genome
INFO Maximum number of regions limited to: 3
INFO To reproduce this run: python /hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/genome.softmasked.fa -o awesome_busco -l /hpc/group/bio1/ian/envs/funannotate_db/dikarya/ -m genome -c 10 -sp anidulans
INFO Check dependencies...
INFO Check input file...
INFO Temp directory is ./tmp/

INFO ** Phase 1 of 2, initial predictions ** INFO ** Step 1/3, current time: 06/29/2022 21:20:12 ** INFO Create blast database... INFO [makeblastdb] Building a new DB, current time: 06/29/2022 21:20:12 INFO [makeblastdb] New DB name: /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/tmp/awesome_busco_4188679581 INFO [makeblastdb] New DB title: /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/genome.softmasked.fa INFO [makeblastdb] Sequence type: Nucleotide INFO [makeblastdb] Keep Linkouts: T INFO [makeblastdb] Keep MBits: T INFO [makeblastdb] Maximum file size: 1000000000B INFO [makeblastdb] Adding sequences from FASTA; added 6 sequences in 0.0434968 seconds. INFO Running tblastn, writing output to /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/blast_output/tblastn_awesome_busco.tsv... INFO ** Step 2/3, current time: 06/29/2022 21:20:21 ** INFO Getting coordinates for candidate regions... INFO Pre-Augustus scaffold extraction... 
INFO Running Augustus prediction using anidulans as species: INFO [augustus] Please find all logs related to Augustus here: /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/augustus.log INFO 06/29/2022 21:20:21 => 0% of predictions performed (743 to be done) INFO 06/29/2022 21:20:56 => 10% of predictions performed (75/743 candidate regions) INFO 06/29/2022 21:21:24 => 20% of predictions performed (149/743 candidate regions) INFO 06/29/2022 21:22:04 => 30% of predictions performed (223/743 candidate regions) INFO 06/29/2022 21:22:39 => 40% of predictions performed (298/743 candidate regions) INFO 06/29/2022 21:23:01 => 50% of predictions performed (372/743 candidate regions) INFO 06/29/2022 21:23:21 => 60% of predictions performed (446/743 candidate regions) INFO 06/29/2022 21:23:38 => 70% of predictions performed (521/743 candidate regions) INFO 06/29/2022 21:23:55 => 80% of predictions performed (596/743 candidate regions) INFO 06/29/2022 21:24:08 => 90% of predictions performed (669/743 candidate regions) INFO 06/29/2022 21:24:20 => 100% of predictions performed INFO Extracting predicted proteins... 
INFO ** Step 3/3, current time: 06/29/2022 21:24:49 ** INFO Running HMMER to confirm orthology of predicted proteins: INFO 06/29/2022 21:24:49 => 0% of predictions performed (602 to be done) INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG092600SD.faa.1): INFO [hmmersearch] Line 2: illegal character % INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG092603EH.faa.1): INFO [hmmersearch] Line 2: illegal character % INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG092600T4.faa.1): INFO [hmmersearch] Line 2: illegal character % INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG092600X0.faa.1): INFO [hmmersearch] Line 2: illegal character % INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG0926009O.faa.1): INFO [hmmersearch] Line 2: illegal character % INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG092602I6.faa.3): INFO [hmmersearch] Line 2: illegal character %

<This goes on for many lines, apparently through all the BUSCO loci. Omitting here for space.>

INFO 06/29/2022 21:24:58 => 100% of predictions performed
INFO Results:
INFO C:13.5%[S:13.3%,D:0.2%],F:0.1%,M:86.4%,n:1312
INFO 177 Complete BUSCOs (C)
INFO 175 Complete and single-copy BUSCOs (S)
INFO 2 Complete and duplicated BUSCOs (D)
INFO 1 Fragmented BUSCOs (F)
INFO 1134 Missing BUSCOs (M)
INFO 1312 Total BUSCO groups searched

INFO ** Phase 2 of 2, predictions using species specific training ** INFO ** Step 1/3, current time: 06/29/2022 21:25:00 ** INFO Extracting missing and fragmented buscos from the ancestral_variants file... WARNING The busco id(s) ['EOG0926457R', 'EOG09264XJC', 'EOG0926129I', 'EOG09265F2Y', 'EOG09262N3C', 'EOG092602UY', 'EOG09264R4M', 'EOG09264P3R', 'EOG09264HX6', 'EOG09260OCI', 'EOG09261J0P', 'EOG092610TN', 'EOG09264BOA', 'EOG09261DRB', 'EOG092608L0', 'EOG09261FAX', 'EOG09264X8J', 'EOG09262R8O', 'EOG09262K67', 'EOG09260K5F', 'EOG09263LP3', 'EOG09264S3E', 'EOG09262FJB', 'EOG09260VEY', 'EOG09260TLW', 'EOG092608WI', 'EOG09261OXV', 'EOG09264PE5', 'EOG09261JWS', 'EOG09260NNR', 'EOG09264RJL', 'EOG09262KJA', 'EOG09260VYK', 'EOG092641UM', 'EOG092644N1', 'EOG09262V3O', 'EOG09260XMI', 'EOG09263BDA', 'EOG09264I6B', 'EOG092635ST', 'EOG0926071Q', 'EOG09264PK5', 'EOG09263D2P', 'EOG09260VHV', 'EOG09265RGS', 'EOG092603YJ', 'EOG092621ZV', 'EOG09261801', 'EOG09260ZWU', 'EOG09260PI1', 'EOG092607QZ', 'EOG09262W7C', 'EOG09264V3H', 'EOG09261UWT', 'EOG09264881', 'EOG09263E49', 'EOG09265KQ4', 'EOG09260AZM', 'EOG09264T3S', 'EOG09261TEQ', 'EOG09265BJ3', 'EOG0926522L', 'EOG09262CDO', 'EOG09262H34', 'EOG09264J8E', 'EOG09265FL1', 'EOG0926431P', 'EOG09263M8W', 'EOG09265FTY', 'EOG09262Z2S', 'EOG09264719', 'EOG092625AX', 'EOG09265HEP', 'EOG092618J9', 'EOG09260RVQ', 'EOG09263K45', 'EOG09264R1U', 'EOG09261ICI', 'EOG09263RVR', 'EOG09260WG2', 'EOG09263QUM', 'EOG09264ZWF', 'EOG092646WF', 'EOG09261OLD', 'EOG09263W48', 'EOG092632TF', 'EOG09265552', 'EOG09261D4D', 'EOG09264SET', 'EOG092627XA', 'EOG09262JRP', 'EOG09261P7G', 'EOG09262GNE', 'EOG092636T6', 'EOG092625P1', 'EOG092641M3', 'EOG09262POL', 'EOG09264Z3D', 'EOG09260K29', 'EOG092659DX', 'EOG09264G1I', 'EOG09260289', 'EOG09264C3N', 'EOG09262387', 'EOG09264HU0', 'EOG09264W7W', 'EOG09263WM5', 'EOG092629FB', 'EOG09260KM4', 'EOG092604A0', 'EOG09260FZZ', 'EOG09260GKG', 'EOG09262MJW', 'EOG09260XSR', 'EOG092621S9', 'EOG09261IEV', 'EOG09262TEV', 
'EOG092641A6', 'EOG09263DQH', 'EOG09263YBT', 'EOG09263KVG', 'EOG092650VI', 'EOG092653O3', 'EOG09264441', 'EOG0926369X', 'EOG092643IE', 'EOG09261XZ6', 'EOG09264XUV', 'EOG092645OU', 'EOG09261I8J', 'EOG09263WWI', 'EOG09260NAN', 'EOG09260S2Z', 'EOG09264XYD', 'EOG0926484N', 'EOG09263FGN', 'EOG09260ETR', 'EOG0926506U', 'EOG09262KVB', 'EOG092605ZA', 'EOG0926248P', 'EOG092635DF', 'EOG092641K1', 'EOG0926315C', 'EOG092658QH', 'EOG09261JVS', 'EOG0926307V', 'EOG0926587S', 'EOG092604KQ', 'EOG09260J97', 'EOG09262HP3', 'EOG09264OQ8', 'EOG09263L7Y', 'EOG09261I0F', 'EOG09264ZDZ', 'EOG09262CXO', 'EOG09261I1I', 'EOG09261727', 'EOG09262BVA', 'EOG09265QTV', 'EOG092605VL', 'EOG09260KDB', 'EOG092617S2', 'EOG09262YP5', 'EOG0926407T', 'EOG092629RT', 'EOG092605OK', 'EOG09260EPS', 'EOG09265JNA', 'EOG09260DBG', 'EOG09260NZ8', 'EOG092621F2', 'EOG09261IOS', 'EOG0926539T', 'EOG09264W1U', 'EOG09260KNR', 'EOG09263PWF', 'EOG092610VI', 'EOG09264KDO', 'EOG09261G1Y', 'EOG09262IY3', 'EOG09261VD2', 'EOG09263KDI', 'EOG092658SK', 'EOG09265A08', 'EOG09263K05', 'EOG09263QPR', 'EOG092644WX', 'EOG092631ML', 'EOG09260KUC', 'EOG09262M0W', 'EOG092658NW', 'EOG09263XN3', 'EOG0926506Z', 'EOG09263U71', 'EOG09262TUR', 'EOG09265040', 'EOG092655IF', 'EOG09262E7I', 'EOG092641G3', 'EOG09261XNU', 'EOG09260EE7', 'EOG092645QN', 'EOG0926092K', 'EOG09263MR4', 'EOG09264XVU', 'EOG092610KH', 'EOG09261WJ8', 'EOG09261HZD', 'EOG09261SS1', 'EOG09261CQG', 'EOG0926273Q', 'EOG092619L1', 'EOG09265CCT', 'EOG09260KIY', 'EOG09262N5O', 'EOG092604ZZ', 'EOG09260R9L', 'EOG092654KW', 'EOG092615Y4', 'EOG09261CWO', 'EOG09260NXC', 'EOG09265G5K', 'EOG092612XD', 'EOG092605T6', 'EOG09261ZFN', 'EOG092620FM', 'EOG092646C6', 'EOG09264VC6', 'EOG092649VG', 'EOG09260LVD', 'EOG09265PWR', 'EOG09262PPU', 'EOG09262F22', 'EOG092615CC', 'EOG092616YZ', 'EOG09264RQY', 'EOG092616QN', 'EOG0926400M', 'EOG092648O6', 'EOG09264KO7', 'EOG09264NDD', 'EOG09262GWQ', 'EOG0926458I', 'EOG0926115V', 'EOG09265M98', 'EOG09260TVA', 'EOG09261RWJ', 'EOG09264A2D', 'EOG09260UA2', 
'EOG092634MM', 'EOG09265IT6', 'EOG09263760', 'EOG092642UD', 'EOG092609O9', 'EOG09265FTN', 'EOG09265EKJ', 'EOG0926534P', 'EOG09263KZJ', 'EOG09261DG0', 'EOG09260NHN', 'EOG09262OX9', 'EOG09261T98', 'EOG09260WCZ', 'EOG09262HKC', 'EOG09263F11', 'EOG09261G92', 'EOG09262U7S', 'EOG09264VZ7', 'EOG092602I6', 'EOG09262E4T', 'EOG09262WQX', 'EOG09265HP0', 'EOG09264SSI', 'EOG09260FMW', 'EOG092612AK', 'EOG092600SD', 'EOG09261ACJ', 'EOG09260ZG2', 'EOG09263Y3L', 'EOG09261NLY', 'EOG092655SO', 'EOG092609RF', 'EOG09263CAC', 'EOG09261ABB', 'EOG09264272', 'EOG092651BA', 'EOG09265L8N', 'EOG09261OSU', 'EOG09262MEK', 'EOG09263UN3', 'EOG09260DP1', 'EOG09261AMX', 'EOG09262UAS', 'EOG09262SI7', 'EOG09263KRO', 'EOG09261TPN', 'EOG09260T4S', 'EOG092610QT', 'EOG09262X7T', 'EOG092629ZN', 'EOG092634B1', 'EOG092620EL', 'EOG0926009O', 'EOG09264G0H', 'EOG09262528', 'EOG09260QNB', 'EOG09261EM7', 'EOG092617RY', 'EOG092646CB', 'EOG09261O4Y', 'EOG09263G4R', 'EOG0926248W', 'EOG09260T28', 'EOG092624KK', 'EOG09263OD3', 'EOG09261Q18', 'EOG092658WY', 'EOG09265GSM', 'EOG09265B95', 'EOG092604I8', 'EOG09264FXB', 'EOG09264ZQC', 'EOG09264PI4', 'EOG09262VPD', 'EOG09262QS5', 'EOG09261ZPW', 'EOG09263ZBF', 'EOG09262YAU', 'EOG09262SMG', 'EOG092608ZS', 'EOG0926229Z', 'EOG09261YRA', 'EOG09263EQZ', 'EOG09260TWS', 'EOG09265OQH', 'EOG09263720', 'EOG092653NM', 'EOG09260AZK', 'EOG09261AH9', 'EOG09265B1X', 'EOG09263817', 'EOG0926112A', 'EOG092601KZ', 'EOG09264X31', 'EOG09264398', 'EOG09261N2L', 'EOG09262LI4', 'EOG0926074Y', 'EOG09260FPA', 'EOG09264MGU', 'EOG092626EQ', 'EOG09264U81', 'EOG09265FCK', 'EOG09260BFE', 'EOG09264CA0', 'EOG092603EH', 'EOG092653VU', 'EOG09262NB1', 'EOG092619MJ', 'EOG09260CKC', 'EOG09261DHR', 'EOG09262TO9', 'EOG092625U6', 'EOG09263MGE', 'EOG09264PDD', 'EOG09263IMF', 'EOG092648K5', 'EOG092602MO', 'EOG09263C55', 'EOG09260EZT', 'EOG09264NC7', 'EOG09262JAT', 'EOG09260E8K', 'EOG0926133I', 'EOG092612CC', 'EOG092600SK', 'EOG092648LP', 'EOG09260VTN', 'EOG092648VW', 'EOG09264O6J', 'EOG0926514P', 'EOG09263W7L', 
'EOG09262LYR', 'EOG09265PQX', 'EOG09263QH4', 'EOG09260DXP', 'EOG09260WU6', 'EOG09263NE7', 'EOG09265G9U', 'EOG0926388H', 'EOG0926425H', 'EOG09264HTG', 'EOG09260EAZ', 'EOG0926357F', 'EOG09262JWJ', 'EOG092608RH', 'EOG092629WA', 'EOG092657UN', 'EOG09265PUI', 'EOG0926419M', 'EOG09264JHE', 'EOG09263OQH', 'EOG092638CT', 'EOG09262CBI', 'EOG09262X01', 'EOG092640BS', 'EOG09264DY4', 'EOG09264Y0W', 'EOG092619VG', 'EOG092651FJ', 'EOG09261LPY', 'EOG09261OXD', 'EOG09262ESR', 'EOG0926251E', 'EOG0926310O', 'EOG09264T8I', 'EOG092602FH', 'EOG092607OQ', 'EOG09265NHW', 'EOG09264331', 'EOG09261666', 'EOG09260LRX', 'EOG09260A27', 'EOG09262N10', 'EOG09261B18', 'EOG09260SAH', 'EOG09260ERO', 'EOG09261Y04', 'EOG09261EU7', 'EOG09263EVJ', 'EOG09263MEM', 'EOG09260274', 'EOG09264OYZ', 'EOG09264DT4', 'EOG09263OZR', 'EOG09261W90', 'EOG0926347W', 'EOG09264NEF', 'EOG09264LC7', 'EOG09263FR7', 'EOG09260AQB', 'EOG0926306O', 'EOG09260QVP', 'EOG09261JUE', 'EOG09261I1G', 'EOG09264XOZ', 'EOG09260SIZ', 'EOG09264LBC', 'EOG09262V8E', 'EOG09262GXD', 'EOG09263C4C', 'EOG09260RRC', 'EOG092640WA', 'EOG09263A5D', 'EOG09265313', 'EOG092632WW', 'EOG09263U08', 'EOG09265SHM', 'EOG09260SL3', 'EOG092619GP', 'EOG09263690', 'EOG09263ULA', 'EOG09264RIE', 'EOG09262CMP', 'EOG0926073O', 'EOG09264NJ1', 'EOG09263OAE', 'EOG09263BE5', 'EOG09260RS7', 'EOG09260NY2', 'EOG09261O7R', 'EOG092653YS', 'EOG092657YR', 'EOG09260WUA', 'EOG09262JZW', 'EOG09263LNF', 'EOG09264THP', 'EOG09260Z3X', 'EOG0926115P', 'EOG09261WVT', 'EOG09262E4Q', 'EOG09265I60', 'EOG09262DUV', 'EOG09261C0G', 'EOG09261XNJ', 'EOG092658X5', 'EOG092658CI', 'EOG09263A3Y', 'EOG09263IQ5', 'EOG092654LJ', 'EOG09260KGS', 'EOG09262MXH', 'EOG092611HB', 'EOG09263J6Z', 'EOG09260BRA', 'EOG09264903', 'EOG09262GVX', 'EOG09263R4M', 'EOG09264IIZ', 'EOG09262NNS', 'EOG092606AD', 'EOG09263ZW6', 'EOG09263JTO', 'EOG092651K1', 'EOG09263Q8J', 'EOG09261X9E', 'EOG09262PAY', 'EOG09262CUO', 'EOG09261B3Q', 'EOG09263L9T', 'EOG09260W9L', 'EOG09263X1F', 'EOG09263YFX', 'EOG09260DUR', 'EOG09261DW8', 
'EOG092654VM', 'EOG09260NJW', 'EOG09260JTZ', 'EOG09263YFH', 'EOG09260JED', 'EOG092613QA', 'EOG09263KB4', 'EOG09262GLP', 'EOG09265GGX', 'EOG092625OH', 'EOG09265KSE', 'EOG09262FE3', 'EOG09264I14', 'EOG09264L0C', 'EOG09263E5F', 'EOG0926448Q', 'EOG09264KIV', 'EOG092645G0', 'EOG09261JCQ', 'EOG09265FI4', 'EOG09265KPR', 'EOG09260GIX', 'EOG09264904', 'EOG09260P2K', 'EOG09262WXK', 'EOG09264COX', 'EOG09260SRF', 'EOG09265IBC', 'EOG09264I9J', 'EOG092656JA', 'EOG0926213Z', 'EOG092635YY', 'EOG09264AWW', 'EOG09264V2U', 'EOG092645L9', 'EOG092624SJ', 'EOG09260075', 'EOG09260AZA', 'EOG092654XA', 'EOG092620CR', 'EOG09263X0V', 'EOG092655M5', 'EOG092648Q0', 'EOG09260R84', 'EOG092626HU', 'EOG09263XVS', 'EOG092600NM', 'EOG092659OC', 'EOG09263BG5', 'EOG09264T0J', 'EOG09263GSR', 'EOG092652YI', 'EOG092654QZ', 'EOG09264ZXA', 'EOG09263SFX', 'EOG09262XMN', 'EOG09262645', 'EOG09264CP4', 'EOG092600S9', 'EOG09264W46', 'EOG09262XGK', 'EOG09263E9V', 'EOG09262UTQ', 'EOG09264NNY', 'EOG09265KNL', 'EOG09265PJ3', 'EOG09260HS3', 'EOG092605KN', 'EOG092634B5', 'EOG09263HD8', 'EOG0926142Y', 'EOG09261YLQ', 'EOG09262WHU', 'EOG09265E6R', 'EOG09261660', 'EOG092619RJ', 'EOG09264XF3', 'EOG09263E87', 'EOG092649XV', 'EOG09264WF4', 'EOG09261HQU', 'EOG09261MPU', 'EOG09260V8Q', 'EOG09265G5C', 'EOG0926049A', 'EOG092643Y5', 'EOG0926079Q', 'EOG09262DPL', 'EOG092621GA', 'EOG09262XRU', 'EOG09263PXH', 'EOG092624X0', 'EOG092652TN', 'EOG09260OE9', 'EOG09261NW2', 'EOG092653KS', 'EOG09260KNI', 'EOG09265BE5', 'EOG09264G7L', 'EOG09261F73', 'EOG09264LJU', 'EOG092639H5', 'EOG09264RW6', 'EOG092620E4', 'EOG09263GIG', 'EOG09260OLB', 'EOG09263JW5', 'EOG092620U5', 'EOG09262QTY', 'EOG092606CY', 'EOG09264OBA', 'EOG092653SU', 'EOG092643VB', 'EOG09260N2T', 'EOG092608AE', 'EOG0926499W', 'EOG0926049S', 'EOG09262D4G', 'EOG09264YEG', 'EOG09265JVH', 'EOG09265BTC', 'EOG092644O2', 'EOG09263CUQ', 'EOG0926004Z', 'EOG09261127', 'EOG09262QJW', 'EOG09263RF8', 'EOG09264P4J', 'EOG09265DRM', 'EOG09260JT9', 'EOG09260A98', 'EOG09265DWT', 'EOG092615SM', 
'EOG09264873', 'EOG09263CGP', 'EOG09263L52', 'EOG0926195C', 'EOG09260OQ8', 'EOG092602OP', 'EOG09262A65', 'EOG09261OIA', 'EOG09260GR8', 'EOG092646EZ', 'EOG09260W52', 'EOG092600T4', 'EOG09260B3H', 'EOG09264XM2', 'EOG092644ZU', 'EOG09264R0D', 'EOG09261B14', 'EOG09260J53', 'EOG092647L7', 'EOG092632FC', 'EOG09261476', 'EOG09261S7X', 'EOG09262BCZ', 'EOG09264YIJ', 'EOG0926386D', 'EOG092620IA', 'EOG0926384F', 'EOG092640IZ', 'EOG0926436T', 'EOG09264V30', 'EOG09262D0D', 'EOG092624RX', 'EOG09264G4X', 'EOG09263SZM', 'EOG09260RRN', 'EOG09263AZP', 'EOG09261FAB', 'EOG092646VF', 'EOG09263HED', 'EOG09261W2O', 'EOG09264CND', 'EOG0926390Q', 'EOG09261K9J', 'EOG09264BIX', 'EOG092617AN', 'EOG09260JJW', 'EOG09262ZZ8', 'EOG09264IQ7', 'EOG092644Z6', 'EOG09261JR0', 'EOG092605FC', 'EOG09263CLY', 'EOG092643NE', 'EOG092652Y6', 'EOG0926213Q', 'EOG092610ZY', 'EOG09264IDN', 'EOG092643JW', 'EOG09260JNY', 'EOG0926477X', 'EOG09265GYD', 'EOG092605QM', 'EOG09262KXK', 'EOG09263J3H', 'EOG09264HOY', 'EOG092617RN', 'EOG09261CXQ', 'EOG09262E3Q', 'EOG092614UB', 'EOG092652NQ', 'EOG092627F1', 'EOG09263S2P', 'EOG092612MY', 'EOG09262SWJ', 'EOG09264W71', 'EOG09264PD5', 'EOG09261Q5L', 'EOG09260KWP', 'EOG09260RGH', 'EOG092634G9', 'EOG09263RDD', 'EOG09264B74', 'EOG09261H5E', 'EOG09262YQG', 'EOG09260S5R', 'EOG09261UMG', 'EOG09264TQ5', 'EOG092603JK', 'EOG09264F1U', 'EOG09260FL2', 'EOG09261S0S', 'EOG09263DFA', 'EOG0926077L', 'EOG09260LI6', 'EOG09263IT0', 'EOG09260W8D', 'EOG09264IG5', 'EOG09261EMF', 'EOG09262A8G', 'EOG09260Y2Q', 'EOG09265D7J', 'EOG09260Z5E', 'EOG09261031', 'EOG092621YU', 'EOG092630MJ', 'EOG092634M1', 'EOG09263FAK', 'EOG09261N64', 'EOG09260NE0', 'EOG09262X74', 'EOG09260BSW', 'EOG092609YT', 'EOG09263X4B', 'EOG09264CP8', 'EOG092638XA', 'EOG09264OXC', 'EOG092658WS', 'EOG09260S3L', 'EOG09262H50', 'EOG092608AV', 'EOG09263RY2', 'EOG092631QQ', 'EOG09260TDY', 'EOG09262L1P', 'EOG092656RK', 'EOG09263Z41', 'EOG092636Y6', 'EOG09264DMU', 'EOG09261NLR', 'EOG09262X8R', 'EOG09264SUZ', 'EOG092657H8', 'EOG09264OM7', 
'EOG0926591L', 'EOG09260FFP', 'EOG09263H5H', 'EOG09264B3O', 'EOG09264JW6', 'EOG09261YV6', 'EOG09262E7W', 'EOG09261A3K', 'EOG092646PE', 'EOG09260B3X', 'EOG09260JPV', 'EOG0926312D', 'EOG09260C2V', 'EOG09264VRO', 'EOG092628LW', 'EOG09264O2D', 'EOG09260LJ8', 'EOG092609JB', 'EOG09262HA6', 'EOG09264DOU', 'EOG09260U6R', 'EOG09261PJZ', 'EOG09260EPQ', 'EOG09261UPM', 'EOG092649QJ', 'EOG09261DJC', 'EOG09260JO5', 'EOG09263GUT', 'EOG09264GMT', 'EOG09264ENO', 'EOG09263WB5', 'EOG09260K4V', 'EOG09261VI3', 'EOG09260K24', 'EOG09260XOG', 'EOG09265BTA', 'EOG092628HC', 'EOG09264DYD', 'EOG09262H1X', 'EOG09264DJ8', 'EOG09264XKX', 'EOG092600W1', 'EOG09262VOC', 'EOG09261LV7', 'EOG092652KR', 'EOG09261VUC', 'EOG09262KZ3', 'EOG092631UM', 'EOG09262IZO', 'EOG09262UB3', 'EOG09262ILV', 'EOG09261MOX', 'EOG09263X22', 'EOG09263FKC', 'EOG09260WGT', 'EOG09264X41', 'EOG09263KEE', 'EOG09264JQ1', 'EOG09260EQD', 'EOG09261V9P', 'EOG092656IY', 'EOG09265CQO', 'EOG09264XTW', 'EOG09260H81', 'EOG09263ZSC', 'EOG092655L0', 'EOG092648XW', 'EOG09264ZDJ', 'EOG09264LH2', 'EOG09263FTE', 'EOG09265A4E', 'EOG092604ML', 'EOG092615IE', 'EOG09262914', 'EOG09263HYJ', 'EOG09262Q8D', 'EOG09263JFQ', 'EOG09264C3V', 'EOG092612LP', 'EOG09260NC6', 'EOG09264E6Z', 'EOG09264V51', 'EOG092643QM', 'EOG09262Z0M', 'EOG09265AT5', 'EOG09264MZ1', 'EOG092606AJ', 'EOG09264829', 'EOG09264SQJ', 'EOG0926131E', 'EOG09260MRU', 'EOG09263274', 'EOG0926423H', 'EOG09262B1U', 'EOG09260VCG', 'EOG09264D7Y', 'EOG09264L6D', 'EOG09264BJC', 'EOG09262MFL', 'EOG09263OWL', 'EOG092645U1', 'EOG09260NWN', 'EOG0926025H', 'EOG09262DC1', 'EOG09262K8C', 'EOG092611G7', 'EOG09260LBU', 'EOG09262JZK', 'EOG092648U2', 'EOG09264OBO', 'EOG09263EBB', 'EOG09262IZ6', 'EOG09264G1F', 'EOG09260XQV', 'EOG09262QRH', 'EOG09264A8D', 'EOG09261B6Y', 'EOG09265DDU', 'EOG09265GXF', 'EOG09260SJV', 'EOG09265E8A', 'EOG09265H9T', 'EOG09261M78', 'EOG09265JA7', 'EOG0926137U', 'EOG09261FM4', 'EOG09261QXC', 'EOG09264UJF', 'EOG09264XT5', 'EOG09261MMR', 'EOG09262PMC', 'EOG09262HQM', 'EOG09263JZO', 
'EOG09265ANI', 'EOG09261QR8', 'EOG09263XZN', 'EOG09260EOI', 'EOG09262N0U', 'EOG09262PIH', 'EOG092652ZZ', 'EOG09262A6N', 'EOG092642I5', 'EOG09260VTA', 'EOG09260E2O', 'EOG092600T9', 'EOG09265K60', 'EOG09263LR1', 'EOG092614E6', 'EOG09262M2J', 'EOG0926140Q', 'EOG092621CK', 'EOG092619EA', 'EOG09264LKR', 'EOG092621CP', 'EOG09262M7B', 'EOG09260779', 'EOG09262N47', 'EOG09265RGI', 'EOG09260XL0', 'EOG09261ZJR', 'EOG09261IEH', 'EOG09265C25', 'EOG09264RBX', 'EOG09263EDC', 'EOG092629U5', 'EOG09265822', 'EOG09265BBJ', 'EOG09261404', 'EOG09263RW3', 'EOG09260375', 'EOG09261LEU', 'EOG09262341', 'EOG092631IR', 'EOG092618M2', 'EOG09264U78', 'EOG09263CG4', 'EOG092645LS', 'EOG09262VVF', 'EOG09261XAF', 'EOG09261N20', 'EOG09260UXC', 'EOG09265HPJ', 'EOG09261V2P', 'EOG09262516', 'EOG09260LS9', 'EOG09260KCB', 'EOG09260V5Q', 'EOG09260FKU', 'EOG09264OML', 'EOG09262V9N', 'EOG09265CN0', 'EOG09260HPO', 'EOG092644DY', 'EOG09260XH5', 'EOG09262F7P', 'EOG09263A64', 'EOG0926505R', 'EOG09264RJ7', 'EOG09261QR5', 'EOG09263ZBJ', 'EOG09261PUF', 'EOG09262SR7', 'EOG09260RNZ', 'EOG09260WYQ', 'EOG09260BYL', 'EOG092644X6', 'EOG09263G3M', 'EOG0926510L', 'EOG092606WZ', 'EOG092638EN', 'EOG0926489S', 'EOG09264JK1', 'EOG092608T8', 'EOG09264RR2', 'EOG09264O9F', 'EOG092600X0', 'EOG09264EGS', 'EOG092634ZL', 'EOG09264ABT', 'EOG09264FYY', 'EOG09260M87', 'EOG09264B2P', 'EOG092630YS', 'EOG09264HN5', 'EOG092644TW', 'EOG09262G8Y', 'EOG09263MNN', 'EOG09263BGW', 'EOG09261ONU', 'EOG09264L06', 'EOG09261KHB', 'EOG092603D0', 'EOG09262J7K', 'EOG09261M4S', 'EOG09262WSH', 'EOG09263Z8I', 'EOG09265EOF', 'EOG092604MJ', 'EOG09261V03', 'EOG09260HSP', 'EOG092608WU', 'EOG09260NHB', 'EOG092638RC', 'EOG0926158Y', 'EOG09260GF5', 'EOG09263OCO', 'EOG09260AEC', 'EOG092610LQ', 'EOG092603SM', 'EOG09260DFH', 'EOG092624UF', 'EOG09265I7S', 'EOG092614DJ', 'EOG09263CG2', 'EOG09263I7I', 'EOG09260N53', 'EOG09264BWL', 'EOG09260J8F', 'EOG09264T1U', 'EOG09265KF9', 'EOG09264KTU', 'EOG092613R2', 'EOG09264TJN', 'EOG0926354S', 'EOG0926420U', 'EOG09261ZXZ', 
'EOG092654O3', 'EOG09265FGA', 'EOG09260MBW', 'EOG09264CST', 'EOG092619NK', 'EOG09263WZ2', 'EOG09264UD3', 'EOG092650I8', 'EOG09261FH7', 'EOG09261225', 'EOG09264KK7', 'EOG09264YKY', 'EOG09262P1W', 'EOG09262C5Z', 'EOG09262KUJ', 'EOG09264L3W', 'EOG09264G04', 'EOG09260WUS', 'EOG09264XPY', 'EOG09264FVQ', 'EOG09260DUW', 'EOG092653LT', 'EOG09265LEG', 'EOG092656CM', 'EOG09264Z1B', 'EOG09263TQ5', 'EOG092658ZO', 'EOG09260OM6', 'EOG092635SS', 'EOG09261KRX', 'EOG092647CM', 'EOG09265BG5', 'EOG09264IKZ', 'EOG09261UOJ', 'EOG09263UWJ', 'EOG09260HMA', 'EOG09262Q6S', 'EOG09260APA', 'EOG09264MN3', 'EOG09265591', 'EOG09265ER6', 'EOG09262I0R', 'EOG09260931', 'EOG092633QB', 'EOG09261RFF', 'EOG092603KJ', 'EOG09262BHE', 'EOG09262IP2', 'EOG09264DIM', 'EOG09262E98', 'EOG092649VA', 'EOG09264YHS', 'EOG09260PJ9', 'EOG092628SP', 'EOG09264K2W', 'EOG09264IV9', 'EOG09261W1K', 'EOG09260JDM', 'EOG09260H6E', 'EOG092613UB', 'EOG09264P74', 'EOG09261V87', 'EOG092624JL', 'EOG09262O0R', 'EOG09262PZ9', 'EOG09264GQZ', 'EOG09261HB6', 'EOG09264IOS', 'EOG09262MOO', 'EOG09261CM0', 'EOG09265K4K', 'EOG09265AL8', 'EOG09261EY9', 'EOG092651HW', 'EOG09260B65', 'EOG092629WJ', 'EOG092628FW', 'EOG09260OZU', 'EOG09261OKK', 'EOG092604S1', 'EOG092631MU', 'EOG09264USX', 'EOG09260TPT', 'EOG09261G4Z', 'EOG09261RWU', 'EOG09262LQT', 'EOG092605VU'] were not found in the ancestral_variants file INFO Running tblastn, writing output to /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/blast_output/tblastn_awesome_busco_missing_and_frag_rerun.tsv... INFO [tblastn] Warning: [tblastn] Query is Empty! INFO Getting coordinates for candidate regions... INFO ** Step 2/3, current time: 06/29/2022 21:25:01 ** INFO Training Augustus using Single-Copy Complete BUSCOs: INFO 06/29/2022 21:25:01 => Converting predicted genes to short genbank files... INFO 06/29/2022 21:25:07 => All files converted to short genbank files, now running the training scripts... 
INFO Pre-Augustus scaffold extraction... INFO Re-running Augustus with the new metaparameters, number of target BUSCOs: 1135 INFO 06/29/2022 21:25:09 => 0% of predictions performed (0 to be done) INFO 06/29/2022 21:25:09 => 100% of predictions performed INFO Extracting predicted proteins... INFO ** Step 3/3, current time: 06/29/2022 21:25:09 ** INFO Running HMMER to confirm orthology of predicted proteins: INFO 06/29/2022 21:25:09 => 0% of predictions performed (0 to be done) INFO 06/29/2022 21:25:09 => 100% of predictions performed INFO Results: INFO C:13.5%[S:13.3%,D:0.2%],F:0.1%,M:86.4%,n:1312 INFO 177 Complete BUSCOs (C) INFO 175 Complete and single-copy BUSCOs (S) INFO 2 Complete and duplicated BUSCOs (D) INFO 1 Fragmented BUSCOs (F) INFO 1134 Missing BUSCOs (M) INFO 1312 Total BUSCO groups searched

INFO BUSCO analysis done with WARNING(s). Total running time: 300.09778451919556 seconds INFO Results written in /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/

INFO ** Start a BUSCO 2.0 analysis, current time: 06/29/2022 21:25:30 ** INFO The lineage dataset is: dikarya_odb9 (eukaryota) INFO Mode is: proteins INFO To reproduce this run: python /hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco_augustus.proteins.fasta -o awesome_busco -l /hpc/group/bio1/ian/envs/funannotate_db/dikarya/ -m proteins -c 10 -sp anidulans INFO Check dependencies... INFO Check input file... INFO Temp directory is ./tmp/ INFO Running HMMER on the proteins: INFO 06/29/2022 21:25:30 => 0% of predictions performed (1312 to be done) INFO 06/29/2022 21:25:32 => 10% of predictions performed (134/1312 candidate proteins) INFO 06/29/2022 21:25:33 => 20% of predictions performed (263/1312 candidate proteins) INFO 06/29/2022 21:25:35 => 30% of predictions performed (396/1312 candidate proteins) INFO 06/29/2022 21:25:38 => 40% of predictions performed (525/1312 candidate proteins) INFO 06/29/2022 21:25:41 => 50% of predictions performed (659/1312 candidate proteins) INFO 06/29/2022 21:25:44 => 60% of predictions performed (791/1312 candidate proteins) INFO 06/29/2022 21:25:47 => 70% of predictions performed (922/1312 candidate proteins) INFO 06/29/2022 21:25:52 => 80% of predictions performed (1054/1312 candidate proteins) INFO 06/29/2022 21:25:56 => 90% of predictions performed (1181/1312 candidate proteins) INFO 06/29/2022 21:26:00 => 100% of predictions performed INFO Results: INFO C:13.3%[S:13.3%,D:0.0%],F:0.0%,M:86.7%,n:1312 INFO 175 Complete BUSCOs (C) INFO 175 Complete and single-copy BUSCOs (S) INFO 0 Complete and duplicated BUSCOs (D) INFO 0 Fragmented BUSCOs (F) INFO 1137 Missing BUSCOs (M) INFO 1312 Total BUSCO groups searched

INFO BUSCO analysis done. Total running time: 33.35695195198059 seconds INFO Results written in /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco_proteins/run_awesome_busco/

`
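The key numbers in the two logs above (validated BUSCO models, the required minimum, and how many protein files hmmsearch refused to parse) can be pulled out with a few regular expressions. This is only a minimal sketch: the patterns are copied from the log lines quoted in this issue, the helper name `summarize_logs` is hypothetical, and other funannotate or BUSCO versions may word these messages differently.

```python
import re

# Patterns taken from the log lines quoted in this issue (assumed stable wording).
VALIDATED = re.compile(r"(\d+) BUSCO predictions validated")
SHORTFALL = re.compile(
    r"Not enough gene models (\d+) to train Augustus \((\d+) required\)"
)
PARSE_FAILED = re.compile(r"\[hmmersearch\] Parse failed \(sequence file (\S+?)\):")


def summarize_logs(text):
    """Summarize funannotate-predict.log / busco.log text (hypothetical helper)."""
    validated = VALIDATED.search(text)
    shortfall = SHORTFALL.search(text)
    return {
        # Number of protein files that hmmsearch could not parse.
        "parse_failures": len(PARSE_FAILED.findall(text)),
        # Number of BUSCO models that passed validation, if reported.
        "validated": int(validated.group(1)) if validated else None,
        # Models found vs. models required, if the shortfall message is present.
        "found": int(shortfall.group(1)) if shortfall else None,
        "required": int(shortfall.group(2)) if shortfall else None,
    }
```

Calling `summarize_logs(open("busco.log").read())` on the real file would show at a glance whether the shortfall coincides with a large number of hmmsearch parse failures.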

OS/Install Information

Installed using mamba on computing cluster running Red Hat Enterprise Linux 8.

`-------------------------------------------------------
Checking dependencies for 1.8.11

You are running Python v 3.8.12. Now checking python packages...
biopython: 1.77
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.1.0
numpy: 1.23.0
pandas: 1.4.3
psutil: 5.9.1
requests: 2.28.1
scikit-learn: 1.1.1
scipy: 1.8.1
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/hpc/group/bio1/ian/envs/funannotate_db
$PASAHOME=/hpc/home/idm7/miniconda3/envs/annotate/opt/pasa-2.5.2
$TRINITY_HOME=/hpc/home/idm7/miniconda3/envs/annotate/opt/trinity-2.8.5
$EVM_HOME=/hpc/home/idm7/miniconda3/envs/annotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/hpc/home/idm7/miniconda3/envs/annotate/config/
$GENEMARK_PATH=/hpc/group/bio1/ian/envs/funannotate/gmes_petap
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.1-internal
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.7
proteinortho: 6.1.0
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 39
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed`

nextgenusfs commented 2 years ago

This seems related to the following errors, which suggest that BUSCO did not parse the output properly:

INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG092603EH.faa.1):
INFO [hmmersearch] Line 2: illegal character %

This seems like the error you get when you use Augustus 3.4.
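One way to confirm that the extracted protein files really contain stray characters is to scan them directly. The snippet below is a minimal sketch, not part of funannotate or BUSCO: the function name `find_illegal_residues` and the allowed alphabet are assumptions chosen for illustration, and HMMER's own validation rules may differ.

```python
import io

# Assumption: amino-acid letters, common ambiguity codes, stop and gap symbols.
# This approximates a protein alphabet; it is not HMMER's exact rule set.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*-")


def find_illegal_residues(handle):
    """Yield (line_number, character) for each unexpected character in a protein FASTA."""
    for line_number, line in enumerate(handle, start=1):
        text = line.strip()
        # Skip blank lines and FASTA header lines.
        if not text or text.startswith(">"):
            continue
        for char in text:
            if char.upper() not in ALLOWED:
                yield line_number, char


# Hypothetical record containing a '%' on line 2, mimicking the reported parse error.
example = io.StringIO(">EOG_example\nMKT%LLV\nAAGG\n")
problems = list(find_illegal_residues(example))
print(problems)
```

To check a real file, open it with `open(path)` and pass the handle in place of the `StringIO` object.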

The Augustus config path environment variable still appears to point to the conda install ($AUGUSTUS_CONFIG_PATH=/hpc/home/idm7/miniconda3/envs/annotate/config/), so how did you "replace" Augustus? You should probably uninstall the conda copy; I think you can use conda remove, but you need --force so that only that package is deleted. After removing the conda Augustus, manually point AUGUSTUS_CONFIG_PATH at your system installation. I'd fix that first and then rerun funannotate setup, because it copies Augustus parameter files from that location. It's possible that the parameter files shipped with the mamba version are incompatible with your v3.3.3.

The other thing to try would be downgrading hmmer to a previous version, in case that is the reason for the failure. I've never seen that be the problem before, though; Augustus seems most likely (it nearly always is an Augustus issue). For a while the other culprit was multithreaded tblastn, but you have an older version, so that shouldn't be an issue.
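A quick way to see which Augustus binary and config directory the environment currently resolves to is sketched below. This is a rough heuristic under stated assumptions, not a funannotate feature: the helper name `summarize_augustus` is hypothetical, and the "same prefix" test simply assumes a `<prefix>/bin/augustus` layout, so a mismatch is only a hint that the binary and `AUGUSTUS_CONFIG_PATH` may come from different installs.

```python
import os
import shutil


def summarize_augustus(env=None, which=shutil.which):
    """Report the augustus binary on PATH and where AUGUSTUS_CONFIG_PATH points."""
    env = os.environ if env is None else env
    binary = which("augustus")
    config = env.get("AUGUSTUS_CONFIG_PATH")
    report = {
        "binary": binary,
        "config_path": config,
        "species_dir_exists": False,
        "same_prefix": False,
    }
    if config:
        # Augustus keeps its trained parameter sets under <config>/species.
        report["species_dir_exists"] = os.path.isdir(os.path.join(config, "species"))
    if binary and config:
        # Heuristic: treat the directory two levels above the binary as the install prefix.
        prefix = os.path.dirname(os.path.dirname(os.path.abspath(binary)))
        report["same_prefix"] = os.path.abspath(config).startswith(prefix + os.sep)
    return report


if __name__ == "__main__":
    for key, value in summarize_augustus().items():
        print(f"{key}: {value}")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.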

IanDMedeiros commented 2 years ago

Thanks for the quick reply. I think the piece I was missing was re-running funannotate setup with the new augustus. (/hpc/home/idm7/miniconda3/envs/annotate/config/ is actually the config directory associated with the new augustus.) With that correction, funannotate predict completed successfully in the test. It still throws an error that there are not enough gene models to train augustus, but that now comes at the very end, after funannotate predict completes. I am now trying the pipeline with actual data.

[editing my comment, I now see that HMMER is there as hmmscan and hmmsearch]

#########################################################
Running `funannotate clean` unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
[DOWNLOAD PROGRESS OMITTED FOR SPACE]
6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs

scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039

6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
#########################################################
SUCCESS: funannotate clean test complete.
#########################################################

#########################################################
Running funannotate mask unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
[DOWNLOAD PROGRESS OMITTED FOR SPACE]
[Jun 30 01:06 AM]: OS: CentOS Stream 8, 46 cores, ~ 230 GB RAM. Python: 3.8.12
[Jun 30 01:06 AM]: Running funanotate v1.8.11
[Jun 30 01:06 AM]: Soft-masking simple repeats with tantan
[Jun 30 01:06 AM]: Repeat soft-masking finished:
Masked genome: /hpc/group/bio1/ian/envs/test-mask_e3d485dd-d34b-4a2b-b684-f7b62bc53bba/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)

CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 16
#########################################################
#########################################################
SUCCESS: funannotate mask test complete.
#########################################################

#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
[DOWNLOAD PROGRESS OMITTED FOR SPACE]
-------------------------------------------------------
[Jun 30 01:06 AM]: OS: CentOS Stream 8, 46 cores, ~ 230 GB RAM. Python: 3.8.12
[Jun 30 01:06 AM]: Running funannotate v1.8.11
[Jun 30 01:06 AM]: Skipping CodingQuarry as no --rna_bam passed
[Jun 30 01:06 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  glimmerhmm   busco
  snap         busco
[Jun 30 01:06 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 30 01:06 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Jun 30 01:06 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Jun 30 01:06 AM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[PROGRESS UPDATES OMITTED FOR SPACE] Progress: 99.93%
[Jun 30 01:07 AM]: Exonerate finished in 0:00:19: found 1,270 alignments
[Jun 30 01:07 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Jun 30 01:11 AM]: 175 valid BUSCO predictions found, validating protein sequences
[Jun 30 01:12 AM]: 175 BUSCO predictions validated
[Jun 30 01:12 AM]: Running Augustus gene prediction using saccharomyces parameters
[PROGRESS UPDATES OMITTED FOR SPACE] Progress: 90.91%
[Jun 30 01:13 AM]: 1,485 predictions from Augustus
[Jun 30 01:13 AM]: Pulling out high quality Augustus predictions
[Jun 30 01:13 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Jun 30 01:13 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 30 01:14 AM]: 1,512 predictions from SNAP
[Jun 30 01:14 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 30 01:15 AM]: 1,597 predictions from GlimmerHMM
[Jun 30 01:15 AM]: Summary of gene models passed to EVM (weights):
[Jun 30 01:15 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[PROGRESS UPDATES OMITTED FOR SPACE] Progress: 97.96%
[Jun 30 01:18 AM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       1        1325
  Augustus HiQ   2        372
  GlimmerHMM     1        1597
  snap           1        1512
  Total          -        4806
[Jun 30 01:18 AM]: 1,654 total gene models from EVM
[Jun 30 01:18 AM]: Generating protein fasta files from 1,654 EVM models
[Jun 30 01:18 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Jun 30 01:18 AM]: Found 128 gene models to remove: 0 too short; 0 span gaps; 128 transposable elements
[Jun 30 01:18 AM]: 1,526 gene models remaining
[Jun 30 01:18 AM]: Predicting tRNAs
[Jun 30 01:18 AM]: 112 tRNAscan models are valid (non-overlapping)
[Jun 30 01:18 AM]: Generating GenBank tbl annotation file
[Jun 30 01:18 AM]: Collecting final annotation files for 1,638 total gene models
[Jun 30 01:18 AM]: Converting to final Genbank format
[Jun 30 01:18 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Jun 30 01:18 AM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (manual install): funannotate iprscan -i annotate -c 16

Run antiSMASH (optional): funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: funannotate annotate -i annotate --cpus 16 --sbt yourSBTfile.txt

[Jun 30 01:18 AM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json [Jun 30 01:18 AM]: Add species parameters to database:

funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json


[Jun 30 01:19 AM]: OS: CentOS Stream 8, 46 cores, ~ 230 GB RAM. Python: 3.8.12
[Jun 30 01:19 AM]: Running funannotate v1.8.11
[Jun 30 01:19 AM]: Skipping CodingQuarry as no --rna_bam passed
[Jun 30 01:19 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Jun 30 01:19 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 30 01:19 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Jun 30 01:19 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Jun 30 01:19 AM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[PROGRESS UPDATES OMITTED FOR SPACE] Progress: 99.93%
[Jun 30 01:19 AM]: Exonerate finished in 0:00:19: found 1,270 alignments
[Jun 30 01:19 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Jun 30 01:24 AM]: 175 valid BUSCO predictions found, validating protein sequences
[Jun 30 01:24 AM]: 175 BUSCO predictions validated
[Jun 30 01:24 AM]: Not enough gene models 175 to train Augustus (200 required), exiting
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 16 --species Awesome testicus
#########################################################
#########################################################
SUCCESS: funannotate predict test complete.
#########################################################

#########################################################
Running funannotate predict BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 16 --species Awesome busco
#########################################################
#########################################################
Traceback (most recent call last):
  File "/hpc/home/idm7/miniconda3/envs/annotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/test.py", line 407, in main
    runBuscoTest(args)
  File "/hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/test.py", line 200, in runBuscoTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-busco_e3d485dd-d34b-4a2b-b684-f7b62bc53bba/annotate/predict_results/Awesome_busco.gff3' `
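The traceback above ends in a FileNotFoundError because the test harness opens a GFF3 file that predict never wrote. The sketch below shows one way to count gene features while treating a missing file as "no result" instead of crashing. It is an illustration only: `count_gff_genes` is a hypothetical helper, not funannotate's `countGFFgenes`, and it assumes a standard tab-separated GFF3 with the feature type in the third column.

```python
import os
import tempfile


def count_gff_genes(path):
    """Return the number of 'gene' features in a GFF3 file, or None if the file is missing."""
    if not os.path.isfile(path):
        return None
    count = 0
    with open(path, "r") as handle:
        for line in handle:
            # Skip GFF3 directives and comments.
            if line.startswith("#"):
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) >= 3 and columns[2] == "gene":
                count += 1
    return count


# Small made-up GFF3 with two genes and one mRNA, written to a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    gff_path = os.path.join(workdir, "example.gff3")
    with open(gff_path, "w") as handle:
        handle.write("##gff-version 3\n")
        handle.write("scaffold_1\tfunannotate\tgene\t1\t900\t.\t+\t.\tID=g1\n")
        handle.write("scaffold_1\tfunannotate\tmRNA\t1\t900\t.\t+\t.\tID=g1-T1;Parent=g1\n")
        handle.write("scaffold_1\tfunannotate\tgene\t1500\t2400\t.\t-\t.\tID=g2\n")
    found = count_gff_genes(gff_path)
    missing = count_gff_genes(os.path.join(workdir, "absent.gff3"))

print(found, missing)
```

A `None` result here would correspond to the situation in the log, where predict exited before producing `predict_results`.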

IanDMedeiros commented 2 years ago

No, still failing on real data...


[Jun 30 01:32 AM]: OS: CentOS Stream 8, 12 cores, ~ 33 GB RAM. Python: 3.8.12
[Jun 30 01:32 AM]: Running funannotate v1.8.11
[Jun 30 01:32 AM]: Skipping CodingQuarry as no --rna_bam passed
[Jun 30 01:32 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  glimmerhmm   busco
  snap         busco
[Jun 30 01:32 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 30 01:32 AM]: Genome loaded: 74 scaffolds; 26,734,419 bp; 2.37% repeats masked
[Jun 30 01:32 AM]: Mapping 554,696 proteins to genome using diamond and exonerate
[Jun 30 01:35 AM]: Found 293,585 preliminary alignments with diamond in 0:01:35 --> generated FASTA files for exonerate in 0:00:58
[PROGRESS OMITTED FOR SPACE]
[Jun 30 02:34 AM]: Exonerate finished in 0:59:08: found 1,406 alignments
[Jun 30 02:35 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Jun 30 02:45 AM]: 0 valid BUSCO predictions found, validating protein sequences
Traceback (most recent call last):
  File "/hpc/home/idm7/miniconda3/envs/annotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/predict.py", line 1354, in main
    buscoProtComplete = lib.getCompleteBuscos(buscoProtOutput,
  File "/hpc/home/idm7/miniconda3/envs/annotate/lib/python3.8/site-packages/funannotate/library.py", line 5370, in getCompleteBuscos
    with open(input, 'r') as infile:
FileNotFoundError: [Errno 2] No such file or directory: 'annotated/predict_misc/busco_proteins/run_eurotiomycetes_af_10-2_af_10-2/full_table_eurotiomycetes_af_10-2_af_10-2.tsv'

nextgenusfs commented 2 years ago

The test is still partly failing for the same reason: a successful test should produce more than 200 models from BUSCO, and yours yields 175. So I think it is still related to Augustus.
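For anyone comparing their own run against that threshold, the validated BUSCO count can be pulled out of the predict log with a small script. This is a sketch under assumptions: the function names are hypothetical, the regular expression is written against the log lines quoted in this thread, and the cutoff of 200 is taken from the "(200 required)" error message, interpreted here as "at least 200".

```python
import re

# Matches log lines such as "[Jun 30 01:24 AM]: 175 BUSCO predictions validated".
VALIDATED = re.compile(r"([\d,]+) BUSCO predictions validated")

# Assumption: threshold taken from the "(200 required)" message quoted in this thread.
MIN_MODELS = 200


def validated_busco_models(log_text):
    """Return the validated BUSCO model count from a predict log, or None if absent."""
    match = VALIDATED.search(log_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def enough_to_train(log_text, minimum=MIN_MODELS):
    """True when the log reports at least `minimum` validated BUSCO models."""
    count = validated_busco_models(log_text)
    return count is not None and count >= minimum


failing = "[Jun 30 01:24 AM]: 175 BUSCO predictions validated"
passing = "[Jun 30 09:15 AM]: 370 BUSCO predictions validated"
print(validated_busco_models(failing), enough_to_train(failing))
print(validated_busco_models(passing), enough_to_train(passing))
```

The two sample lines are copied from the failing and successful test logs in this issue.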

FYI you can use --no-progress to suppress that annoying progress meter on HPC.

Here is my successful test:

$ ./funannotate_dev/funannotate-docker test -t predict --cpus 4
#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 4 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Jun 30 09:01 AM]: OS: Debian GNU/Linux 10, 4 cores, ~ 8 GB RAM. Python: 3.8.13
[Jun 30 09:01 AM]: Running funannotate v1.8.12
[Jun 30 09:01 AM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Jun 30 09:01 AM]: Skipping CodingQuarry as no --rna_bam passed
[Jun 30 09:01 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  glimmerhmm   busco          
  snap         busco          
[Jun 30 09:01 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jun 30 09:01 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Jun 30 09:01 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Jun 30 09:01 AM]: Found 1,505 preliminary alignments with diamond in 0:00:02 --> generated FASTA files for exonerate in 0:00:00
[Jun 30 09:02 AM]: Exonerate finished in 0:00:34: found 1,270 alignments
[Jun 30 09:02 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Jun 30 09:14 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Jun 30 09:15 AM]: 370 BUSCO predictions validated
[Jun 30 09:15 AM]: Running Augustus gene prediction using saccharomyces parameters
[Jun 30 09:18 AM]: 1,485 predictions from Augustus
[Jun 30 09:18 AM]: Pulling out high quality Augustus predictions
[Jun 30 09:18 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Jun 30 09:18 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 30 09:20 AM]: 1,362 predictions from SNAP
[Jun 30 09:20 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jun 30 09:23 AM]: 1,769 predictions from GlimmerHMM
[Jun 30 09:23 AM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GlimmerHMM     1        1769 
  snap           1        1362 
  Total          -        4828 
[Jun 30 09:23 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Jun 30 09:34 AM]: Converting to GFF3 and collecting all EVM results
[Jun 30 09:34 AM]: 1,695 total gene models from EVM
[Jun 30 09:34 AM]: Generating protein fasta files from 1,695 EVM models
[Jun 30 09:34 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Jun 30 09:34 AM]: Found 137 gene models to remove: 0 too short; 0 span gaps; 137 transposable elements
[Jun 30 09:34 AM]: 1,558 gene models remaining
[Jun 30 09:34 AM]: Predicting tRNAs
[Jun 30 09:35 AM]: 112 tRNAscan models are valid (non-overlapping)
[Jun 30 09:35 AM]: Generating GenBank tbl annotation file
[Jun 30 09:35 AM]: Collecting final annotation files for 1,670 total gene models
[Jun 30 09:35 AM]: Converting to final Genbank format
[Jun 30 09:35 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Jun 30 09:35 AM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install): 
funannotate iprscan -i annotate -c 4

Run antiSMASH (optional): 
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: 
funannotate annotate -i annotate --cpus 4 --sbt yourSBTfile.txt
-------------------------------------------------------

[Jun 30 09:35 AM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Jun 30 09:35 AM]: Add species parameters to database:

  funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

#########################################################
SUCCESS: `funannotate predict` test complete.
#########################################################

And then the busco.log:

INFO    ****************** Start a BUSCO 2.0 analysis, current time: 06/30/2022 09:02:33 ******************
INFO    The lineage dataset is: dikarya_odb9 (eukaryota)
INFO    Mode is: genome
INFO    Maximum number of regions limited to: 3
INFO    To reproduce this run: python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/genome.softmasked.fa -o saccharomyces -l /opt/databases/dikarya/ -m genome -c 4 -sp anidulans
INFO    Check dependencies...
INFO    Check input file...
INFO    Temp directory is ./tmp/

INFO    ****** Phase 1 of 2, initial predictions ******
INFO    ****** Step 1/3, current time: 06/30/2022 09:02:33 ******
INFO    Create blast database...
INFO    [makeblastdb]   Building a new DB, current time: 06/30/2022 09:02:33
INFO    [makeblastdb]   New DB name:   /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/busco/tmp/saccharomyces_3039243825
INFO    [makeblastdb]   New DB title:  /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/genome.softmasked.fa
INFO    [makeblastdb]   Sequence type: Nucleotide
INFO    [makeblastdb]   Keep Linkouts: T
INFO    [makeblastdb]   Keep MBits: T
INFO    [makeblastdb]   Maximum file size: 1000000000B
INFO    [makeblastdb]   Adding sequences from FASTA; added 6 sequences in 0.0503011 seconds.
INFO    Running tblastn, writing output to /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/busco/run_saccharomyces/blast_output/tblastn_saccharomyces.tsv...
INFO    ****** Step 2/3, current time: 06/30/2022 09:02:42 ******
INFO    Getting coordinates for candidate regions...
INFO    Pre-Augustus scaffold extraction...
INFO    Running Augustus prediction using anidulans as species:
INFO    [augustus] Please find all logs related to Augustus here: /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/busco/run_saccharomyces/augustus_output/augustus.log
INFO    06/30/2022 09:02:42 =>  0% of predictions performed (743 to be done)
INFO    06/30/2022 09:04:21 =>  10% of predictions performed (75/743 candidate regions)
INFO    06/30/2022 09:05:44 =>  20% of predictions performed (149/743 candidate regions)
INFO    06/30/2022 09:07:25 =>  30% of predictions performed (223/743 candidate regions)
INFO    06/30/2022 09:08:50 =>  40% of predictions performed (298/743 candidate regions)
INFO    06/30/2022 09:09:50 =>  50% of predictions performed (372/743 candidate regions)
INFO    06/30/2022 09:10:44 =>  60% of predictions performed (446/743 candidate regions)
INFO    06/30/2022 09:11:28 =>  70% of predictions performed (521/743 candidate regions)
INFO    06/30/2022 09:12:15 =>  80% of predictions performed (595/743 candidate regions)
INFO    06/30/2022 09:12:51 =>  90% of predictions performed (669/743 candidate regions)
INFO    06/30/2022 09:13:22 =>  100% of predictions performed
INFO    Extracting predicted proteins...
INFO    ****** Step 3/3, current time: 06/30/2022 09:13:42 ******
INFO    Running HMMER to confirm orthology of predicted proteins:
INFO    06/30/2022 09:13:42 =>  0% of predictions performed (686 to be done)
INFO    06/30/2022 09:13:43 =>  10% of predictions performed (69/686 candidate proteins)
INFO    06/30/2022 09:13:44 =>  20% of predictions performed (139/686 candidate proteins)
INFO    06/30/2022 09:13:45 =>  30% of predictions performed (208/686 candidate proteins)
INFO    06/30/2022 09:13:46 =>  40% of predictions performed (277/686 candidate proteins)
INFO    06/30/2022 09:13:48 =>  50% of predictions performed (346/686 candidate proteins)
INFO    06/30/2022 09:13:49 =>  60% of predictions performed (412/686 candidate proteins)
INFO    06/30/2022 09:13:51 =>  70% of predictions performed (481/686 candidate proteins)
INFO    06/30/2022 09:13:53 =>  80% of predictions performed (549/686 candidate proteins)
INFO    06/30/2022 09:13:55 =>  90% of predictions performed (619/686 candidate proteins)
INFO    06/30/2022 09:13:57 =>  100% of predictions performed
INFO    Results:
INFO    C:28.9%[S:28.4%,D:0.5%],F:0.8%,M:70.3%,n:1312
INFO    380 Complete BUSCOs (C)
INFO    373 Complete and single-copy BUSCOs (S)
INFO    7 Complete and duplicated BUSCOs (D)
INFO    10 Fragmented BUSCOs (F)
INFO    922 Missing BUSCOs (M)
INFO    1312 Total BUSCO groups searched

INFO    ****** Phase 2 of 2, predictions using species specific training ******
INFO    ****** Step 1/3, current time: 06/30/2022 09:13:58 ******
INFO    Extracting missing and fragmented buscos from the ancestral_variants file...
WARNING The busco id(s) ['EOG092647L7', 'EOG09264L0C', 'EOG09262X7T', 'EOG09264RR2', 'EOG09264RIE', 'EOG09265SHM', 'EOG09262TO9', 'EOG0926448Q', 'EOG092641G3', 'EOG09262O0R', 'EOG092610VI', 'EOG09261V2P', 'EOG0926400M', 'EOG09261DG0', 'EOG09264881', 'EOG0926457R', 'EOG09264331', 'EOG09260LRX', 'EOG092643IE', 'EOG09260JNY', 'EOG09260Z5E', 'EOG0926458I', 'EOG092657UN', 'EOG09264Z3D', 'EOG092629RT', 'EOG092612XD', 'EOG09261QR8', 'EOG09265FTY', 'EOG09260JO5', 'EOG09262V8E', 'EOG092630YS', 'EOG092644ZU', 'EOG092621CK', 'EOG09262FE3', 'EOG092624X0', 'EOG09261NLY', 'EOG09260WCZ', 'EOG09260WGT', 'EOG09262KUJ', 'EOG09262N0U', 'EOG09262SI7', 'EOG092645OU', 'EOG092632TF', 'EOG09261JCQ', 'EOG09264XTW', 'EOG09261XNJ', 'EOG09264XT5', 'EOG09263IT0', 'EOG09264U78', 'EOG09262NB1', 'EOG092610TN', 'EOG09264ABT', 'EOG092610ZY', 'EOG09263Q8J', 'EOG09264FXB', 'EOG09261W2O', 'EOG09263G4R', 'EOG0926510L', 'EOG09264OYZ', 'EOG092640WA', 'EOG092602FH', 'EOG09262UTQ', 'EOG092608WI', 'EOG09260EQD', 'EOG09263WM5', 'EOG09263FAK', 'EOG092605ZA', 'EOG09263FKC', 'EOG09263YBT', 'EOG09263R4M', 'EOG092659DX', 'EOG09261QXC', 'EOG092605VL', 'EOG09263ZSC', 'EOG09262WXK', 'EOG092644DY', 'EOG09260LVD', 'EOG092608AV', 'EOG09261727', 'EOG09261OXD', 'EOG09260EE7', 'EOG092611HB', 'EOG09262N3C', 'EOG09264DIM', 'EOG092604MJ', 'EOG09264HOY', 'EOG09264XPY', 'EOG092626HU', 'EOG092650I8', 'EOG0926315C', 'EOG09260DXP', 'EOG09263BG5', 'EOG092616QN', 'EOG09264OBA', 'EOG09264BIX', 'EOG092644N1', 'EOG09264DT4', 'EOG09262N47', 'EOG09263CLY', 'EOG09265I7S', 'EOG09262W7C', 'EOG092654LJ', 'EOG09260K29', 'EOG09264LH2', 'EOG09265822', 'EOG09263BE5', 'EOG09261OKK', 'EOG09260NE0', 'EOG09262D0D', 'EOG09262SWJ', 'EOG09261P7G', 'EOG0926306O', 'EOG0926140Q', 'EOG09261H5E', 'EOG092615IE', 'EOG092650VI', 'EOG09260BFE', 'EOG09261VI3', 'EOG092635DF', 'EOG09264SUZ', 'EOG09260VYK', 'EOG09264L06', 'EOG09264904', 'EOG09261S7X', 'EOG092631MU', 'EOG09261I0F', 'EOG09260LJ8', 'EOG09262MXH', 'EOG09263RW3', 'EOG09260HPO', 'EOG092658ZO', 
'EOG092610KH', 'EOG09263BDA', 'EOG09260OZU', 'EOG09263760', 'EOG09263KVG', 'EOG092615SM', 'EOG092629WA', 'EOG092652Y6', 'EOG09263Z8I', 'EOG0926115P', 'EOG09264398', 'EOG09262J7K', 'EOG09263TQ5', 'EOG09260W8D', 'EOG09260J97', 'EOG092600W1', 'EOG092604ZZ', 'EOG09262D4G', 'EOG09261UPM', 'EOG09264873', 'EOG09261LV7', 'EOG092605T6', 'EOG09260NAN', 'EOG09263EQZ', 'EOG09261V87', 'EOG09260EAZ', 'EOG09262528', 'EOG09260VEY', 'EOG09260MRU', 'EOG09260RRC', 'EOG09260APA', 'EOG092653VU', 'EOG09262YP5', 'EOG09261KRX', 'EOG092628SP', 'EOG09262YQG', 'EOG09263KDI', 'EOG092634ZL', 'EOG09264HU0', 'EOG0926357F', 'EOG09261TEQ', 'EOG09260AQB', 'EOG09265040', 'EOG092629ZN', 'EOG092605KN', 'EOG09262K8C', 'EOG09260OLB', 'EOG09261660', 'EOG092628HC', 'EOG09264G7L', 'EOG0926248P', 'EOG0926112A', 'EOG09260QNB', 'EOG09264ZDZ', 'EOG09262KVB', 'EOG09263SZM', 'EOG092656RK', 'EOG09263RY2', 'EOG0926158Y', 'EOG092654XA', 'EOG0926436T', 'EOG09264W71', 'EOG092619MJ', 'EOG09260TWS', 'EOG09260BYL', 'EOG09262GVX', 'EOG09264PE5', 'EOG09261CXQ', 'EOG09265GXF', 'EOG09260E2O', 'EOG09261476', 'EOG092649VG', 'EOG09262QTY', 'EOG092645U1', 'EOG092652ZZ', 'EOG09260JTZ', 'EOG092625AX', 'EOG09263CUQ', 'EOG09261Y04', 'EOG09263D2P', 'EOG0926049A', 'EOG09265G5K', 'EOG09264EGS', 'EOG09265BG5', 'EOG092619GP', 'EOG09264J8E', 'EOG09260RVQ', 'EOG09262GWQ', 'EOG09261801', 'EOG092658SK', 'EOG092644X6', 'EOG09264JQ1', 'EOG09265HEP', 'EOG092604KQ', 'EOG09260XQV', 'EOG09262ZZ8', 'EOG09265DWT', 'EOG09264U81', 'EOG09260BSW', 'EOG09260OCI', 'EOG09264W1U', 'EOG09262C5Z', 'EOG09262B1U', 'EOG092654VM', 'EOG09262MJW', 'EOG09261404', 'EOG09261I1I', 'EOG09261JR0', 'EOG0926025H', 'EOG092648LP', 'EOG09264R4M', 'EOG09264KTU', 'EOG09264T1U', 'EOG09264O2D', 'EOG09263EDC', 'EOG09264GMT', 'EOG092629FB', 'EOG092643Y5', 'EOG09264I6B', 'EOG092600SK', 'EOG09265KNL', 'EOG09264RJ7', 'EOG09264CA0', 'EOG09262645', 'EOG092617S2', 'EOG09260V5Q', 'EOG09260Z3X', 'EOG09262UB3', 'EOG092603SM', 'EOG0926195C', 'EOG09261QR5', 'EOG0926307V', 'EOG092609O9', 
'EOG09264OM7', 'EOG092654KW', 'EOG092634B5', 'EOG092658WY', 'EOG09262POL', 'EOG0926229Z', 'EOG09260274', 'EOG09264UJF', 'EOG09264L3W', 'EOG09260KM4', 'EOG09260JPV', 'EOG0926514P', 'EOG09260XH5', 'EOG09261RWU', 'EOG0926079Q', 'EOG092647CM', 'EOG09265552', 'EOG09262SR7', 'EOG092619RJ', 'EOG092652TN', 'EOG09263A3Y', 'EOG09260H6E', 'EOG092602OP', 'EOG09260RS7', 'EOG09260RGH', 'EOG09263LNF', 'EOG092645LS', 'EOG09263C55', 'EOG09262CBI', 'EOG09263FR7', 'EOG092620IA', 'EOG092608AE', 'EOG09260SJV', 'EOG092600SD', 'EOG09264OXC', 'EOG09260W52', 'EOG09264USX', 'EOG092605VU', 'EOG09264XYD', 'EOG09260DUR', 'EOG092604A0', 'EOG092649QJ', 'EOG092605FC', 'EOG09265DRM', 'EOG09261OXV', 'EOG092659OC', 'EOG092644WX', 'EOG09262U7S', 'EOG092618M2', 'EOG09264P74', 'EOG09262GNE', 'EOG09265F2Y', 'EOG09264LC7', 'EOG09263ULA', 'EOG09261PUF', 'EOG092638XA', 'EOG09264IDN', 'EOG09261HB6', 'EOG09265K4K', 'EOG092608ZS', 'EOG09260KUC', 'EOG092615CC', 'EOG09265QTV', 'EOG0926248W', 'EOG09262WHU', 'EOG0926213Q', 'EOG09264TQ5', 'EOG09260EPS', 'EOG09262GLP', 'EOG092646PE', 'EOG092624RX', 'EOG0926073O', 'EOG09264GQZ', 'EOG092638RC', 'EOG092634MM', 'EOG09261NW2', 'EOG09262R8O', 'EOG09264OML', 'EOG0926506Z', 'EOG09264G1I', 'EOG09260FL2', 'EOG09265CN0', 'EOG092654QZ', 'EOG09261DW8', 'EOG09260XSR', 'EOG0926133I', 'EOG09265G9U', 'EOG09260FZZ', 'EOG092645L9', 'EOG09262Z2S', 'EOG09264ZQC', 'EOG09262OX9', 'EOG09263QUM', 'EOG092619NK', 'EOG09264CP8', 'EOG09260RRN', 'EOG09260GR8', 'EOG09262VOC', 'EOG09264FVQ', 'EOG09260DUW', 'EOG09264BOA', 'EOG09262E4T', 'EOG09265BE5', 'EOG09261OSU', 'EOG0926049S', 'EOG09263UN3', 'EOG092603D0', 'EOG09264WF4', 'EOG092631IR', 'EOG09262BVA', 'EOG092631QQ', 'EOG09263I7I', 'EOG09264C3V', 'EOG09264ZDJ', 'EOG09260NZ8', 'EOG092653LT', 'EOG09260K24', 'EOG09262WSH', 'EOG09264C3N', 'EOG09263EVJ', 'EOG09260PI1', 'EOG0926131E', 'EOG092653YS', 'EOG092656JA', 'EOG09262VVF', 'EOG09260A98', 'EOG09261WVT', 'EOG09263720', 'EOG09260CKC', 'EOG092629WJ', 'EOG09261127', 'EOG092602MO', 'EOG09261UOJ', 
'EOG09264W46', 'EOG09264NJ1', 'EOG092610QT', 'EOG0926273Q', 'EOG09265FCK', 'EOG09265ER6', 'EOG09260GKG', 'EOG092608T8', 'EOG09261FM4', 'EOG09262A6N', 'EOG09263CGP', 'EOG092621CP', 'EOG09260779', 'EOG09265B95', 'EOG09261NLR', 'EOG09265RGI', 'EOG09261ZXZ', 'EOG09261K9J', 'EOG09263MR4', 'EOG09263E87', 'EOG09260WG2', 'EOG09262G8Y', 'EOG09260FMW', 'EOG09263G3M', 'EOG09260PJ9', 'EOG092657YR', 'EOG092646WF', 'EOG09264DY4', 'EOG09263HED', 'EOG09260931', 'EOG092636T6', 'EOG09260U6R', 'EOG09263MGE', 'EOG092619L1', 'EOG09260N2T', 'EOG09264IG5', 'EOG09264O9F', 'EOG09265H9T', 'EOG09260TDY', 'EOG09262X74', 'EOG0926213Z', 'EOG09261I8J', 'EOG09263GSR', 'EOG09262KJA', 'EOG09264O6J', 'EOG09262P1W', 'EOG09261N20', 'EOG09262QS5', 'EOG092603EH', 'EOG09260Y2Q', 'EOG09260XL0', 'EOG09263KZJ', 'EOG09264A2D', 'EOG09263K05', 'EOG092604S1', 'EOG09265M98', 'EOG09265PWR', 'EOG09260EOI', 'EOG09265LEG', 'EOG09264272', 'EOG09261G1Y', 'EOG09261666', 'EOG09264T0J', 'EOG09261IEV', 'EOG092605QM', 'EOG09261Q18', 'EOG092606WZ', 'EOG09261JVS', 'EOG09265CCT', 'EOG09261EY9', 'EOG09262QJW', 'EOG09260UXC', 'EOG0926077L', 'EOG09265KQ4', 'EOG09264AWW', 'EOG092600T9', 'EOG0926539T', 'EOG09262K67', 'EOG09263CG4', 'EOG09263DQH', 'EOG09265D7J', 'EOG09262BCZ', 'EOG092614UB', 'EOG09261EM7', 'EOG09264OQ8', 'EOG0926423H', 'EOG09264NEF', 'EOG092620CR', 'EOG092630MJ', 'EOG092651HW', 'EOG09262SMG', 'EOG09264E6Z', 'EOG092641M3', 'EOG09262A8G', 'EOG092601KZ', 'EOG092642I5', 'EOG09265DDU', 'EOG09263JTO', 'EOG09263OZR', 'EOG09262XRU', 'EOG0926534P', 'EOG09260DFH', 'EOG09260VCG', 'EOG09260VTN', 'EOG092618J9', 'EOG09264RJL', 'EOG09262IY3', 'EOG09264S3E', 'EOG09264V2U', 'EOG09262NNS', 'EOG09260E8K', 'EOG09261MOX', 'EOG09260M87', 'EOG09261CM0', 'EOG09262BHE', 'EOG09261W90', 'EOG09260WUA', 'EOG09260S2Z', 'EOG09264TJN', 'EOG092646CB', 'EOG09263RDD', 'EOG09264RBX', 'EOG09263OCO', 'EOG09263F11', 'EOG092648VW', 'EOG09264VRO', 'EOG09261ACJ', 'EOG09261OLD', 'EOG09264ENO', 'EOG09262PPU', 'EOG092612LP', 'EOG09263Z41', 'EOG09263X22', 
'EOG09261N64', 'EOG09261OIA', 'EOG09262VPD', 'EOG09262LQT', 'EOG092609YT', 'EOG092627F1', 'EOG09260FFP', 'EOG092602I6', 'EOG0926477X', 'EOG09261JWS', 'EOG09265GYD', 'EOG09264XUV', 'EOG092655L0', 'EOG092608L0', 'EOG0926419M', 'EOG09264PD5', 'EOG09260LI6', 'EOG09261IEH', 'EOG09261UWT', 'EOG09262CXO', 'EOG09262LI4', 'EOG09265KPR', 'EOG09264T3S', 'EOG092628FW', 'EOG092639H5', 'EOG09261FH7', 'EOG09262Q6S', 'EOG0926505R', 'EOG09264B3O', 'EOG092653SU', 'EOG09260S5R', 'EOG0926004Z', 'EOG092603KJ', 'EOG09263OWL', 'EOG09264DJ8', 'EOG09264T8I', 'EOG09264SQJ', 'EOG09262TEV', 'EOG092653NM', 'EOG09260VTA', 'EOG0926092K', 'EOG09262X01', 'EOG092649XV', 'EOG09262V3O', 'EOG092635YY', 'EOG09261YLQ', 'EOG09262FJB', 'EOG0926499W', 'EOG09260JJW', 'EOG092607OQ', 'EOG09260ERO', 'EOG09262V9N', 'EOG09260B3X', 'EOG09265I60', 'EOG09264VC6', 'EOG09261FAX', 'EOG09262XMN', 'EOG09264R0D', 'EOG092638CT', 'EOG09265EKJ', 'EOG09261SS1', 'EOG092602UY', 'EOG092616YZ', 'EOG09263RVR', 'EOG09261ICI', 'EOG09263UWJ', 'EOG09262N10', 'EOG09261ZJR', 'EOG09260KCB', 'EOG09263J3H', 'EOG09263FGN', 'EOG09264ZWF', 'EOG09264W7W', 'EOG09262UAS', 'EOG0926310O', 'EOG09261LPY', 'EOG0926071Q', 'EOG09262E3Q', 'EOG09262DPL', 'EOG09260AZK', 'EOG09264KDO', 'EOG092617RY', 'EOG09264HN5', 'EOG09260QVP', 'EOG09260375', 'EOG09264X31', 'EOG09264G04', 'EOG092609JB', 'EOG092606AJ', 'EOG092606AD', 'EOG09260NNR', 'EOG09261W1K', 'EOG092613UB', 'EOG09264I14', 'EOG09260H81', 'EOG09261XNU', 'EOG09260NWN', 'EOG09264IQ7', 'EOG09264CP4', 'EOG09265RGS', 'EOG09260GIX', 'EOG092658X5', 'EOG0926251E', 'EOG092621GA', 'EOG09263LR1', 'EOG09263E9V', 'EOG09262KZ3', 'EOG09262E7W', 'EOG092600NM', 'EOG09260RNZ', 'EOG09264IV9', 'EOG09262PMC', 'EOG09260R9L', 'EOG092631UM', 'EOG09260NC6', 'EOG09264P4J', 'EOG09260KGS', 'EOG09261B3Q', 'EOG09264ZXA', 'EOG09265B1X', 'EOG09262F22', 'EOG09263OAE', 'EOG09261V9P', 'EOG09261Q5L', 'EOG09264MGU', 'EOG09264KK7', 'EOG092611G7', 'EOG092644TW', 'EOG09261D4D', 'EOG09265PQX', 'EOG09263X1F', 'EOG09260KDB', 'EOG09263ZW6', 
'EOG09263690', 'EOG09260NHB', 'EOG09261J0P', 'EOG09262JZK', 'EOG09263L9T', 'EOG092656CM', 'EOG09263GIG', 'EOG09262516', 'EOG09263IMF', 'EOG09265FTN', 'EOG09262WQX', 'EOG09265K60', 'EOG09264SSI', 'EOG09263OD3', 'EOG09265GGX', 'EOG09264D7Y', 'EOG09260T4S', 'EOG09260WUS', 'EOG09261KHB', 'EOG09261EMF', 'EOG092617AN', 'EOG09265PUI', 'EOG09263K45', 'EOG0926489S', 'EOG09262HP3', 'EOG09263ZBJ', 'EOG092619VG', 'EOG09261T98', 'EOG09260NXC', 'EOG09261ZFN', 'EOG09260MBW', 'EOG09265313', 'EOG09265NHW', 'EOG09260289', 'EOG092624KK', 'EOG09263S2P', 'EOG09264719', 'EOG092624UF', 'EOG09261DRB', 'EOG09261MMR', 'EOG09261VD2', 'EOG09265JA7', 'EOG092613R2', 'EOG09262914', 'EOG09260KNI', 'EOG0926369X', 'EOG09265A08', 'EOG09264A8D', 'EOG09260J8F', 'EOG09260KNR', 'EOG09262TUR', 'EOG09264NC7', 'EOG09264XKX', 'EOG092621ZV', 'EOG092600S9', 'EOG09263CAC', 'EOG09262JRP', 'EOG09260NHN', 'EOG09260B65', 'EOG092643NE', 'EOG09262KXK', 'EOG09265AT5', 'EOG0926431P', 'EOG092620E4', 'EOG092605OK', 'EOG09260JDM', 'EOG092652KR', 'EOG09260LS9', 'EOG09260UA2', 'EOG092641A6', 'EOG09261RWJ', 'EOG09264VZ7', 'EOG09260WU6', 'EOG092641UM', 'EOG0926354S', 'EOG09263M8W', 'EOG09263PWF', 'EOG092658NW', 'EOG092612MY', 'EOG092632WW', 'EOG0926390Q', 'EOG09263QPR', 'EOG09265HP0', 'EOG09263OQH', 'EOG092628LW', 'EOG09263U08', 'EOG092612CC', 'EOG09263AZP', 'EOG092620U5', 'EOG09265OQH', 'EOG09261I1G', 'EOG09260SAH', 'EOG092643QM', 'EOG09263KRO', 'EOG09263817', 'EOG09263MEM', 'EOG09265BJ3', 'EOG09264BWL', 'EOG09263WB5', 'EOG09263X0V', 'EOG09262QRH', 'EOG09264YEG', 'EOG09264PDD', 'EOG092651BA', 'EOG092608WU', 'EOG09264MN3', 'EOG092643JW', 'EOG09262N5O', 'EOG092646VF', 'EOG09261YRA', 'EOG092633QB', 'EOG09261IOS', 'EOG09260C2V', 'EOG09263U71', 'EOG09261225', 'EOG092617RN', 'EOG09261AH9', 'EOG09263SFX', 'EOG0926129I', 'EOG09264JHE', 'EOG09262A65', 'EOG092619EA', 'EOG092648O6', 'EOG092643VB', 'EOG09260DP1', 'EOG09264YKY', 'EOG09263EBB', 'EOG092641K1', 'EOG0926388H', 'EOG09260GF5', 'EOG09263HD8', 'EOG09262PZ9', 'EOG092634G9', 
'EOG09265E8A', 'EOG092644O2', 'EOG09264PK5', 'EOG092652YI', 'EOG09264FYY', 'EOG09262387', 'EOG09265IT6', 'EOG09262X8R', 'EOG09264JK1', 'EOG09262M0W', 'EOG092645G0', 'EOG092648K5', 'EOG09264V30', 'EOG09262MFL', 'EOG092657H8', 'EOG09260SRF', 'EOG09265BTA', 'EOG09260TPT', 'EOG09264JW6', 'EOG09264K2W', 'EOG092629U5', 'EOG09262E98', 'EOG09265ANI', 'EOG0926142Y', 'EOG092645QN', 'EOG09262ILV', 'EOG09261B6Y', 'EOG09261ZPW', 'EOG09264SET', 'EOG092646C6', 'EOG09265JNA', 'EOG092636Y6', 'EOG092631ML', 'EOG09263W7L', 'EOG092627XA', 'EOG09261A3K', 'EOG092638EN', 'EOG09264RW6', 'EOG0926009O', 'EOG092651FJ', 'EOG09260HMA', 'EOG092648XW', 'EOG092646EZ', 'EOG092655SO', 'EOG09261YV6', 'EOG092634B1', 'EOG09264IIZ', 'EOG09260075', 'EOG092653KS', 'EOG09263MNN', 'EOG09263Y3L', 'EOG09260S3L', 'EOG09262PAY', 'EOG09261VUC', 'EOG09262M7B', 'EOG09262H1X', 'EOG09262YAU', 'EOG09262DC1', 'EOG09264HX6', 'EOG09261G92', 'EOG09263GUT', 'EOG0926506U', 'EOG09261RFF', 'EOG092624SJ', 'EOG092621F2', 'EOG09260N53', 'EOG09263J6Z', 'EOG092656IY', 'EOG09261PJZ', 'EOG09265JVH', 'EOG09260DBG', 'EOG09262341', 'EOG09263E5F', 'EOG09261V03', 'EOG09264XVU', 'EOG09263KB4', 'EOG09261HQU', 'EOG09262JWJ', 'EOG0926074Y', 'EOG092635ST', 'EOG0926115V', 'EOG09265EOF', 'EOG092635SS', 'EOG092614DJ', 'EOG09264DOU', 'EOG092600T4', 'EOG092626EQ', 'EOG09264LKR', 'EOG09262PIH', 'EOG09260ETR', 'EOG09264KIV', 'EOG09260OE9', 'EOG09261WJ8', 'EOG09265C25', 'EOG09264YIJ', 'EOG09263WZ2', 'EOG09262M2J', 'EOG09263RF8', 'EOG092603YJ', 'EOG092610LQ', 'EOG09261FAB', 'EOG09260KWP', 'EOG09262IP2', 'EOG0926137U', 'EOG09264F1U', 'EOG092648Q0', 'EOG09261G4Z', 'EOG09261B18', 'EOG09261N2L', 'EOG09260HS3', 'EOG092649VA', 'EOG0926591L', 'EOG09264XJC', 'EOG09264B2P', 'EOG09264DMU', 'EOG09264NNY', 'EOG092653O3', 'EOG092608RH', 'EOG092640BS', 'EOG09263JW5', 'EOG09262CDO', 'EOG092655M5', 'EOG09260EPQ', 'EOG09264CND', 'EOG09261F73', 'EOG09264LJU', 'EOG09263A5D', 'EOG09263DFA', 'EOG0926312D', 'EOG09260FKU', 'EOG09263FTE', 'EOG0926347W', 'EOG09264441', 
'EOG09262CUO', 'EOG09261MPU', 'EOG0926587S'] were not found in the ancestral_variants file
INFO    Running tblastn, writing output to /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/busco/run_saccharomyces/blast_output/tblastn_saccharomyces_missing_and_frag_rerun.tsv...
INFO    [tblastn]   Warning: [tblastn] Query is Empty!
INFO    Getting coordinates for candidate regions...
INFO    ****** Step 2/3, current time: 06/30/2022 09:13:58 ******
INFO    Training Augustus using Single-Copy Complete BUSCOs:
INFO    06/30/2022 09:13:58 =>  Converting predicted genes to short genbank files...
INFO    06/30/2022 09:14:04 =>  All files converted to short genbank files, now running the training scripts...
INFO    Pre-Augustus scaffold extraction...
INFO    Re-running Augustus with the new metaparameters, number of target BUSCOs: 932
INFO    06/30/2022 09:14:21 =>  0% of predictions performed (0 to be done)
INFO    06/30/2022 09:14:21 =>  100% of predictions performed
INFO    Extracting predicted proteins...
INFO    ****** Step 3/3, current time: 06/30/2022 09:14:21 ******
INFO    Running HMMER to confirm orthology of predicted proteins:
INFO    06/30/2022 09:14:21 =>  0% of predictions performed (0 to be done)
INFO    06/30/2022 09:14:21 =>  100% of predictions performed
INFO    Results:
INFO    C:28.9%[S:28.4%,D:0.5%],F:0.8%,M:70.3%,n:1312
INFO    380 Complete BUSCOs (C)
INFO    373 Complete and single-copy BUSCOs (S)
INFO    7 Complete and duplicated BUSCOs (D)
INFO    10 Fragmented BUSCOs (F)
INFO    922 Missing BUSCOs (M)
INFO    1312 Total BUSCO groups searched

INFO    BUSCO analysis done with WARNING(s). Total running time: 711.759425163269 seconds
INFO    Results written in /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/busco/run_saccharomyces/

INFO    ****************** Start a BUSCO 2.0 analysis, current time: 06/30/2022 09:14:41 ******************
INFO    The lineage dataset is: dikarya_odb9 (eukaryota)
INFO    Mode is: proteins
INFO    To reproduce this run: python /venv/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/busco_augustus.proteins.fasta -o saccharomyces -l /opt/databases/dikarya/ -m proteins -c 4 -sp anidulans
INFO    Check dependencies...
INFO    Check input file...
INFO    Temp directory is ./tmp/
INFO    Running HMMER on the proteins:
INFO    06/30/2022 09:14:42 =>  0% of predictions performed (1312 to be done)
INFO    06/30/2022 09:14:44 =>  10% of predictions performed (133/1312 candidate proteins)
INFO    06/30/2022 09:14:47 =>  20% of predictions performed (264/1312 candidate proteins)
INFO    06/30/2022 09:14:49 =>  30% of predictions performed (396/1312 candidate proteins)
INFO    06/30/2022 09:14:53 =>  40% of predictions performed (525/1312 candidate proteins)
INFO    06/30/2022 09:14:57 =>  50% of predictions performed (658/1312 candidate proteins)
INFO    06/30/2022 09:15:01 =>  60% of predictions performed (788/1312 candidate proteins)
INFO    06/30/2022 09:15:06 =>  70% of predictions performed (921/1312 candidate proteins)
INFO    06/30/2022 09:15:11 =>  80% of predictions performed (1050/1312 candidate proteins)
INFO    06/30/2022 09:15:17 =>  90% of predictions performed (1182/1312 candidate proteins)
INFO    06/30/2022 09:15:22 =>  100% of predictions performed
INFO    Results:
INFO    C:28.3%[S:28.2%,D:0.1%],F:0.0%,M:71.7%,n:1312
INFO    371 Complete BUSCOs (C)
INFO    370 Complete and single-copy BUSCOs (S)
INFO    1 Complete and duplicated BUSCOs (D)
INFO    0 Fragmented BUSCOs (F)
INFO    941 Missing BUSCOs (M)
INFO    1312 Total BUSCO groups searched

INFO    BUSCO analysis done. Total running time: 42.64242601394653 seconds
INFO    Results written in /Users/jon/test-predict_6004e1e9-1eda-4469-a5b0-c06c104a1135/annotate/predict_misc/busco_proteins/run_saccharomyces/
nextgenusfs commented 2 years ago

If it helps, I just pushed an update where the dependencies, versions, and full paths are printed to the logfile, i.e.:

[06/30/22 10:18:12]: /venv/bin/funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 4 --species Awesome testicus

[06/30/22 10:18:12]: OS: Debian GNU/Linux 10, 4 cores, ~ 8 GB RAM. Python: 3.8.13
[06/30/22 10:18:12]: Running funannotate v1.8.12
[06/30/22 10:18:12]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[06/30/22 10:18:12]: exonerate version=exonerate 2.4.0 path=/venv/bin/exonerate
[06/30/22 10:18:12]: diamond version=2.0.15 path=/venv/bin/diamond
[06/30/22 10:18:12]: tbl2asn version=no way to determine, likely 25.X path=/venv/bin/tbl2asn
[06/30/22 10:18:12]: bedtools version=bedtools v2.30.0 path=/venv/bin/bedtools
[06/30/22 10:18:12]: augustus version=3.3.2 path=/usr/bin/augustus
[06/30/22 10:18:12]: etraining version=NA path=/usr/bin/etraining
[06/30/22 10:18:12]: tRNAscan-SE version=2.0.9 (July 2021) path=/venv/bin/tRNAscan-SE
[06/30/22 10:18:12]: bam2hints version=NA path=/usr/bin/bam2hints
[06/30/22 10:18:12]: minimap2 version=2.24-r1122 path=/venv/bin/minimap2
[06/30/22 10:18:12]: $AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
[06/30/22 10:18:13]: {'augustus': 1, 'hiq': 2, 'genemark': 0, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[06/30/22 10:18:13]: Skipping CodingQuarry as no --rna_bam passed

You can upgrade with pip, i.e.:

python -m pip install git+https://github.com/nextgenusfs/funannotate.git

Ultimately I think this is a problem with Augustus, though I'm not entirely sure what the issue is. It is outputting data as if it were Augustus 3.4, which is incompatible with the internal BUSCO script in funannotate; I've not seen any version < 3.4 fail like this before.

IanDMedeiros commented 2 years ago

I tried upgrading as you suggested and got a new error:

#########################################################
Running funannotate clean unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
8192 [3.25%]16384 [6.50%]24576 [9.75%]32768 [13.00%]40960 [16.25%]49152 [19.50%]57344 [22.75%]65536 [26.00%]73728 [29.25%]81920 [32.50%]90112 [35.75%]98304 [39.00%]106496 [42.25%]114688 [45.50%]122880 [48.75%]131072 [52.00%]139264 [55.25%]147456 [58.50%]155648 [61.75%]163840 [65.00%]172032 [68.25%]180224 [71.50%]188416 [74.75%]196608 [78.00%]204800 [81.25%]212992 [84.50%]221184 [87.74%]229376 [90.99%]237568 [94.24%]245760 [97.49%]252076 [100.00%]
Traceback (most recent call last):
  File "/hpc/home/idm7/miniconda3/envs/funannotate/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/clean.py", line 194, in main
    CheckDependencies(programs)
  File "/hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 910, in CheckDependencies
    log.debug('{} version={} path={}'.format(f[0], f[2], f[1]))
NameError: name 'log' is not defined
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################
#########################################################
Traceback (most recent call last):
  File "/hpc/home/idm7/miniconda3/envs/funannotate/bin/funannotate", line 8, in <module>
    sys.exit(main())
  File "/hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 401, in main
    runCleanTest(args)
  File "/hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 130, in runCleanTest
    assert countfasta(os.path.join(tmpdir, 'test.exhaustive.fa')) == 3
  File "/hpc/home/idm7/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 36, in countfasta
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-clean_5ed48339-b445-4f05-92d7-20b9e40d7c14/test.exhaustive.fa'

nextgenusfs commented 2 years ago

Oh, that's my fault, I'll fix.

Can you run funannotate test -t predict --cpus N so that only the predict test runs?

nextgenusfs commented 2 years ago

The latest commit should fix this error; thanks for reporting.

IanDMedeiros commented 2 years ago

funannotate test -t predict --cpus 10

The test completes, but it still finds only 175 BUSCO loci.

#########################################################
Running funannotate predict unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
[Jul 02 02:46 PM]: OS: CentOS Stream 8, 46 cores, ~ 230 GB RAM. Python: 3.8.12
[Jul 02 02:46 PM]: Running funannotate v1.8.12
[Jul 02 02:46 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Jul 02 02:46 PM]: Skipping CodingQuarry as no --rna_bam passed
[Jul 02 02:46 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  glimmerhmm   busco
  snap         busco
[Jul 02 02:46 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jul 02 02:46 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Jul 02 02:46 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Jul 02 02:46 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Jul 02 02:46 PM]: Exonerate finished in 0:00:21: found 1,270 alignments
[Jul 02 02:46 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Jul 02 02:51 PM]: 175 valid BUSCO predictions found, validating protein sequences
[Jul 02 02:52 PM]: 175 BUSCO predictions validated
[Jul 02 02:52 PM]: Running Augustus gene prediction using saccharomyces parameters
[Jul 02 02:53 PM]: 1,485 predictions from Augustus
[Jul 02 02:53 PM]: Pulling out high quality Augustus predictions
[Jul 02 02:53 PM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Jul 02 02:53 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jul 02 02:54 PM]: 1,519 predictions from SNAP
[Jul 02 02:54 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Jul 02 02:55 PM]: 1,586 predictions from GlimmerHMM
[Jul 02 02:55 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        1325
  Augustus HiQ   2        372
  GlimmerHMM     1        1586
  snap           1        1519
  Total          -        4802
[Jul 02 02:55 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Jul 02 02:57 PM]: Converting to GFF3 and collecting all EVM results
[Jul 02 02:57 PM]: 1,683 total gene models from EVM
[Jul 02 02:57 PM]: Generating protein fasta files from 1,683 EVM models
[Jul 02 02:57 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Jul 02 02:57 PM]: Found 131 gene models to remove: 0 too short; 0 span gaps; 131 transposable elements
[Jul 02 02:57 PM]: 1,552 gene models remaining
[Jul 02 02:57 PM]: Predicting tRNAs
[Jul 02 02:57 PM]: 112 tRNAscan models are valid (non-overlapping)
[Jul 02 02:57 PM]: Generating GenBank tbl annotation file
[Jul 02 02:57 PM]: Collecting final annotation files for 1,664 total gene models
[Jul 02 02:57 PM]: Converting to final Genbank format
[Jul 02 02:57 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Jul 02 02:57 PM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (manual install): funannotate iprscan -i annotate -c 10

Run antiSMASH (optional): funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: funannotate annotate -i annotate --cpus 10 --sbt yourSBTfile.txt

[Jul 02 02:57 PM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Jul 02 02:57 PM]: Add species parameters to database:

funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 10 --species Awesome testicus
#########################################################
#########################################################
SUCCESS: funannotate predict test complete.
#########################################################

The error you commented on earlier,

INFO [hmmersearch] Parse failed (sequence file /hpc/group/bio1/ian/envs/annotate/test-busco_b548c447-5a92-4bca-b6c4-d116e6f7177e/annotate/predict_misc/busco/run_awesome_busco/augustus_output/extracted_proteins/EOG092603EH.faa.1): INFO [hmmersearch] Line 2: illegal character %

seems to be happening because Augustus is appending extra text to some of the protein sequences. See for example

">g1[CP022974.1:280548-281798] MVSSLPKESQAELQLFQNEINAANPSDFLQFSANYFNKRLEQQRAFLKAREPEFKAKNIVLFPEPEESFSRPQSAQSQSRSRSSVMFKSPFVNEDPHSNVFKSGFNLDPHEQDTHQQAQEEQQHTREKTSTPPLPMHFNAQRRTSVSGETLQPNNFDDWTPDHYKEKSEQQLQRLEKSIRNNFLFNKLDSDSKRLVINCLEEKSVPKGATIIKQGDQGDYFYVVEKGTVDFYVNDNKVNSSGPGSSFGELALMYNSPRAATVVATSDCLLWALDRLTFRKILLGSSFKKRLMYDDLLKSMPVLKSLTTYDRAKLADALDTKIYQPGETIIREGDQGENFYLIEYGAVDVSKKGQGVINKLKDHDYFGEVALLNDLPRQATVTATKRTKVATLGKSGFQRLLGPAVDVLKLNDPTRHEvidence%CDSCDS5'UTR3'UTRhintincompatibleRM:"
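A quick way to find out which extracted protein files are affected is to scan them for characters that cannot be residues. The following is only a minimal sketch: the function name `find_illegal_residues` and the `ALLOWED` alphabet are assumptions made for illustration and are not part of funannotate, BUSCO, or HMMER. Note that the stray word "Evidence" happens to consist of valid residue letters, so only symbols such as `%`, `'`, and digits give the contamination away.

```python
# Residue letters accepted by this check: the 20 standard amino acids plus
# common ambiguity/extra codes and '*'. This alphabet is an assumption for
# illustration, not the exact rule set that hmmsearch applies.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*")


def find_illegal_residues(fasta_text):
    """Return (record_id, line_number, bad_chars) for each bad sequence line."""
    problems = []
    record_id = None
    for lineno, raw in enumerate(fasta_text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: remember the record ID (first whitespace-free token).
            record_id = line[1:].split()[0]
            continue
        # Sequence line: collect every character outside the allowed alphabet.
        bad = sorted(set(line.upper()) - ALLOWED)
        if bad:
            problems.append((record_id, lineno, "".join(bad)))
    return problems


if __name__ == "__main__":
    example = ">g1[CP022974.1:280548-281798]\nMVSSLPTRHEvidence%CDS5'UTR\n"
    for rec, lineno, bad in find_illegal_residues(example):
        print("record {} line {}: illegal characters {!r}".format(rec, lineno, bad))
```

Reading each `.faa.N` file under `extracted_proteins/` and passing its contents to this function would list the records that hmmsearch is likely to reject.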

nextgenusfs commented 2 years ago

And in the logfile it lists Augustus v3.3.3?

So try a different version of Augustus. It needs to be less than v3.4. I thought 3.3.3 was fine but apparently not on your system. Most if not all of the versions on bioconda lately are not compiled properly and won't work with BUSCO. I don't know what changed in bioconda that it stopped working.
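The "less than v3.4" rule can be checked mechanically against the version that funannotate now prints to the logfile (for example the line `augustus version=3.3.2 path=/usr/bin/augustus` shown earlier in this thread). This is a minimal sketch; the helper names `parse_augustus_version` and `is_pre_3_4` are hypothetical and do not exist in funannotate. As this thread shows, a version below 3.4 is not a guarantee, since the 3.3.3 build here still produced unparseable output.

```python
import re


def parse_augustus_version(text):
    """Pull the first X.Y or X.Y.Z version out of a string, as a tuple of ints."""
    m = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    # A missing patch component (e.g. "3.4") is treated as 0.
    return tuple(int(p) if p is not None else 0 for p in m.groups())


def is_pre_3_4(version):
    """True if the (major, minor, patch) tuple is older than Augustus 3.4.0."""
    return version < (3, 4, 0)


if __name__ == "__main__":
    log_line = "augustus version=3.3.2 path=/usr/bin/augustus"
    version = parse_augustus_version(log_line)
    print(version, "pre-3.4" if is_pre_3_4(version) else "3.4 or newer")
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "3.10" would sort before "3.4".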

IanDMedeiros commented 2 years ago

I am working to set up an alternative version of Augustus now.

The error I am seeing with my existing Augustus 3.3.3 compiled without bioconda seems to be in how the /predicted_genes files (e.g., EOG09260A98.out.1) are being converted to /extracted_proteins files (e.g., EOG09260A98.faa.1). Do you happen to know if this is performed by Augustus or one of its associated scripts? If so, I will post an issue in the Augustus GitHub repository.

EDIT: Ok, looks like it is BUSCO that extracts the sequences.

nextgenusfs commented 2 years ago

Yeah, BUSCO extracts the sequences, but the Augustus output format changed in 3.4. I haven't had a chance to update that script. I want to remove the internal BUSCO and use the most recent conda package; however, there are too many dependency conflicts and I cannot get a conda environment to build properly, so I haven't changed it yet. The Augustus proteinprofile issue will continue to be a problem, though.

IanDMedeiros commented 2 years ago

I still have no idea what the problem is with my Augustus 3.3.3, but I have solved the immediate issue by modifying funannotate-BUSCO2.py. I inserted two lines into the function _extract (lines 1606-1607 in my edited copy) to account for the format of the predicted_genes .out files:

    elif line.startswith('# Evidence'):
        check = 0

Attaching the full edited function as well: _extract.py.txt
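To illustrate why those two lines help, here is a minimal sketch of a flag-based parser. It is not the actual `_extract` function from funannotate-BUSCO2.py: the function `extract_protein`, its `stop_at_evidence` parameter, and the sample input are made up for illustration, and the input layout (a `# protein sequence = [...]` block followed by a `# Evidence for and against this transcript:` block before `# end gene`) is assumed from the garbled sequence quoted above.

```python
def extract_protein(lines, stop_at_evidence=True):
    """Collect the protein sequence from Augustus-style '#' comment lines."""
    seq = []
    check = 0  # 1 while we are inside the protein sequence block
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# protein sequence = ["):
            check = 1
            seq.append(line.split("[", 1)[1])
        elif stop_at_evidence and line.startswith("# Evidence"):
            # The guard: the evidence block is not sequence, stop collecting.
            check = 0
        elif line.startswith("# end gene"):
            check = 0
        elif check == 1 and line.startswith("# "):
            # Continuation line of the sequence block: drop the "# " prefix.
            seq.append(line[2:])
    return "".join(seq).rstrip("]")


if __name__ == "__main__":
    sample = [
        "# start gene g1",
        "# protein sequence = [MVSSLP",
        "# KESQ]",
        "# Evidence for and against this transcript:",
        "# % of transcript supported by hints (any source): 0",
        "# end gene g1",
    ]
    print(extract_protein(sample))                          # guard enabled
    print(extract_protein(sample, stop_at_evidence=False))  # guard disabled
```

With the guard disabled, the evidence lines are appended to the sequence, which is the kind of contamination that makes hmmsearch stop on the `%` character.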

Thanks for your help thinking through this!

nextgenusfs commented 2 years ago

Great thanks. I will try to find some time to test if this works with other versions of Augustus.

nextgenusfs commented 2 years ago

Thanks @IanDMedeiros -- I'll incorporate your change above, as I think it will work and allow support of Augustus v3.4 in the current codebase. I actually ended up rewriting BUSCO as a simplified version covering what funannotate uses, since we need a module that can be installed/solved with conda; BUSCOv5 won't work and has many new dependencies. It will be easier to maintain as a repo outside of funannotate. There are a few things left to do, but the repo is here: https://github.com/nextgenusfs/buscolite. If you have a chance to test with your Augustus 3.3.3 version, that would be helpful; you should be able to simply install it into the funannotate environment with pip.

hyphaltip commented 2 years ago

FYI - augustus=3.5.0 is now available in bioconda: https://github.com/bioconda/bioconda-recipes/pull/37364. Can we test whether this works too?

akshayoo commented 3 weeks ago

funannotate predict -i masked_genome.fa -o Predict_out -s "Fusarium boothii" --cpus 24 --augustus_species fusarium --busco_seed_species fusarium

[Oct 10 05:55 PM]: OS: Ubuntu 20.04, 24 cores, ~ 231 GB RAM. Python: 3.8.19
[Oct 10 05:55 PM]: Running funannotate v1.8.17
[Oct 10 05:55 PM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Oct 10 05:55 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 10 05:55 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  glimmerhmm   busco
  snap         busco
[Oct 10 05:56 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 10 05:56 PM]: Genome loaded: 32 scaffolds; 36,363,344 bp; 7.67% repeats masked
/home/user/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-p2g.py:14: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import parse_version
[Oct 10 05:56 PM]: Mapping 558,971 proteins to genome using diamond and exonerate
[Oct 10 05:58 PM]: Found 293,238 preliminary alignments with diamond in 0:01:50 --> generated FASTA files for exonerate in 0:00:28
Progress: 293238 complete, 0 failed, 0 remaining
[Oct 10 06:17 PM]: Exonerate finished in 0:18:49: found 1,877 alignments
[Oct 10 06:17 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 10 06:17 PM]: 0 valid BUSCO predictions found, validating protein sequences
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 717, in main
    mod.main(arguments)
  File "/home/user/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/predict.py", line 2007, in main
    buscoProtComplete = lib.getCompleteBuscos(
  File "/home/user/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 6946, in getCompleteBuscos
    with open(input, "r") as infile:
FileNotFoundError: [Errno 2] No such file or directory: 'Predict_out/predict_misc/busco_proteins/run_fusarium/full_table_fusarium.tsv'

Busco is failing to run in my case while running the predict command.
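The traceback above only says that the BUSCO results table was never written, which usually means the BUSCO step itself failed earlier. A small pre-check makes that easier to confirm before digging further. This is a minimal sketch: the helper names `busco_full_table` and `check_busco_table` are hypothetical, and the directory layout is inferred solely from the path in the traceback, so treat it as an assumption rather than funannotate's documented structure.

```python
import os


def busco_full_table(outdir, species):
    """Build the expected path of the BUSCO protein-mode results table.

    The layout is inferred from the traceback path; it is an assumption.
    """
    return os.path.join(
        outdir,
        "predict_misc",
        "busco_proteins",
        "run_" + species,
        "full_table_" + species + ".tsv",
    )


def check_busco_table(outdir, species):
    """Return (ok, message) describing whether the BUSCO table exists."""
    path = busco_full_table(outdir, species)
    if not os.path.isfile(path):
        return False, "BUSCO table missing: {} (check the BUSCO logs)".format(path)
    return True, path


if __name__ == "__main__":
    ok, message = check_busco_table("Predict_out", "fusarium")
    print("OK" if ok else "PROBLEM", message)
```

If the table is missing, the BUSCO log files under the same predict_misc folder are the next place to look for the underlying error.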