nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_eccf5e84-d7e2-4805-bf8c-64f475650a4d/annotate/predict_results/Awesome_testicus.gff3' #872

Open ZeweiSong opened 1 year ago

ZeweiSong commented 1 year ago

Are you using the latest release?
I'm using the latest version installed from conda.

Describe the bug
I ran the command

funannotate test -t predict

and it stops with the error below:

minimap2 version=2.24-r1122 path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/minimap2
[Mar 01 12:08 AM]: OS: CentOS Linux 8, 64 cores, ~ 528 GB RAM. Python: 3.8.15
[Mar 01 12:08 AM]: Running funanotate v1.8.13
[Mar 01 12:08 AM]: Soft-masking simple repeats with tantan
[Mar 01 12:08 AM]: Repeat soft-masking finished: 
Masked genome: /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-mask_eccf5e84-d7e2-4805-bf8c-64f475650a4d/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)
-------------------------------------------------------
[Mar 01 12:08 AM]: OS: CentOS Linux 8, 64 cores, ~ 528 GB RAM. Python: 3.8.15
[Mar 01 12:08 AM]: Running funannotate v1.8.13
[Mar 01 12:08 AM]: Skipping CodingQuarry as no --rna_bam passed
[Mar 01 12:08 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained     
  glimmerhmm   busco          
  snap         busco          
[Mar 01 12:08 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Mar 01 12:08 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Mar 01 12:08 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Mar 01 12:08 AM]: Found 1,505 preliminary alignments with diamond in 0:00:03 --> generated FASTA files for exonerate in 0:00:00
[Mar 01 12:09 AM]: Exonerate finished in 0:00:29: found 1,270 alignments
[Mar 01 12:09 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Mar 01 12:20 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Mar 01 12:21 AM]: 370 BUSCO predictions validated
[Mar 01 12:21 AM]: Running Augustus gene prediction using saccharomyces parameters
[Mar 01 12:23 AM]: 1,485 predictions from Augustus
[Mar 01 12:23 AM]: Pulling out high quality Augustus predictions
[Mar 01 12:23 AM]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[Mar 01 12:23 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Mar 01 12:23 AM]: 1,490 predictions from SNAP
[Mar 01 12:23 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Mar 01 12:24 AM]: 1,771 predictions from GlimmerHMM
[Mar 01 12:24 AM]: Summary of gene models passed to EVM (weights):
[Mar 01 12:24 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Mar 01 12:34 AM]: Converting to GFF3 and collecting all EVM results
  Source         Weight   Count
  Augustus       1        1325 
  Augustus HiQ   2        372  
  GlimmerHMM     1        1771 
  snap           1        1490 
  Total          -        4958 
[Mar 01 12:34 AM]: 1,687 total gene models from EVM
[Mar 01 12:34 AM]: Generating protein fasta files from 1,687 EVM models
[Mar 01 12:34 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
Traceback (most recent call last):
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/predict.py", line 1822, in main
    lib.RepeatBlast(EVM_proteins, args.cpus, 1e-10, FUNDB,
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 5473, in RepeatBlast
    for qresult in SearchIO.parse(results, "blast-xml"):
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/Bio/SearchIO/__init__.py", line 306, in parse
    yield from generator
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_xml.py", line 240, in __iter__
    yield from self._parse_qresult()
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_xml.py", line 289, in _parse_qresult
    for event, qresult_elem in self.xml_iter:
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/xml/etree/ElementTree.py", line 1227, in iterator
    yield from pullparser.read_events()
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/xml/etree/ElementTree.py", line 1302, in read_events
    raise event
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/xml/etree/ElementTree.py", line 1274, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: mismatched tag: line 27, column 4
Traceback (most recent call last):
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 160, in runPredictTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_eccf5e84-d7e2-4805-bf8c-64f475650a4d/annotate/predict_results/Awesome_testicus.gff3'

ZeweiSong commented 1 year ago

And here is the funannotate log I have:

[02/28/23 23:51:26]: /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 12 --species Awesome testicus

[02/28/23 23:51:26]: OS: CentOS Linux 8, 64 cores, ~ 528 GB RAM. Python: 3.8.15
[02/28/23 23:51:26]: Running funannotate v1.8.13
[02/28/23 23:51:26]: GeneMark path: /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/gmes_petap.pl
[02/28/23 23:51:26]: Full path to gmes_petap.pl: /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/gmes_petap.pl/gmes_petap.pl
[02/28/23 23:51:26]: GeneMark appears to be functional? False
[02/28/23 23:51:27]: exonerate version=exonerate 2.4.0 path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/exonerate
[02/28/23 23:51:27]: diamond version=2.1.3 path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/diamond
[02/28/23 23:51:27]: tbl2asn version=no way to determine, likely 25.X path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/tbl2asn
[02/28/23 23:51:27]: bedtools version=bedtools v2.30.0 path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/bedtools
[02/28/23 23:51:27]: augustus version=3.3.3 path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/augustus
[02/28/23 23:51:27]: etraining version=NA path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/etraining
[02/28/23 23:51:27]: tRNAscan-SE version=2.0.11 (Oct 2022) path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/tRNAscan-SE
[02/28/23 23:51:27]: bam2hints version=NA path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/bam2hints
[02/28/23 23:51:27]: minimap2 version=2.24-r1122 path=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/minimap2
[02/28/23 23:51:27]: $AUGUSTUS_CONFIG_PATH=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/config/
[02/28/23 23:51:28]: {'augustus': 1, 'hiq': 2, 'genemark': 0, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[02/28/23 23:51:28]: Skipping CodingQuarry as no --rna_bam passed
[02/28/23 23:51:28]: {'augustus': 'pretrained', 'snap': 'busco', 'glimmerhmm': 'busco'}
[02/28/23 23:51:28]: Parsed training data, run ab-initio gene predictors as follows:
[02/28/23 23:51:29]: {'augustus': 1, 'hiq': 2, 'genemark': 0, 'pasa': 6, 'codingquarry': 0, 'snap': 1, 'glimmerhmm': 1, 'proteins': 1, 'transcripts': 1}
[02/28/23 23:51:30]: Loading genome assembly and parsing soft-masked repetitive sequences
[02/28/23 23:51:30]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[02/28/23 23:51:42]: join_mult_hints.pl
[02/28/23 23:51:42]: Running BUSCO to find conserved gene models for training ab-initio predictors
[02/28/23 23:51:42]: /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/python /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-BUSCO2.py -i /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/genome.softmasked.fa -m genome --lineage /dssg/home/acct-trench/trench/USER/bins/funannotate_db/dikarya -o saccharomyces -c 12 --species anidulans -f --local_augustus /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/ab_initio_parameters/augustus
[02/28/23 23:53:58]: 373 valid BUSCO predictions found, validating protein sequences
[02/28/23 23:54:25]: 370 BUSCO predictions validated
[02/28/23 23:54:25]: Running Augustus gene prediction using saccharomyces parameters
[02/28/23 23:55:21]: perl /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/opt/evidencemodeler-1.1.1/EvmUtils/misc/augustus_GFF3_to_EVM_GFF3.pl annotate/predict_misc/augustus.gff3
[02/28/23 23:55:21]: Pulling out high quality Augustus predictions
[02/28/23 23:55:21]: Found 371 high quality predictions from Augustus (>90% exon evidence)
[02/28/23 23:55:21]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[02/28/23 23:55:21]: 370 gene models to train snap on 6 scaffolds
[02/28/23 23:55:21]: fathom /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/snap.training.zff /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/snap-training.scaffolds.fasta -categorize 1000 -min-intron 10 -max-intron 3000
[02/28/23 23:55:21]: fathom uni.ann uni.dna -export 1000 -plus
[02/28/23 23:55:21]: forge export.ann export.dna
[02/28/23 23:55:22]: perl /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/hmm-assembler.pl snap-trained annotate/predict_misc/snaptrain
[02/28/23 23:55:22]: snap /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/snap-trained.hmm /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/genome.softmasked.fa
[02/28/23 23:55:36]: scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done

[02/28/23 23:55:37]: 1,501 predictions from SNAP
[02/28/23 23:55:37]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[02/28/23 23:55:37]: trainGlimmerHMM /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/genome.softmasked.fa /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/glimmer.exons -d annotate/predict_misc/glimmerhmm
[02/28/23 23:56:10]: perl /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/glimmhmm.pl /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/glimmerhmm /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/genome.softmasked.fa /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/glimmerhmm -g
[02/28/23 23:56:21]: Process CP022970.1
Done 577664 bp
Done CP022970.1
Process CP022971.1
Done 274493 bp
Done CP022971.1
Process CP022972.1
Done 950314 bp
Done 1132299 bp
Done CP022972.1
Process CP022973.1
Done 576084 bp
Done CP022973.1
Process CP022974.1
Done 449806 bp
Done CP022974.1
Process CP022975.1
Done 766242 bp
Done CP022975.1

[02/28/23 23:56:21]: 1,770 predictions from GlimmerHMM
[02/28/23 23:56:21]: Prediction sources: ['HiQ', 'Augustus', 'GlimmerHMM', 'snap']
[02/28/23 23:56:21]: Summary of gene models: {'total': 4968, 'HiQ': 372, 'Augustus': 1325, 'GlimmerHMM': 1770, 'snap': 1501}
[02/28/23 23:56:21]: EVM Weights: {'HiQ': 2, 'Augustus': 1, 'GlimmerHMM': 1, 'snap': 1, 'proteins': 1}
[02/28/23 23:56:21]: Summary of gene models passed to EVM (weights):
[02/28/23 23:56:21]: Launching EVM via funannotate-runEVM.py
[02/28/23 23:56:21]: /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/python /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/lib/python3.8/site-packages/funannotate/aux_scripts/funannotate-runEVM.py -w /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/weights.evm.txt -c 12 -g /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/gene_predictions.gff3 -d /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/EVM -f /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/genome.softmasked.fa -l annotate/logfiles/funannotate-EVM.log -m 10 -i 1500 -o /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/evm.round1.gff3 --EVM_HOME /dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/opt/evidencemodeler-1.1.1 -p /dssg/home/acct-trench/trench/USER/songzewei/data_process/tst_tree/tst_funa/test-predict_d6f39d32-86f2-4de4-b5f0-8b95f81f320e/annotate/predict_misc/protein_alignments.gff3
[02/28/23 23:58:05]: 1,684 total gene models from EVM
[02/28/23 23:58:05]: Generating protein fasta files from 1,684 EVM models
[02/28/23 23:58:06]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[02/28/23 23:58:06]: diamond blastp --sensitive --query annotate/predict_misc/evm.round1.proteins.fa --threads 12 --out annotate/predict_misc/repeats.xml --db /dssg/home/acct-trench/trench/USER/bins/funannotate_db/repeats.dmnd --evalue 1e-10 --max-target-seqs 1 --outfmt 5

ZeweiSong commented 1 year ago

And the funannotate check --show-versions:

$funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.13
-------------------------------------------------------
You are running Python v 3.8.15. Now checking python packages...
biopython: 1.81
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.2.0
numpy: 1.24.2
pandas: 1.5.3
psutil: 5.9.4
requests: 2.28.2
scikit-learn: 1.2.1
scipy: 1.10.0
seaborn: 0.12.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.38
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/dssg/home/acct-trench/trench/USER/bins/funannotate_db/
$PASAHOME=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/config/
$GENEMARK_PATH=/dssg/home/acct-trench/trench/Soft/Conda/envs/funannotate/bin/gmes_petap.pl
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.1.3
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.515 (2023/Jan/15)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.1.7
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.16.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 40
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
        ERROR: emapper.py not installed
        ERROR: gmes_petap.pl not installed
        ERROR: signalp not installed

nextgenusfs commented 1 year ago

Hi @ZeweiSong, hope all is well. I think this is dying because the diamond database step seems to have failed, i.e. the step that uses the repeats.dmnd built in $FUNANNOTATE_DB when you ran funannotate setup. So I'd try running this command manually to see if there is an error:

diamond blastp --sensitive --query annotate/predict_misc/evm.round1.proteins.fa --threads 12 --out annotate/predict_misc/repeats.xml --db /dssg/home/acct-trench/trench/USER/bins/funannotate_db/repeats.dmnd --evalue 1e-10 --max-target-seqs 1 --outfmt 5
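
Since the first traceback ends in an XML ParseError ("mismatched tag: line 27, column 4"), it may also help to check whether the file written by this command is well-formed. The following is a minimal sketch using only the Python standard library. The function names `check_blast_xml` and `show_context` are hypothetical helpers, not part of funannotate, and the file to inspect is assumed to be the `--out` path from the command above (annotate/predict_misc/repeats.xml).

```python
import xml.etree.ElementTree as ET


def check_blast_xml(path):
    """Return None if the file parses as XML, else (line, column, message).

    Hypothetical helper for illustration; not part of funannotate.
    """
    try:
        # iterparse streams the file, so a large result file is not
        # loaded into memory all at once.
        for _event, elem in ET.iterparse(path, events=("end",)):
            elem.clear()
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def show_context(path, line, radius=2):
    """Return the numbered lines around a reported error line."""
    start = max(1, line - radius)
    with open(path, errors="replace") as handle:
        return [
            f"{number}: {text.rstrip()}"
            for number, text in enumerate(handle, 1)
            if start <= number <= line + radius
        ]
```

Calling `check_blast_xml("annotate/predict_misc/repeats.xml")` after the manual run would return `None` for a well-formed file. If it returns a position instead, passing that line number to `show_context` shows the surrounding lines, which can indicate whether the output was truncated or interleaved.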

ZeweiSong commented 1 year ago

Thanks for the quick reply! It seems the command ran through:

$diamond blastp --sensitive --query annotate/predict_misc/evm.round1.proteins.fa --threads 12 --out annotate/predict_misc/repeats.xml --db /dssg/home/acct-trench/trench/USER/bins/funannotate_db/repeats.dmnd --evalue 1e-10 --max-target-seqs 1 --outfmt 5
diamond v2.1.3.157 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 12
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: annotate/predict_misc
#Target sequences to report alignments for: 1
Opening the database...  [0.066s]
Database: /dssg/home/acct-trench/trench/USER/bins/funannotate_db/repeats.dmnd (type: Diamond database, sequences: 11950, letters: 9920808)
Block size = 2000000000
Algorithm: Double-indexed
Building query histograms...  [0.036s]
Loading reference sequences...  [0.012s]
Masking reference...  [0.121s]
Initializing temporary storage...  [0.006s]
Building reference histograms...  [0.344s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/16, index chunk 1/4.
Building reference seed array...  [0.035s]
Building query seed array...  [0.003s]
Computing hash join...  [0.012s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.013s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/16, index chunk 2/4.
Building reference seed array...  [0.051s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/16, index chunk 3/4.
Building reference seed array...  [0.046s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/16, index chunk 4/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/16, index chunk 1/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 3/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 3/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 3/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 3/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 4/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 4/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 4/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 4/16, index chunk 4/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 5/16, index chunk 1/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 5/16, index chunk 2/4.
Building reference seed array...  [0.04s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 5/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 5/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 6/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 6/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 6/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0.001s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 6/16, index chunk 4/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 7/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 7/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 7/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0.001s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 7/16, index chunk 4/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 8/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0.001s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 8/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 8/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0.001s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 8/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 9/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 9/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 9/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 9/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 10/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 10/16, index chunk 2/4.
Building reference seed array...  [0.04s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 10/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 10/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 11/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 11/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 11/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 11/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 12/16, index chunk 1/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 12/16, index chunk 2/4.
Building reference seed array...  [0.04s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 12/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 12/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 13/16, index chunk 1/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 13/16, index chunk 2/4.
Building reference seed array...  [0.04s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 13/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 13/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 14/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 14/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 14/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0.001s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 14/16, index chunk 4/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 15/16, index chunk 1/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0.001s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 15/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 15/16, index chunk 3/4.
Building reference seed array...  [0.045s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 15/16, index chunk 4/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 16/16, index chunk 1/4.
Building reference seed array...  [0.031s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.003s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 16/16, index chunk 2/4.
Building reference seed array...  [0.041s]
Building query seed array...  [0.004s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 16/16, index chunk 3/4.
Building reference seed array...  [0.056s]
Building query seed array...  [0.005s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0.001s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 16/16, index chunk 4/4.
Building reference seed array...  [0.032s]
Building query seed array...  [0.003s]
Computing hash join...  [0.008s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0.004s]
Deallocating memory...  [0s]
Deallocating buffers...  [0s]
Clearing query masking...  [0s]
Computing alignments... Loading trace points...  [0.003s]
Sorting trace points...  [0.002s]
Computing alignments...  [0.094s]
Deallocating buffers...  [0s]
Loading trace points...  [0s]
 [0.103s]
Deallocating reference...  [0s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Total time = 4.421s
Reported 64 pairwise alignments, 64 HSPs.
64 queries aligned.
nextgenusfs commented 1 year ago

Okay, if that ran okay then the problem seems to be in the biopython parser of that blast XML file. A possible change in diamond's XML output format could be the culprit.

ZeweiSong commented 1 year ago

So, any solution so far or should we wait for an update? :D

nextgenusfs commented 1 year ago

Can you send me that diamond XML file so I can try to parse it locally? Here is the Python code that seems to be failing (it works locally and in the docker image).

  from Bio import SearchIO
  # blast_tmp is the BLAST-XML file being parsed; output is the tab-separated summary
  with open(output, 'w') as out:
      with open(blast_tmp, 'r') as results:
          for qresult in SearchIO.parse(results, "blast-xml"):
              hits = qresult.hits
              ID = qresult.id
              num_hits = len(hits)
              if num_hits > 0:
                  # total aligned length across all HSPs of the first hit
                  length = 0
                  for i in range(0, len(hits[0].hsps)):
                      length += hits[0].hsps[i].aln_span
                  # identities of the first HSP divided by that total length
                  pident = hits[0].hsps[0].ident_num / float(length)
                  out.write("%s\t%s\t%f\t%s\n" %
                            (ID, hits[0].id, pident, hits[0].hsps[0].evalue))
nextgenusfs commented 1 year ago

I'm running biopython 1.76 and diamond 2.0.8 locally, and the docker image is running biopython 1.80 and diamond 2.0.15. So a quick fix would probably be to downgrade diamond. If you do that, you will likely need to re-install the databases, as the ones that get built with diamond could change.
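
To compare an environment against the versions mentioned above, a minimal sketch like the following may help. It assumes `diamond` is on the PATH and that `diamond version` prints a line containing the version number. The names `parse_version`, `installed_diamond_version` and `is_known_good` are hypothetical helpers, not part of funannotate.

```python
import re
import subprocess

# diamond versions reported in this thread as working (not an exhaustive list).
KNOWN_GOOD_DIAMOND = [(2, 0, 8), (2, 0, 15)]


def parse_version(text):
    """Extract the first X.Y.Z version number from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def installed_diamond_version():
    """Ask the diamond binary for its version (assumes `diamond version` works)."""
    result = subprocess.run(
        ["diamond", "version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


def is_known_good(version):
    """True if this version is one of those reported as working in the thread."""
    return version in KNOWN_GOOD_DIAMOND
```

Calling `is_known_good(installed_diamond_version())` inside the funannotate environment would then say whether the installed diamond matches one of those two versions. A version outside the list is not necessarily broken, only unverified here.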

nextgenusfs commented 1 year ago

The attachment didn't work. You can send it to my email if that's easier, or I think you might need to zip it to attach it on GitHub.

ZeweiSong commented 1 year ago

repeats.zip

How about this one?

nextgenusfs commented 1 year ago

Yes, it looks like a malformed XML file from diamond. So downgrade diamond to an earlier version (I know 2.0.15 is safe), then re-run funannotate setup, and that should hopefully fix it.

>>> with open('repeats.xml') as results:
...     for qresult in SearchIO.parse(results, 'blast-xml'):
...             hits = qresult.hits
...             ID = qresult.id
...             print(ID, hits[0].id)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/jon/miniconda3/envs/py3-funannotate/lib/python3.7/site-packages/Bio/SearchIO/__init__.py", line 320, in parse
    for qresult in generator:
  File "/Users/jon/miniconda3/envs/py3-funannotate/lib/python3.7/site-packages/Bio/SearchIO/BlastIO/blast_xml.py", line 258, in __iter__
    for qresult in self._parse_qresult():
  File "/Users/jon/miniconda3/envs/py3-funannotate/lib/python3.7/site-packages/Bio/SearchIO/BlastIO/blast_xml.py", line 308, in _parse_qresult
    for event, qresult_elem in self.xml_iter:
  File "/Users/jon/miniconda3/envs/py3-funannotate/lib/python3.7/xml/etree/ElementTree.py", line 1222, in iterator
    yield from pullparser.read_events()
  File "/Users/jon/miniconda3/envs/py3-funannotate/lib/python3.7/xml/etree/ElementTree.py", line 1297, in read_events
    raise event
  File "/Users/jon/miniconda3/envs/py3-funannotate/lib/python3.7/xml/etree/ElementTree.py", line 1269, in feed
    self._parser.feed(data)
xml.etree.ElementTree.ParseError: mismatched tag: line 27, column 4
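
The same check can be made without Biopython, since the error is raised by the underlying XML parser. A minimal sketch using only the standard library follows. `first_xml_error` is a hypothetical helper name, not something funannotate provides.

```python
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Return None if source parses as XML, else (line, column, message).

    source may be a file path or a binary file object.
    """
    try:
        ET.parse(source)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) of the failure.
        line, column = err.position
        return line, column, str(err)
    return None
```

Running `first_xml_error("repeats.xml")` on the attached file should point at the same location as the traceback, while a well-formed diamond XML file should return `None`.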
nextgenusfs commented 1 year ago

I'd also install the latest from master, as there are quite a few bug fixes and we haven't tagged a new release in a long time. You can do that with pip from that environment.

python -m pip install git+https://github.com/nextgenusfs/funannotate.git --upgrade --force --no-deps
ZeweiSong commented 1 year ago

After several attempts, here is what works for me. Thanks for all the help, Jon!

mamba create -n funannotate funannotate diamond=2.0.8 biopython=1.76
conda activate funannotate
# Download Augustus 3.3.3
wget https://github.com/Gaius-Augustus/Augustus/releases/download/v3.3.3/augustus-3.3.3.tar.gz
tar zxvf augustus-3.3.3.tar.gz
cd augustus-3.3.3/
make
# Then check the path of augustus in the conda env
which augustus
# prints something like: to/the/path/of/conda/env/bin/augustus
# Replace the conda augustus with the locally built one
cp bin/* to/the/path/of/conda/env/bin/

# Optionally you can update funannotate to the latest master, but the release version also works so far:
python -m pip install git+https://github.com/nextgenusfs/funannotate.git --upgrade --force --no-deps
mjacksonhill commented 1 year ago

I think I've found the mismatched tag. I'm not very familiar with XML so my terminology may be off here, but it appears that in the XMLs that cause this error, there is no closing </Hit_hsps> tag. I've attached excerpts of one instance from an old (working) version and the new (breaking) version. I'm not sure about the rest of the syntax, but the absence or misplacement of this tag is the sole difference between the formats.

As a potential root of the issue, or just an added weirdness, an initial </Hit_hsps> tag has jumped up to line 27 (agreeing with the error), before any occurrence of an opening tag. I've attached the full XML output of funannotate test -t predict for that as well, and I've opened an issue for this over at diamond.

xmls.zip

mjacksonhill commented 1 year ago

Not sure how practical it is, but it looks like the one misplaced tag can be corrected with a little regex, shown here in Perl. I just ran this on the full XML file, slurped in as a single string.

$full_text =~ s|  </Hit_hsps>\n||g; #remove all existing (misplaced) tags
$full_text =~ s|</Hsp>\n|</Hsp>\n  </Hit_hsps>\n|g; #put tags back in proper positions
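
For anyone who would rather patch the file from Python, the two substitutions above translate fairly directly. This is only a sketch of the same idea, and `repair_hit_hsps` is a hypothetical helper name. Like the Perl version, it assumes every `</Hsp>` line should be followed by a `</Hit_hsps>`, which would presumably be wrong for hits that carry more than one HSP.

```python
import re


def repair_hit_hsps(full_text):
    """Mirror the two Perl substitutions on the whole XML text."""
    # Remove all existing (misplaced) closing tags.
    full_text = re.sub(r"  </Hit_hsps>\n", "", full_text)
    # Put a closing tag back after every closing </Hsp> line.
    return re.sub(r"</Hsp>\n", "</Hsp>\n  </Hit_hsps>\n", full_text)
```

A typical use would be to read the diamond XML into a string, pass it through `repair_hit_hsps`, and write the result to a new file before handing it to the parser, keeping the original file for comparison.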