nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate predict stuck #810

Open xieyichun50 opened 1 year ago

xieyichun50 commented 1 year ago

Hi @nextgenusfs, I was running funannotate predict and found that my script could not get through. So I tried funannotate test -t busco and funannotate test -t predict, and waited for one hour. (htop showed that both tests got stuck within 5 seconds of starting and just hung there.) Tests of the other funannotate functions run through smoothly. Do you have any suggestions?

The log file looks like this:

funannotate test -t busco --cpu 70
#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 70 --species Awesome busco
#########################################################
-------------------------------------------------------
[Oct 17 03:51 AM]: OS: Ubuntu 20.04, 80 cores, ~ 528 GB RAM. Python: 3.8.13
[Oct 17 03:51 AM]: Running funannotate v1.8.13
funannotate test -t predict --cpu 70
#########################################################
Running `funannotate predict` unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 70 --species Awesome testicus
#########################################################
-------------------------------------------------------
[Oct 17 04:07 AM]: OS: Ubuntu 20.04, 80 cores, ~ 528 GB RAM. Python: 3.8.13
[Oct 17 04:07 AM]: Running funannotate v1.8.13

The output directories look like this, and the required test files are accessible.

tree test-busco_d2030ea5-7dd8-4264-a975-ffa1871bcfe0
test-busco_d2030ea5-7dd8-4264-a975-ffa1871bcfe0
├── annotate
│   ├── logfiles
│   │   └── funannotate-predict.log
│   ├── predict_misc
│   └── predict_results
├── protein.evidence.fasta
└── test.softmasked.fa

4 directories, 3 files
tree test-predict_e3613a0f-8ce1-46c9-b3e9-4d5ac7c54ac9/
test-predict_e3613a0f-8ce1-46c9-b3e9-4d5ac7c54ac9/
├── annotate
│   ├── logfiles
│   │   └── funannotate-predict.log
│   ├── predict_misc
│   └── predict_results
├── protein.evidence.fasta
└── test.softmasked.fa

4 directories, 3 files

I am running funannotate in a conda environment on an Ubuntu 20.04 machine.

funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.13
-------------------------------------------------------
You are running Python v 3.8.13. Now checking python packages...
biopython: 1.79
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.2.0
numpy: 1.23.3
pandas: 1.5.0
psutil: 5.9.2
requests: 2.28.1
scikit-learn: 1.1.2
scipy: 1.9.1
seaborn: 0.12.0
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.42
DBD::SQLite: 1.70
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.52
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/mnt/content_176/yichun/tools/funannotate_db
$PASAHOME=/home/yichun_hml/miniconda3/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/home/yichun_hml/miniconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/home/yichun_hml/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/yichun_hml/miniconda3/envs/funannotate/config/
$GENEMARK_PATH=/mnt/content_176/yichun/tools/gmes_linux_64_4
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
emapper.py: 2.1.9
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
gmes_petap.pl: 4.69_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.508 (2022/Sep/07)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.1.1
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.16.1
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 39
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 37 external dependencies are installed
xieyichun50 commented 1 year ago

An additional message appeared after I interrupted the run from the keyboard:

Traceback (most recent call last):
  File "/home/yichun_hml/miniconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 407, in main
    runBuscoTest(args)
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 193, in runBuscoTest
    runCMD(['funannotate', 'predict', '-i', inputFasta,
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/test.py", line 55, in runCMD
    subprocess.call(cmd, cwd=dir)
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/subprocess.py", line 342, in call
    return p.wait(timeout=timeout)
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/subprocess.py", line 1083, in wait
    return self._wait(timeout=timeout)
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/subprocess.py", line 1806, in _wait
    (pid, sts) = self._try_wait(0)
  File "/home/yichun_hml/miniconda3/envs/funannotate/lib/python3.8/subprocess.py", line 1764, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
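The traceback shows the test harness blocked inside `subprocess.call(cmd, cwd=dir)` and then `os.waitpid`, i.e. waiting indefinitely on the child `funannotate predict` process. When reproducing a hang like this, it can help to wrap the command with a timeout so it fails fast instead of blocking for an hour. The sketch below is a minimal illustration, not funannotate's own `runCMD`: `run_with_timeout` is a hypothetical helper and the default timeout is arbitrary. It also closes stdin, which rules out the possibility (an assumption, not a confirmed cause here) that a child is silently waiting for input.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], cwd: Optional[str] = None,
                     timeout: float = 600.0) -> int:
    """Run cmd and return its exit code, or raise RuntimeError on timeout.

    Hypothetical helper: unlike a bare subprocess.call(cmd, cwd=dir), this
    gives up after `timeout` seconds instead of waiting forever.
    """
    try:
        completed = subprocess.run(
            list(cmd),
            cwd=cwd,
            stdin=subprocess.DEVNULL,  # a child reading stdin gets EOF instead of blocking
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising TimeoutExpired
        raise RuntimeError(f"command timed out after {timeout} s: {' '.join(cmd)}") from None
    return completed.returncode


if __name__ == "__main__":
    # A command that finishes quickly, then one that would otherwise hang.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"]))
    try:
        run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], timeout=1)
    except RuntimeError as err:
        print(err)
```

Note that `subprocess.run` only kills the direct child on timeout; grandchildren spawned by `funannotate predict` (e.g. Augustus helper scripts) may keep running and need to be cleaned up separately.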
hyphaltip commented 1 year ago

I can confirm this as well when running the tests off the master branch.

xieyichun50 commented 1 year ago

Hi all, I finally came up with a solution! This appears to be a bug in the conda build of Augustus. Remove it with
conda remove --force-remove augustus
then install Augustus with apt-get install augustus on Ubuntu (or from any other non-conda source), and point funannotate at its config directory:
export AUGUSTUS_CONFIG_PATH=/your/path/to/augustus/config
That fixed the bug; so far it works on my machine.

funannotate test -t busco --cpus 70
#########################################################
Running `funannotate predict` BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 70 --species Awesome busco
#########################################################
-------------------------------------------------------
[Oct 17 01:42 PM]: OS: Ubuntu 20.04, 80 cores, ~ 528 GB RAM. Python: 3.8.13
[Oct 17 01:42 PM]: Running funannotate v1.8.13
[Oct 17 01:42 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 17 01:42 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     busco
  genemark     selftraining
  glimmerhmm   busco
  snap         busco
[Oct 17 01:42 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 17 01:42 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 17 01:42 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 17 01:42 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 17 01:42 PM]: Exonerate finished in 0:00:09: found 1,270 alignments
[Oct 17 01:42 PM]: Running GeneMark-ES on assembly
[Oct 17 01:43 PM]: 1,565 predictions from GeneMark
[Oct 17 01:43 PM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Oct 17 01:45 PM]: 373 valid BUSCO predictions found, validating protein sequences
[Oct 17 01:46 PM]: 370 BUSCO predictions validated
[Oct 17 01:46 PM]: Training Augustus using BUSCO gene models
[Oct 17 01:46 PM]: Augustus initial training results:
  Feature       Specificity   Sensitivity
  nucleotides   99.5%         83.8%
  exons         71.8%         59.7%
  genes         86.5%         59.3%
[Oct 17 01:46 PM]: Running Augustus gene prediction using awesome_busco parameters
[Oct 17 01:46 PM]: 1,301 predictions from Augustus
[Oct 17 01:46 PM]: Pulling out high quality Augustus predictions
[Oct 17 01:46 PM]: Found 313 high quality predictions from Augustus (>90% exon evidence)
[Oct 17 01:46 PM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 17 01:46 PM]: 1,423 predictions from SNAP
[Oct 17 01:46 PM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Oct 17 01:48 PM]: 1,771 predictions from GlimmerHMM
[Oct 17 01:48 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        988
  Augustus HiQ   2        313
  GeneMark       1        1565
  GlimmerHMM     1        1771
  snap           1        1423
  Total          -        6060
[Oct 17 01:48 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Oct 17 01:52 PM]: Converting to GFF3 and collecting all EVM results
[Oct 17 01:52 PM]: 1,693 total gene models from EVM
[Oct 17 01:52 PM]: Generating protein fasta files from 1,693 EVM models
[Oct 17 01:52 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 17 01:52 PM]: Found 104 gene models to remove: 0 too short; 0 span gaps; 104 transposable elements
[Oct 17 01:52 PM]: 1,589 gene models remaining
[Oct 17 01:52 PM]: Predicting tRNAs
[Oct 17 01:53 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 17 01:53 PM]: Generating GenBank tbl annotation file
[Oct 17 01:53 PM]: Collecting final annotation files for 1,701 total gene models
[Oct 17 01:53 PM]: Converting to final Genbank format
[Oct 17 01:53 PM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Oct 17 01:53 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install):
funannotate iprscan -i annotate -c 70

Run antiSMASH (optional):
funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome:
funannotate annotate -i annotate --cpus 70 --sbt yourSBTfile.txt
-------------------------------------------------------

[Oct 17 01:53 PM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[Oct 17 01:53 PM]: Add species parameters to database:

  funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json

#########################################################
SUCCESS: `funannotate predict` BUSCO-mediated training test complete.
#########################################################
Now running predict using all pre-trained ab-initio predictors
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate2 --cpus 70 --species Awesome busco -p annotate/predict_results/awesome_busco.parameters.json
#########################################################
-------------------------------------------------------
[Oct 17 01:53 PM]: OS: Ubuntu 20.04, 80 cores, ~ 528 GB RAM. Python: 3.8.13
[Oct 17 01:53 PM]: Running funannotate v1.8.13
[Oct 17 01:53 PM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[Oct 17 01:53 PM]: Skipping CodingQuarry as no --rna_bam passed
[Oct 17 01:53 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pretrained
  genemark     pretrained
  glimmerhmm   pretrained
  snap         pretrained
[Oct 17 01:53 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Oct 17 01:53 PM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Oct 17 01:53 PM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Oct 17 01:53 PM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Oct 17 01:53 PM]: Exonerate finished in 0:00:09: found 1,270 alignments
[Oct 17 01:53 PM]: Running GeneMark-ES on assembly
[Oct 17 01:54 PM]: 1,566 predictions from GeneMark
[Oct 17 01:54 PM]: Running Augustus gene prediction using awesome_busco parameters
[Oct 17 01:55 PM]: 1,301 predictions from Augustus
[Oct 17 01:55 PM]: Pulling out high quality Augustus predictions
[Oct 17 01:55 PM]: Found 313 high quality predictions from Augustus (>90% exon evidence)
[Oct 17 01:55 PM]: Running SNAP gene prediction, using pre-trained HMM profile
[Oct 17 01:55 PM]: 1,423 predictions from SNAP
[Oct 17 01:55 PM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[Oct 17 01:55 PM]: 1,771 predictions from GlimmerHMM
[Oct 17 01:55 PM]: Summary of gene models passed to EVM (weights):
  Source         Weight   Count
  Augustus       1        988
  Augustus HiQ   2        313
  GeneMark       1        1566
  GlimmerHMM     1        1771
  snap           1        1423
  Total          -        6061
[Oct 17 01:55 PM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
     Progress: 97.44%
[Oct 17 01:59 PM]: Converting to GFF3 and collecting all EVM results
[Oct 17 01:59 PM]: 1,694 total gene models from EVM
[Oct 17 01:59 PM]: Generating protein fasta files from 1,694 EVM models
[Oct 17 01:59 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Oct 17 01:59 PM]: Found 104 gene models to remove: 0 too short; 0 span gaps; 104 transposable elements
[Oct 17 01:59 PM]: 1,590 gene models remaining
[Oct 17 01:59 PM]: Predicting tRNAs
[Oct 17 02:00 PM]: 112 tRNAscan models are valid (non-overlapping)
[Oct 17 02:00 PM]: Generating GenBank tbl annotation file
[Oct 17 02:00 PM]: Collecting final annotation files for 1,702 total gene models
[Oct 17 02:00 PM]: Converting to final Genbank format
[Oct 17 02:00 PM]: Funannotate predict is finished, output files are in the annotate2/predict_results folder
[Oct 17 02:00 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (manual install):
funannotate iprscan -i annotate2 -c 70

Run antiSMASH (optional):
funannotate remote -i annotate2 -m antismash -e youremail@server.edu

Annotate Genome:
funannotate annotate -i annotate2 --cpus 70 --sbt yourSBTfile.txt
-------------------------------------------------------

[Oct 17 02:00 PM]: Training parameters file saved: annotate2/predict_results/awesome_busco.parameters.json
[Oct 17 02:00 PM]: Add species parameters to database:

  funannotate species -s awesome_busco -a annotate2/predict_results/awesome_busco.parameters.json

#########################################################
SUCCESS: `funannotate predict` using existing parameters test complete.
#########################################################
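When swapping Augustus builds as in the workaround above, it can help to confirm which `augustus` binary is actually first on `PATH` and that `$AUGUSTUS_CONFIG_PATH` points at a usable config directory. A minimal sketch, assuming that a conda-installed binary sits under a path with a `conda` component (true for the `miniconda3/envs/funannotate` layout in this report, but not guaranteed in general) and that a valid config directory contains a `species` subdirectory; `check_augustus_setup` is a hypothetical helper, not part of funannotate.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def check_augustus_setup(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Report where `augustus` resolves from and whether the config path looks usable.

    Hypothetical helper. Assumptions: a conda install lives under a path with a
    'conda' component (e.g. ~/miniconda3/envs/<name>/bin), and a valid Augustus
    config directory has a 'species' subdirectory.
    """
    env = os.environ if env is None else env
    # Search the given PATH (or the current one) for an executable named augustus.
    binary = shutil.which("augustus", path=env.get("PATH"))
    config = env.get("AUGUSTUS_CONFIG_PATH")
    parts = os.path.normpath(binary).split(os.sep) if binary else []
    return {
        "augustus_binary": binary,
        "looks_like_conda": any("conda" in part for part in parts),
        "config_path": config,
        "config_has_species_dir": bool(config) and os.path.isdir(os.path.join(config, "species")),
    }


if __name__ == "__main__":
    for key, value in check_augustus_setup().items():
        print(f"{key}: {value}")
```

After removing the conda package and installing from apt, `looks_like_conda` would be expected to turn `False` and `config_has_species_dir` to stay `True` for the exported path.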
hyphaltip commented 1 year ago

Can you clarify which version of Augustus you removed? I just built and released version 3.5 into bioconda to fix a problem with pbl parsing, so if this release has a bug we need to know. Things have generally been buggy with some of the Augustus releases after 3.3, so forcing an older version may fix it, as you saw.

hyphaltip commented 1 year ago

But I'll check whether this is just a bug in the test suite, since it seems to run fine otherwise...

xieyichun50 commented 1 year ago

I tried both conda Augustus 3.3.3 and 3.5 and removed them, as neither worked. The version currently running, obtained from Ubuntu apt-get, is Augustus 3.3.3. Thank you!

nextgenusfs commented 1 year ago

Okay, so the issue here is that in Augustus v3.5.0 the command-line options of some of the auxiliary scripts have changed, so funannotate freezes when it tries to validate the install. I think commit c0fab96304e1ed8b1740cc712e36f3045952233b should fix it. Note that there will likely be more issues with Augustus v3.5.0 that I haven't gotten to yet...

Priyam008 commented 1 year ago

Hi @nextgenusfs, I have installed and run funannotate. It ran smoothly up to the predict step, but after that funannotate predict was stuck for a very long time without giving any error or output file. So I tried funannotate test -t predict and waited for more than one hour. I have already followed all the suggestions posted here about installing Augustus without conda and installing tbl2asn from the NCBI repository. Please help resolve the issue. The log file looks like this:

(funannotate) [root@localhost funannotate]# funannotate test -t predict --cpus 30
#########################################################
Running `funannotate predict` unit testing
Downloading: https://osf.io/te2pf/download?version=1 Bytes: 1489808
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 30 --species Awesome testicus
#########################################################
[Oct 20 02:19 PM]: OS: Red Hat Enterprise Linux Server 7.3, 32 cores, ~ 132 GB RAM. Python: 3.8.13
[Oct 20 02:19 PM]: Running funannotate v1.8.13

margaretc-ho commented 1 year ago

I ran into this same error, and keeping the conda install of Augustus 3.5.5 was fine as long as I updated funannotate to the latest master:
python -m pip install git+https://github.com/nextgenusfs/funannotate.git

hyphaltip commented 1 year ago

Augustus > 3.3 does not work with older versions of funannotate (e.g. v1.8.13), so downgrade Augustus.

SLAment commented 1 year ago

Hi everyone, I seem to be running into the same issue. I tried @margaretc-ho's suggestion of using python -m pip, but it didn't work for me. Is there any update on this?

SLAment commented 1 year ago

As a follow-up to my comment above, the latest conda funannotate version (1.8.15) now contains Augustus 3.3.0, and I didn't get this freezing issue during the predict test (although I got another issue, see #827).
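The rule of thumb from this thread (Augustus newer than 3.3 breaks older funannotate releases such as v1.8.13, while the conda build of funannotate 1.8.15 ships Augustus 3.3.0) can be encoded as a quick pre-flight check. A minimal sketch under stated assumptions: version strings are reduced to their first dotted integer sequence (the exact output format of `augustus --version` is not assumed), "newer than 3.3" is read as major.minor above 3.3, and the funannotate cutoff of 1.8.13 comes from this thread rather than from any official compatibility table. `parse_version` and `needs_augustus_downgrade` are hypothetical helpers.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number (e.g. '3.5.0') from text.

    Works on bare strings like '1.8.13' and on banner-style strings
    such as 'AUGUSTUS (3.3.3) is a gene prediction tool'.
    """
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_augustus_downgrade(funannotate_version: str, augustus_version: str) -> bool:
    """Hypothetical rule from this thread: Augustus > 3.3 with funannotate <= 1.8.13."""
    fun = parse_version(funannotate_version)
    aug = parse_version(augustus_version)
    # Compare only major.minor for Augustus, so 3.3.x passes and 3.5.0 is flagged.
    return aug[:2] > (3, 3) and fun <= (1, 8, 13)


if __name__ == "__main__":
    print(needs_augustus_downgrade("1.8.13", "3.5.0"))  # True: the combination reported in this issue
    print(needs_augustus_downgrade("1.8.15", "3.3.0"))  # False: the conda 1.8.15 pairing
```

Version alone may not be the whole story: earlier in this thread, conda's Augustus 3.3.3 also hung while the apt build of 3.3.3 worked. The installed funannotate version can be read with `importlib.metadata.version("funannotate")` (Python 3.8+) before calling the check.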