Closed: charlesfeigin closed this issue 1 year ago
Uninstall augustus from your conda environment and use the one packaged for Debian, i.e. install with `apt-get` if possible. You can see how I did it in the Dockerfile: https://github.com/nextgenusfs/funannotate/blob/master/Dockerfile#L42. I know it's a huge pain. It's not really a funannotate problem; augustus is just finicky and hard to get working properly. I think (but cannot guarantee, because I haven't had enough time to test) that the code in master is compatible with Augustus v3.4 and v3.5. So you can also upgrade your funannotate install by installing from master in that environment, i.e. something like `python -m pip install git+https://github.com/nextgenusfs/funannotate.git --upgrade --force --no-deps`.
Thanks so much for your fast reply. When installing Augustus via apt, do you know where the system tends to put the augustus config directory? I figure I would need to set this in order to use the one in my path. It also doesn't seem to make the perl scripts available when I install via apt. Apologies...
Ok, sorry, I did find the location (/usr/share/augustus/config).
The program shows the apt-get version (3.4) when I run `funannotate check --show-versions`, and I am confident that I removed the default version (3.5) from my environment so that only the apt-get version (3.4) remains, but I still get the same error.
The weird thing is that there is a server at my old uni that has a functioning install of funannotate. I can't use this for my analyses, but I logged back in to check the file in question: EOG092C0B3U.prfl
The md5sums for the one associated with the working install on the other server and on the server where I'm currently trying to install are identical. I think that means it can't be an issue with compiling this file.
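The checksum comparison described above can be reproduced with a few lines of Python. This is only a sketch: the helper names `md5_of` and `same_content` and the two file paths are hypothetical placeholders, not part of funannotate. A matching digest only shows that the two files are byte-identical; it says nothing about whether a given augustus build can parse them.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have identical MD5 digests."""
    return md5_of(path_a) == md5_of(path_b)


if __name__ == "__main__":
    # Hypothetical locations of the two copies of the profile file.
    local_copy = Path("EOG092C0B3U.prfl")
    other_copy = Path("from_other_server/EOG092C0B3U.prfl")
    if local_copy.exists() and other_copy.exists():
        print("identical" if same_content(local_copy, other_copy) else "different")
```

If the digests match, as they did here, the profile file itself is unlikely to be the culprit and attention shifts to the augustus binary that reads it.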
I also tried upgrading the install to funannotate v1.8.14 with the pip command you suggested and then rebuilt the database. Again though, this leads to the same outcome.
I'm wondering if I can get past this issue by simply replacing the "EOG092C0B3U.prfl" file on my system with one that is properly formatted to work with funannotate v1.8.14 and augustus 3.4 installed via apt-get. Do you think this would work?
Okay, so you are getting the protein profile compilation error even from Augustus installed from apt-get? I think you need both `augustus` and `augustus-data` from apt-get to get the extra files, configs, etc. You can just use `which augustus` from the funannotate environment to ensure it has the appropriate path. You can also move the config data elsewhere by just copying it and then setting `export AUGUSTUS_CONFIG_PATH=/path/to/writable/config`.
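As a rough illustration of the copy-and-export suggestion, the sketch below copies a config directory to a writable location and sets `AUGUSTUS_CONFIG_PATH` for the current process. The function name `relocate_augustus_config` and the target directory are hypothetical; the source path `/usr/share/augustus/config` is the location reported earlier in this thread and may differ on other systems.

```python
import os
import shutil
from pathlib import Path


def relocate_augustus_config(source, destination):
    """Copy an Augustus config directory and point AUGUSTUS_CONFIG_PATH at the copy."""
    source = Path(source)
    destination = Path(destination)
    if not source.is_dir():
        raise FileNotFoundError(f"config directory not found: {source}")
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run over an existing target.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    # Only affects this process and any child processes it starts.
    os.environ["AUGUSTUS_CONFIG_PATH"] = str(destination)
    return destination


if __name__ == "__main__":
    # shutil.which mirrors `which augustus`: it reports the binary found first on PATH.
    print("augustus binary:", shutil.which("augustus"))
    system_config = Path("/usr/share/augustus/config")  # location reported in this thread
    if system_config.is_dir():
        target = Path.home() / "augustus_config"  # hypothetical writable location
        print("config now at:", relocate_augustus_config(system_config, target))
```

Setting the variable from Python does not change the parent shell, so for interactive use the `export` line still needs to go into the shell session or its startup file.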
I don't know what else to recommend other than to find a way to build augustus where the protein profile mode is functional. I thought it was solved in conda eventually, but I don't know the exact version that was fixed.
You can try `mamba install -c bioconda augustus=3.4`.
Hi @nextgenusfs, I'm not 100% sure what I did differently but I deleted everything, started fresh and the following wound up working:
1) Create a conda environment in which to install mamba (can't install mamba in base for some reason)
2) Create a mamba environment for funannotate
3) Install funannotate with mamba
4) This time leave augustus 3.5 in place (which previously was freezing at the "predict" step, my initial issue that sent me down this path)
5) Use pip to upgrade to 1.8.14
6) Remake the funannotate database
From here it made it past predict without Augustus failing and without the process stopping. I felt like I had done this previously, but I may have mixed up databases or something. It's easy to lose track since they take >30 min each to build.
The process got much further this time but crashed when it got to processing RNA data:
Feb 09 01:15 AM: Reannotating Awesome rna, NCBI accession: None
Feb 09 01:15 AM: Previous annotation consists of: 1,591 protein coding gene models and 112 non-coding gene models
Feb 09 01:15 AM: Existing annotation: locustag=FUN_ genenumber=1703
Feb 09 01:15 AM: Building Hisat2 genome index
Feb 09 01:15 AM: CMD ERROR: hisat2-build -p 200 rna-seq/update_misc/genome.fa rna-seq/update_misc/hisat2.genome
Feb 09 01:15 AM: Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
Output files: "rna-seq/update_misc/hisat2.genome..ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
rna-seq/update_misc/genome.fa
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 3540 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 3540 --dcv 1024
Constructing suffix-array element generator
Feb 09 01:15 AM: ERROR: Trinity de novo assembly failed
#########################################################
Traceback (most recent call last):
File "/home/cfeigin/anaconda3_2022.10/envs/mamba_base/envs/funannotate3/bin/funannotate", line 8, in <module>
Any idea what may cause this? Thank you again so much for your assistance. I really do appreciate it.
Related to the above, here is the current output of funannotate check
You are running Python v 3.8.15. Now checking python packages...
biopython: 1.80 goatools: 1.2.3 matplotlib: 3.4.3 natsort: 8.2.0 numpy: 1.24.2 pandas: 1.5.3 psutil: 5.9.4 requests: 2.28.2 scikit-learn: 1.2.1 scipy: 1.10.0 seaborn: 0.12.2
All 11 python packages installed
You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50 Clone: 0.46 DBD::SQLite: 1.72 DBD::mysql: 4.046 DBI: 1.643 DB_File: 1.855 Data::Dumper: 2.183 File::Basename: 2.85 File::Which: 1.24 Getopt::Long: 2.54 Hash::Merge: 0.302 JSON: 4.10 LWP::UserAgent: 6.67 Logger::Simple: 2.0 POSIX: 1.94 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.14 Tie::File: 1.06 URI::Escape: 5.12 YAML: 1.30 local::lib: 2.000029 threads: 2.25 threads::shared: 1.61
All 27 Perl modules installed
Checking external dependencies...
ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory
PASA: 2.5.2 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.5.0 bamtools: bamtools 2.5.1 bedtools: bedtools v2.30.0 blat: BLAT v35 diamond: 2.0.15 ete3: 3.1.2 exonerate: exonerate 2.4.0 fasta: 36.3.8g glimmerhmm: 3.0.4 gmap: 2021-08-25 gmes_petap.pl: 4.71_lic hisat2: 2.2.1 hmmscan: HMMER 3.3.2 (Nov 2020) hmmsearch: HMMER 3.3.2 (Nov 2020) java: 17.0.3-internal kallisto: 0.46.1 mafft: v7.515 (2023/Jan/15) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.24-r1122 pigz: 2.6 proteinortho: 6.1.7 salmon: salmon 0.14.1 samtools: samtools 1.16.1 snap: 2006-07-28 stringtie: 2.2.1 tRNAscan-SE: 2.0.11 (Oct 2022) tantan: tantan 40 tbl2asn: 25.8 tblastn: tblastn 2.2.31+ trimal: trimAl v1.4.rev15 build[2013-12-17] trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: pslCDnaFilter not installed
ERROR: signalp not installed
Good news! So that suggests that the current codebase is running okay with augustus v3.5, which is nice to hear.
Per this error, I'm not exactly sure -- but I'm wondering if this is too many threads (200)?
CMD ERROR: hisat2-build -p 200 rna-seq/update_misc/genome.fa rna-seq/update_misc/hisat2.genome
Would it be possible to try with fewer cpus/threads and see if it gets past that step, i.e. maybe something like 24? I've never had a machine big enough to try that many ;). The test dataset is very small and really doesn't use many resources, so something like 24 cpus should run fairly quickly.
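One way to avoid over-requesting threads is to clamp the value before passing it to `--cpus`. The sketch below is an assumption-laden illustration: the helper name `capped_cpus` is hypothetical, and the default ceiling of 24 is simply the number suggested above, not a documented limit of hisat2-build or funannotate.

```python
import os


def capped_cpus(requested, ceiling=24):
    """Clamp a requested thread count to the range 1..min(ceiling, available cores)."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(requested, ceiling, available))


if __name__ == "__main__":
    # 200 is the thread count that appeared in the failing hisat2-build command.
    print("threads to request:", capped_cpus(200))
```

The thread does not establish where the actual limit lies, so the ceiling is best treated as a tunable starting point.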
That did it! I got too greedy with the CPU. I'll play around with a few more tests to see where that limit is.
If there's any info you'd like from me that could help with future compatibility/installation questions please don't hesitate to ask. I really appreciate your help.
If you could just paste the output of `funannotate test -t busco --cpus 12` here, that would help. I just want to make sure Augustus v3.5 from conda is working; then I can tag a new release and pin Augustus v3.5, which will hopefully fix most people's install issues.
Here it is, let me know if you need anything else:
Run InterProScan (manual install): funannotate iprscan -i annotate -c 12
Run antiSMASH (optional): funannotate remote -i annotate -m antismash -e youremail@server.edu
[Feb 10 11:17 AM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[Feb 10 11:17 AM]: Add species parameters to database:
funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json
[Feb 10 11:17 AM]: OS: Ubuntu 22.04, 255 cores, ~ 2015 GB RAM. Python: 3.8.15
[Feb 10 11:17 AM]: Running funannotate v1.8.14
[Feb 10 11:17 AM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[Feb 10 11:17 AM]: Skipping CodingQuarry as no --rna_bam passed
[Feb 10 11:17 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program      Training-Method
augustus     pretrained
genemark     pretrained
glimmerhmm   pretrained
snap         pretrained
[Feb 10 11:18 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Feb 10 11:18 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Feb 10 11:18 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Feb 10 11:18 AM]: Found 1,505 preliminary alignments with diamond in 0:00:01 --> generated FASTA files for exonerate in 0:00:00
[Feb 10 11:18 AM]: Exonerate finished in 0:00:14: found 1,270 alignments
[Feb 10 11:18 AM]: Running GeneMark-ES on assembly
[Feb 10 11:19 AM]: 1,565 predictions from GeneMark
[Feb 10 11:19 AM]: Running Augustus gene prediction using awesome_busco parameters
[Feb 10 11:19 AM]: 1,284 predictions from Augustus
[Feb 10 11:20 AM]: Pulling out high quality Augustus predictions
[Feb 10 11:20 AM]: Found 306 high quality predictions from Augustus (>90% exon evidence)
[Feb 10 11:20 AM]: Running SNAP gene prediction, using pre-trained HMM profile
[Feb 10 11:20 AM]: 1,392 predictions from SNAP
[Feb 10 11:20 AM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[Feb 10 11:20 AM]: 1,778 predictions from GlimmerHMM
[Feb 10 11:20 AM]: Summary of gene models passed to EVM (weights):
[Feb 10 11:20 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Feb 10 11:24 AM]: Converting to GFF3 and collecting all EVM results
Source         Weight   Count
Augustus       1        978
Augustus HiQ   2        306
GeneMark       1        1565
GlimmerHMM     1        1778
snap           1        1392
Total          -        6019
Run InterProScan (manual install): funannotate iprscan -i annotate2 -c 12
Run antiSMASH (optional): funannotate remote -i annotate2 -m antismash -e youremail@server.edu
[Feb 10 11:24 AM]: Training parameters file saved: annotate2/predict_results/awesome_busco.parameters.json
[Feb 10 11:24 AM]: Add species parameters to database:
funannotate species -s awesome_busco -a annotate2/predict_results/awesome_busco.parameters.json
Hi,
I am trying to install funannotate on a server running ubuntu. Our institution doesn't allow us to use docker for "security reasons" and the conda install hangs indefinitely, so I went the mamba route. This installation appears to go smoothly, but when I run my test I get the following error:
...
[Feb 08 03:15 PM]: ERROR: augustus --proteinprofile test failed, likely a compilation error. This is required to run BUSCO, exiting.
augustus: ERROR PP::Profile: Error parsing pattern file"/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/lib/python3.8/site-packages/funannotate/config/EOG092C0B3U.prfl", line 8.
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 8 --species Awesome testicus
#########################################################
#########################################################
Traceback (most recent call last):
File "/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/bin/funannotate", line 10, in <module>
sys.exit(main())
File "/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
mod.main(arguments)
File "/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/lib/python3.8/site-packages/funannotate/test.py", line 405, in main
runPredictTest(args)
File "/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/lib/python3.8/site-packages/funannotate/test.py", line 160, in runPredictTest
assert 1500 <= countGFFgenes(os.path.join(
File "/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/lib/python3.8/site-packages/funannotate/test.py", line 45, in countGFFgenes
with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_2180e07e-d5ba-4965-b1fb-5aa3a7acd720/annotate/predict_results/Awesome_testicus.gff3'
I initially tried the default install through mamba and had issues with augustus 3.5 (which installs automatically this way). Apt only provides 3.4, which other questioners indicated was still too high a version, so I compiled both 3.2.1 and 3.3.3 from source, to no avail. I continue to get the same error. I am also confident that I've successfully removed augustus 3.5 completely from my environment, that the augustus environment variables are correct, and that version 3.3.3 is what is currently in my path. I've tried fixes other people report but am not making any progress. Any help would be immensely appreciated as I've been tasked with annotating several assemblies. Here is the current info on my install:
Checking dependencies for 1.8.13
You are running Python v 3.8.15. Now checking python packages...
biopython: 1.80 goatools: 1.2.3 matplotlib: 3.4.3 natsort: 8.2.0 numpy: 1.24.2 pandas: 1.5.3 psutil: 5.9.4 requests: 2.28.2 scikit-learn: 1.2.1 scipy: 1.10.0 seaborn: 0.12.2
All 11 python packages installed
You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50 Clone: 0.46 DBD::SQLite: 1.72 DBD::mysql: 4.046 DBI: 1.643 DB_File: 1.855 Data::Dumper: 2.183 File::Basename: 2.85 File::Which: 1.24 Getopt::Long: 2.54 Hash::Merge: 0.302 JSON: 4.10 LWP::UserAgent: 6.67 Logger::Simple: 2.0 POSIX: 1.94 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.14 Tie::File: 1.06 URI::Escape: 5.12 YAML: 1.30 local::lib: 2.000029 threads: 2.25 threads::shared: 1.61
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/home/cfeigin/funannotate_v1.8.13_db
$PASAHOME=/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/opt/pasa-2.5.2
$TRINITY_HOME=/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/opt/trinity-2.8.5
$EVM_HOME=/home/cfeigin/anaconda3_2022.10/envs/mamba_base_env1/envs/funannotate_env1/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/cfeigin/src/augustus/config/
$GENEMARK_PATH=/home/cfeigin/src/gmes_linux_64_4/
All 6 environmental variables are set
Checking external dependencies...
PASA: 2.5.2 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.3.3 bamtools: bamtools 2.5.1 bedtools: bedtools v2.30.0 blat: BLAT v35 diamond: 2.0.15 emapper.py: 2.1.9 ete3: 3.1.2 exonerate: exonerate 2.4.0 fasta: no way to determine glimmerhmm: 3.0.4 gmap: 2021-08-25 gmes_petap.pl: 4.71_lic hisat2: 2.2.1 hmmscan: HMMER 3.3.2 (Nov 2020) hmmsearch: HMMER 3.3.2 (Nov 2020) java: 17.0.3-internal kallisto: 0.46.1 mafft: v7.515 (2023/Jan/15) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.24-r1122 pigz: pigz 2.6 proteinortho: 6.1.7 pslCDnaFilter: no way to determine salmon: salmon 0.14.1 samtools: samtools 1.16.1 snap: 2006-07-28 stringtie: 2.2.1 tRNAscan-SE: 2.0.11 (Oct 2022) tantan: tantan 40 tbl2asn: no way to determine, likely 25.X tblastn: tblastn 2.2.31+ trimal: trimAl v1.4.rev15 build[2013-12-17] trimmomatic: 0.39
ERROR: signalp not installed