nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

about rmblastn and fasta! #327

Closed: sunnycqcn closed this issue 4 years ago

sunnycqcn commented 4 years ago

Hello, I installed all the packages, but rmblastn and fasta are still not detected. I have tried many times and still get the output below, even though I installed both rmblastn and fasta with conda. Could you help me figure out how to fix this? Thanks, Fuyou

(funannotate) [fuf@biocluster share]$ funannotate check --show-versions

Checking dependencies for funannotate v1.6.0-401c258

You are running Python v 2.7.15. Now checking python packages...
biopython: 1.74
goatools: 0.9.7
matplotlib: 2.2.4
natsort: 6.0.0
numpy: 1.16.4
pandas: 0.24.2
psutil: 5.6.3
requests: 2.22.0
scikit-learn: 0.20.4
scipy: 1.2.1
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.50
Clone: 0.43
DBD::SQLite: 1.64
DBD::mysql: 4.050
DBI: 1.642
DB_File: 1.852
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.51
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking external dependencies...
rmblastn: symbol lookup error: /home/AAFC-AAC/fuf/miniconda3/bin/../lib/ncbi-blast+/libxcleanup.so: undefined symbol: _ZN4ncbi7objects4edit12GetNewProtIdENS0_14CBioseq_HandleERiRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb
CodingQuarry: 2.0
RepeatMasker: RepeatMasker 4.0.9
RepeatModeler: RepeatModeler version DEV
Trinity: Trinity version: v2.1.1
augustus: 3.2.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.29.0
blat: BLAT v36
diamond: diamond 0.9.24
emapper.py: 1.0.3-40-g41a8498
ete3: 3.1.1
exonerate: exonerate 2.4.0
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.38
hisat2: 2.1.0
hmmscan: HMMER 3.2.1 (June 2018)
hmmsearch: HMMER 3.2.1 (June 2018)
java: 11.0.1
kallisto: 0.46.0
mafft: v7.407 (2018/Jul/23)
makeblastdb: makeblastdb 2.9.0+
minimap2: 2.17-r941
nucmer: 3.1
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.9
snap: 2006-07-28
stringtie: 2.0
tRNAscan-SE: 2.0.3 (April 2019)
tbl2asn: unknown, likely 25.3
tblastn: tblastn 2.9.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
ERROR: fasta not installed
ERROR: rmblastn not installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/isilon/saskatoon-rdc/users/fuf/database/funDB
$PASAHOME=/home/AAFC-AAC/fuf/miniconda3/envs/funannotate/opt/pasa-2.3.3
$TRINITYHOME=/home/AAFC-AAC/fuf/miniconda3/envs/funannotate/opt/trinity-2.6.6
$EVM_HOME=/home/AAFC-AAC/fuf/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/AAFC-AAC/fuf/augustus-3.2.3/config
$GENEMARK_PATH=/home/AAFC-AAC/fuf/genemark-et/gmes_petap
$BAMTOOLS_PATH=/home/AAFC-AAC/fuf/miniconda3/pkgs/bamtools-2.5.1-he860b03_5/bin/
All 7 environmental variables are set

sunnycqcn commented 4 years ago

Meanwhile, I get the following error when I run funannotate, although funannotate mask runs successfully. Thanks, Fuyou

[10:50 AM]: OS: linux2, 80 cores, ~ 1057 GB RAM. Python: 2.7.15
[10:50 AM]: Running funannotate v1.6.0-401c258
[10:50 AM]: Missing Dependencies: etraining. Please install missing dependencies and re-run script

nextgenusfs commented 4 years ago

fasta is a dependency of PASA -- it's actually fasta36 that needs to be symlinked as fasta. You have a compilation problem with rmblastn, which you can see in the results above -- find a different version on conda or compile from source to fix it. etraining is from the augustus package; its absence indicates that your augustus install is not complete. Re-install augustus and make sure 'etraining' is available in your PATH.
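The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not part of funannotate: it assumes Python 3, and the helper names `find_missing` and `ensure_alias` are hypothetical. Note that finding a tool on PATH does not prove it runs; the rmblastn above was found but failed with a symbol lookup error.

```python
import os
import shutil


def find_missing(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def ensure_alias(target, alias, bin_dir):
    """Create bin_dir/alias as a symlink to bin_dir/target if alias is absent.

    Hypothetical helper: bin_dir is whichever directory on your PATH
    holds the fasta36 executable.
    """
    target_path = os.path.abspath(os.path.join(bin_dir, target))
    alias_path = os.path.abspath(os.path.join(bin_dir, alias))
    if not os.path.exists(target_path):
        raise FileNotFoundError(target_path)
    # lexists also catches a dangling symlink left by an earlier attempt
    if not os.path.lexists(alias_path):
        os.symlink(target_path, alias_path)
    return alias_path


if __name__ == "__main__":
    # Tool names taken from the thread; extend the list as needed.
    for tool in find_missing(["fasta", "rmblastn", "etraining", "augustus"]):
        print("{}: not found on PATH".format(tool))
```

Calling `ensure_alias("fasta36", "fasta", bin_dir)` with the directory that contains fasta36 would create the symlink mentioned above; the directory itself depends on your installation.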

sunnycqcn commented 4 years ago

Hello, thanks for your fast reply. I am following your suggestions. By the way, if my species name is not in the list, what should I set?

nextgenusfs commented 4 years ago

You should pass the closest species available from funannotate species to the --busco_seed_species option. This will run BUSCO using that seed species and train Augustus for your species -- it will then save those results so you can use them in the future. If you have RNA-seq data -- then run funannotate train followed by predict -- which will then use the RNA-seq data to train Augustus, etc.

sunnycqcn commented 4 years ago

Thank you very much! It is working well currently. Fuyou

sunnycqcn commented 4 years ago

Hello, I am sorry to bother you again. My run is almost working. However, I still hit an error at the last step. I checked all the dependencies, and they look fine.

[12:27 PM]: OS: linux2, 80 cores, ~ 1057 GB RAM. Python: 2.7.15
[12:27 PM]: Running funannotate v1.6.0-401c258
[12:27 PM]: AUGUSTUS (3.2.3) detected, version seems to be compatible with BRAKER and BUSCO
[12:27 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[12:27 PM]: Genome loaded: 20 scaffolds; 44,992,741 bp; 11.48% repeats masked
[12:27 PM]: Existing transcript alignments found: BL06/predict_misc/transcript_alignments.gff3
[12:27 PM]: Existing protein alignments found: BL06/predict_misc/protein_alignments.gff3
[12:28 PM]: Running GeneMark-ES on assembly
[12:28 PM]: GeneMark-ES failed: BL06/predict_misc/genemark/output/gmhmm.mod file missing, please check logfiles.
[12:28 PM]: Running BUSCO to find conserved gene models for training Augustus
[12:28 PM]: Multi-threading in tblastn v2.9.0 is unstable, running in single threaded mode for BUSCO
[12:49 PM]: 1,270 valid BUSCO predictions found, now formatting for EVM
[12:50 PM]: Setting up EVM partitions
[12:51 PM]: Generating EVM command list
[12:51 PM]: Running EVM commands with 19 CPUs
[12:52 PM]: Combining partitioned EVM outputs
[12:52 PM]: Converting EVM output to GFF3
[12:52 PM]: Collecting all EVM results
[12:52 PM]: 1,253 total gene models from EVM
[12:52 PM]: Checking BUSCO protein models for accuracy
[12:53 PM]: 1,251 BUSCO predictions validated
[12:53 PM]: Training Augustus using BUSCO gene models
[12:53 PM]: Augustus initial training results:
[12:53 PM]: Running Augustus gene prediction
[01:01 PM]: Found 10,056 gene models
[01:01 PM]: GeneMark predictions failed. If you can run GeneMark outside of funannotate, then pass the results to --genemark_gtf.
[01:01 PM]: Pulling out high quality Augustus predictions
[01:01 PM]: Found 6,929 high quality predictions from Augustus (>90% exon evidence)
[01:01 PM]: Skipping CodingQuarry as there are no RNA-seq data
[01:01 PM]: Running SNAP gene prediction, using training data: BL06/predict_misc/busco.final.gff3
[01:07 PM]: 0 predictions from SNAP
[01:07 PM]: SNAP prediction failed, moving on without result
[01:07 PM]: Running GlimmerHMM gene prediction, using training data: BL06/predict_misc/busco.final.gff3
[01:23 PM]: 9,457 predictions from GlimmerHMM
[01:23 PM]: Summary of gene models passed to EVM (weights):
[01:23 PM]: Setting up EVM partitions
[01:24 PM]: Generating EVM command list
[01:24 PM]: Running EVM commands with 19 CPUs
[01:29 PM]: Combining partitioned EVM outputs
[01:30 PM]: Converting EVM output to GFF3
[01:30 PM]: Collecting all EVM results
[01:30 PM]: 10,402 total gene models from EVM
[01:30 PM]: Generating protein fasta files from 10,402 EVM models
[01:30 PM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
Traceback (most recent call last):
  File "/home/AAFC-AAC/fuf/funannotate/bin/funannotate-predict.py", line 1336, in <module>
    lib.RepeatBlast(EVM_proteins, args.cpus, 1e-10, FUNDB, os.path.join(args.out, 'predict_misc'), Blast_rep_remove)
  File "/home/AAFC-AAC/fuf/funannotate/lib/library.py", line 4212, in RepeatBlast
    with open(blast_tmp, 'rU') as results:
IOError: [Errno 2] No such file or directory: 'BL06/predict_misc/repeats.xml'


Checking dependencies for funannotate v1.6.0-401c258

You are running Python v 2.7.15. Now checking python packages...
biopython: 1.74
goatools: 0.9.7
matplotlib: 2.2.4
natsort: 6.0.0
numpy: 1.16.4
pandas: 0.24.2
psutil: 5.6.3
requests: 2.22.0
scikit-learn: 0.20.4
scipy: 1.2.1
seaborn: 0.9.0
All 11 python packages installed

You are running Perl v 5.026002. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.50
Clone: 0.43
DBD::SQLite: 1.64
DBD::mysql: 4.050
DBI: 1.642
DB_File: 1.852
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.51
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking external dependencies...
CodingQuarry: 2.0
RepeatMasker: RepeatMasker 4.0.9
RepeatModeler: RepeatModeler version DEV
Trinity: Trinity version: v2.1.1
augustus: 3.2.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.29.0
blat: BLAT v36
diamond: diamond 0.9.24
emapper.py: 1.0.3-40-g41a8498
ete3: 3.1.1
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.38
hisat2: 2.1.0
hmmscan: HMMER 3.2.1 (June 2018)
hmmsearch: HMMER 3.2.1 (June 2018)
java: 11.0.1
kallisto: 0.46.0
mafft: v7.407 (2018/Jul/23)
makeblastdb: makeblastdb 2.9.0+
minimap2: 2.17-r941
nucmer: 3.1
pslCDnaFilter: no way to determine
rmblastn: rmblastn 2.9.0+
salmon: salmon 0.14.1
samtools: samtools 1.9
snap: 2006-07-28
stringtie: 2.0
tRNAscan-SE: 2.0.3 (April 2019)
tbl2asn: unknown, likely 25.3
tblastn: tblastn 2.9.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
All 35 external dependencies are installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/isilon/saskatoon-rdc/users/fuf/database/funDB
$PASAHOME=/home/AAFC-AAC/fuf/miniconda3/envs/funannotate/opt/pasa-2.3.3
$TRINITYHOME=/home/AAFC-AAC/fuf/miniconda3/envs/funannotate/opt/trinity-2.6.6
$EVM_HOME=/home/AAFC-AAC/fuf/miniconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/AAFC-AAC/fuf/augustus-3.2.3/config
$GENEMARK_PATH=/home/AAFC-AAC/fuf/genemark-et/gmes_petap
$BAMTOOLS_PATH=/home/AAFC-AAC/fuf/miniconda3/pkgs/bamtools-2.5.1-he860b03_5/bin/
All 7 environmental variables are set

nextgenusfs commented 4 years ago

Did you install the databases? What is the output of funannotate database?

sunnycqcn commented 4 years ago

Hi, here is the output of funannotate database. Thanks,

(funannotate) [fuf@biocluster RepeatModeler]$ funannotate database

Funannotate Databases currently installed:

Database         Type       Version     Date        Num_Records  Md5checksum
pfam             hmmer3     32.0        2018-08     17929        de7496fad69c1040fd74db1cb5eef0fc
gene2product     text       1.45        2019-07-31  30103        657bb30cf3247fcb74ca4f51a4ab7c18
interpro         xml        75.0        2019-07-04  36872        6e2b1e1d447c3e0bfaa949f77d9f488c
dbCAN            hmmer3     8.0         2019-08-08  607          51c724c1f9ac45687f08d0faa689ed58
busco_outgroups  outgroups  1.0         2019-09-17  8            6795b1d4545850a4226829c7ae8ef058
merops           diamond    12.0        2017-10-04  5009         a6dd76907896708f3ca5335f58560356
mibig            diamond    1.4         2019-09-16  31023        118f2c11edde36c81bdea030a0228492
uniprot          diamond    2019_07     2019-07-31  560537       ff745121ab32b328586928ec5cd7bb84
go               text       2019-07-01  2019-07-01  47401        27a6a2d4a1036df99c81a1245cb16279
repeats          diamond    1.0         2019-09-16  11950        4e8cafc3eea47ec7ba505bb1e3465d21

To update a database type: funannotate setup -i DBNAME -d /isilon/saskatoon-rdc/users/fuf/database/funDB --force

nextgenusfs commented 4 years ago

Okay, how about the logfile? It looks like the diamond search of the repeats database failed -- hopefully the logfile tells you why.

sunnycqcn commented 4 years ago

In the log file, the message I see is: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'. Thanks, Fuyou

sunnycqcn commented 4 years ago

CPU threads: 20

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: BL06/predict_misc
Opening the database... [0.033483s]
Error: Database was built with a different version of Diamond and is incompatible.

So do I need to update the database?

nextgenusfs commented 4 years ago

Yeah, rerun funannotate setup and use -f.
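For anyone hitting the same failure, the diagnosis above can be automated with a small log check. This is a rough sketch, not part of funannotate: the function names are hypothetical, the message text is copied from the diamond output quoted in this thread, and the command string follows the hint printed by funannotate database. Passing "repeats" as DBNAME is an assumption based on the name shown in the database table.

```python
# Message text copied from the diamond output quoted in this thread.
DIAMOND_MISMATCH = "Database was built with a different version of Diamond"


def has_diamond_mismatch(log_text):
    """Return True if any log line reports the diamond version mismatch."""
    return any(DIAMOND_MISMATCH in line for line in log_text.splitlines())


def rebuild_command(db_name, db_dir):
    """Format the update command in the form `funannotate database` prints.

    db_name and db_dir are placeholders supplied by the caller.
    """
    return "funannotate setup -i {} -d {} --force".format(db_name, db_dir)


if __name__ == "__main__":
    sample = (
        "Opening the database... [0.033483s]\n"
        "Error: Database was built with a different version of Diamond "
        "and is incompatible.\n"
    )
    if has_diamond_mismatch(sample):
        # "repeats" is an assumed DBNAME; adjust to the database that failed.
        print(rebuild_command("repeats", "$FUNANNOTATE_DB"))
```

In practice you would read the text from the funannotate logfile rather than a sample string, then run the printed command yourself after checking it.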

sunnycqcn commented 4 years ago

Hello, I really appreciate your suggestions. It is working well now. Have a good weekend, Fuyou