nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

funannotate test error #675

Open liuyca1 opened 2 years ago

liuyca1 commented 2 years ago

Hi, I recently installed funannotate, and an error was reported during testing.

(funannotate) [liuyuanchao@login ~]$ funannotate check --show-versions

Checking dependencies for 1.8.7

You are running Python v 3.9.7. Now checking python packages...
biopython: 1.79
goatools: 1.1.6
matplotlib: 3.4.3
natsort: 8.0.0
numpy: 1.21.4
pandas: 1.3.4
psutil: 5.8.0
requests: 2.26.0
scikit-learn: 1.0.1
scipy: 1.7.0
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$PASAHOME=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/public/home/liuyuanchao/software/anaconda3/envs/funannotate/config/
ERROR: FUNANNOTATE_DB not set. export FUNANNOTATE_DB=/path/to/dir
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.8
emapper.py: 2.1.3
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
gmes_petap.pl: 4.68_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.22-r1101
proteinortho: 6.0.31
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.9
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.1.7
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 36 external dependencies are installed

But when we ran funannotate test, some errors appeared:

(funannotate) [liuyuanchao@login /public/home/liuyuanchao/ceshi] $funannotate test -t all --cpus 20
#########################################################
Running funannotate clean unit testing: minimap2 mediated assembly duplications
Downloading: https://osf.io/8pjbe/download?version=1 Bytes: 252076
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################

6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs

scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039

6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: funannotate clean test complete.
#########################################################

#########################################################
Running funannotate mask unit testing: RepeatModeler --> RepeatMasker
Downloading: https://osf.io/hbryz/download?version=1 Bytes: 375687
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 20
#########################################################

[Dec 07 08:49 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 08:49 AM]: Running funanotate v1.8.7
[Dec 07 08:49 AM]: Soft-masking simple repeats with tantan
[Dec 07 08:49 AM]: Repeat soft-masking finished:
Masked genome: /public/home/liuyuanchao/ceshi/test-mask_d0755630-7e24-4b9a-90e8-50d205a3cd9f/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)

#########################################################
SUCCESS: funannotate mask test complete.
#########################################################

#########################################################
Running funannotate predict unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --cpus 20 --species Awesome testicus
#########################################################

[Dec 07 08:49 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 08:49 AM]: Running funannotate v1.8.7
[Dec 07 08:49 AM]: ERROR: dikarya busco database is not found, install with funannotate setup -b dikarya
#########################################################
Traceback (most recent call last):
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 405, in main
    runPredictTest(args)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 160, in runPredictTest
    assert 1500 <= countGFFgenes(os.path.join(
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-predict_d0755630-7e24-4b9a-90e8-50d205a3cd9f/annotate/predict_results/Awesome_testicus.gff3'

nextgenusfs commented 2 years ago

Did you try what the error indicates?

[Dec 07 08:49 AM]: ERROR: dikarya busco database is not found, install with funannotate setup -b dikarya

liuyca1 commented 2 years ago

Yes, it was indeed a problem with the database. I had specified the database location before downloading it, but funannotate still failed to find it, and I don't know why. It now seems that I have to specify the location every time I start it.

nextgenusfs commented 2 years ago

You can either specify it on the command line or set the FUNANNOTATE_DB environment variable.
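As a minimal sketch (not part of funannotate), the Python snippet below checks whether the two variables that `funannotate check` reported as unset, FUNANNOTATE_DB and GENEMARK_PATH, are defined in the current environment and point to existing directories. The helper name `check_env_dirs` and its status strings are made up for illustration.

```python
import os
from pathlib import Path


def check_env_dirs(names, environ=None):
    """Map each variable name to 'ok', 'not set', or 'not a directory'.

    Hypothetical helper for illustration; funannotate has its own checks.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in names:
        value = environ.get(name)
        if not value:
            # Variable is missing or empty in this shell session.
            report[name] = "not set"
        elif not Path(value).is_dir():
            # Variable is set but does not point to an existing directory.
            report[name] = "not a directory"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_env_dirs(["FUNANNOTATE_DB", "GENEMARK_PATH"]).items():
        print(f"{name}: {status}")
```

If a variable shows as "not set" in a new login shell, the `export FUNANNOTATE_DB=/path/to/dir` line was probably only run in an earlier session. Adding it to a shell startup file such as `~/.bashrc` is the usual way to make it persist.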


liuyca1 commented 2 years ago

The report from a complete run of funannotate test shows that there are still some errors. Can these errors be ignored?

(funannotate) [liuyuanchao@login /public/home/liuyuanchao/ceshi] $funannotate test -t all --cpus 20
#########################################################
Running funannotate clean unit testing: minimap2 mediated assembly duplications
CMD: funannotate clean -i test.clean.fa -o test.exhaustive.fa --exhaustive
#########################################################

6 input contigs, 6 larger than 500 bp, N50 is 427,039 bp
Checking duplication of 6 contigs

scaffold_73 appears duplicated: 100% identity over 100% of the contig. contig length: 15153
scaffold_27 appears duplicated: 100% identity over 100% of the contig. contig length: 427039
scaffold_91 appears duplicated: 100% identity over 100% of the contig. contig length: 8858

6 input contigs; 6 larger than 500 bp; 3 duplicated; 3 written to file
#########################################################
SUCCESS: funannotate clean test complete.
#########################################################

#########################################################
Running funannotate mask unit testing: RepeatModeler --> RepeatMasker
CMD: funannotate mask -i test.fa -o test.masked.fa --cpus 20
#########################################################

[Dec 07 09:24 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:24 AM]: Running funanotate v1.8.7
[Dec 07 09:24 AM]: Soft-masking simple repeats with tantan
[Dec 07 09:24 AM]: Repeat soft-masking finished:
Masked genome: /public/home/liuyuanchao/ceshi/test-mask_4903f361-aac1-46a8-bdb2-b407e72b501c/test.masked.fa
num scaffolds: 2
assembly size: 1,216,048 bp
masked repeats: 50,965 bp (4.19%)

#########################################################
SUCCESS: funannotate mask test complete.
#########################################################

#########################################################
Running funannotate predict unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --augustus_species saccharomyces --
#########################################################

[Dec 07 09:24 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:24 AM]: Running funannotate v1.8.7
[Dec 07 09:24 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 09:24 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program      Training-Method
augustus     pretrained
genemark     selftraining
glimmerhmm   busco
snap         busco
[Dec 07 09:24 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 09:24 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 09:24 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 09:24 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 09:24 AM]: Exonerate finished: found 1,431 alignments
[Dec 07 09:24 AM]: Running GeneMark-ES on assembly
[Dec 07 09:27 AM]: 1,559 predictions from GeneMark
[Dec 07 09:27 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Dec 07 09:32 AM]: 370 BUSCO predictions validated
[Dec 07 09:32 AM]: Running Augustus gene prediction using saccharomyces parameters
[Dec 07 09:34 AM]: 1,489 predictions from Augustus
[Dec 07 09:34 AM]: Pulling out high quality Augustus predictions
[Dec 07 09:34 AM]: Found 370 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 09:34 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:34 AM]: 2 predictions from SNAP
[Dec 07 09:34 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:37 AM]: 1,776 predictions from GlimmerHMM
[Dec 07 09:37 AM]: Summary of gene models passed to EVM (weights):
Source         Weight   Count
Augustus       1        1332
Augustus HiQ   2        371
GeneMark       1        1559
GlimmerHMM     1        1776
snap           1        2
Total          -        5040
[Dec 07 09:37 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 09:39 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 09:39 AM]: 1,689 total gene models from EVM
[Dec 07 09:39 AM]: Generating protein fasta files from 1,689 EVM models
[Dec 07 09:39 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 09:39 AM]: Found 101 gene models to remove: 0 too short; 0 span gaps; 101 transposable elements
[Dec 07 09:39 AM]: 1,588 gene models remaining
[Dec 07 09:39 AM]: Predicting tRNAs
[Dec 07 09:39 AM]: 112 tRNAscan models are valid (non-overlapping)
[Dec 07 09:39 AM]: Generating GenBank tbl annotation file
[Dec 07 09:39 AM]: Converting to final Genbank format
[Dec 07 09:39 AM]: Collecting final annotation files for 1,700 total gene models
[Dec 07 09:39 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Dec 07 09:39 AM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (Docker required): funannotate iprscan -i annotate -m docker -c 20

Run antiSMASH: funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: funannotate annotate -i annotate --cpus 20 --sbt yourSBTfile.txt

[Dec 07 09:39 AM]: Training parameters file saved: annotate/predict_results/saccharomyces.parameters.json
[Dec 07 09:39 AM]: Add species parameters to database:

funannotate species -s saccharomyces -a annotate/predict_results/saccharomyces.parameters.json

#########################################################
SUCCESS: funannotate predict test complete.
#########################################################

#########################################################
Running funannotate predict BUSCO-mediated training unit testing
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate --cpus 20 --species Awesome busco
#########################################################

[Dec 07 09:39 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:39 AM]: Running funannotate v1.8.7
[Dec 07 09:39 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 09:39 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program      Training-Method
augustus     busco
genemark     selftraining
glimmerhmm   busco
snap         busco
[Dec 07 09:39 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 09:40 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 09:40 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 09:40 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 09:40 AM]: Exonerate finished: found 1,437 alignments
[Dec 07 09:40 AM]: Running GeneMark-ES on assembly
[Dec 07 09:42 AM]: 1,562 predictions from GeneMark
[Dec 07 09:42 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Dec 07 09:46 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Dec 07 09:48 AM]: 370 BUSCO predictions validated
[Dec 07 09:48 AM]: Training Augustus using BUSCO gene models
[Dec 07 09:48 AM]: Augustus initial training results:
Feature       Specificity   Sensitivity
nucleotides   99.5%         83.8%
exons         71.8%         59.7%
genes         86.5%         59.3%
[Dec 07 09:48 AM]: Running Augustus gene prediction using awesome_busco parameters
[Dec 07 09:48 AM]: 1,303 predictions from Augustus
[Dec 07 09:48 AM]: Pulling out high quality Augustus predictions
[Dec 07 09:48 AM]: Found 314 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 09:48 AM]: Running SNAP gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:49 AM]: 2 predictions from SNAP
[Dec 07 09:49 AM]: Running GlimmerHMM gene prediction, using training data: annotate/predict_misc/busco.final.gff3
[Dec 07 09:52 AM]: 1,768 predictions from GlimmerHMM
[Dec 07 09:52 AM]: Summary of gene models passed to EVM (weights):
Source         Weight   Count
Augustus       1        989
Augustus HiQ   2        314
GeneMark       1        1562
GlimmerHMM     1        1768
snap           1        2
Total          -        4635
[Dec 07 09:52 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 09:53 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 09:53 AM]: 1,662 total gene models from EVM
[Dec 07 09:53 AM]: Generating protein fasta files from 1,662 EVM models
[Dec 07 09:53 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 09:53 AM]: Found 82 gene models to remove: 0 too short; 0 span gaps; 82 transposable elements
[Dec 07 09:53 AM]: 1,580 gene models remaining
[Dec 07 09:53 AM]: Predicting tRNAs
[Dec 07 09:54 AM]: 112 tRNAscan models are valid (non-overlapping)
[Dec 07 09:54 AM]: Generating GenBank tbl annotation file
[Dec 07 09:54 AM]: Converting to final Genbank format
[Dec 07 09:54 AM]: Collecting final annotation files for 1,692 total gene models
[Dec 07 09:54 AM]: Funannotate predict is finished, output files are in the annotate/predict_results folder
[Dec 07 09:54 AM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (Docker required): funannotate iprscan -i annotate -m docker -c 20

Run antiSMASH: funannotate remote -i annotate -m antismash -e youremail@server.edu

Annotate Genome: funannotate annotate -i annotate --cpus 20 --sbt yourSBTfile.txt

[Dec 07 09:54 AM]: Training parameters file saved: annotate/predict_results/awesome_busco.parameters.json
[Dec 07 09:54 AM]: Add species parameters to database:

funannotate species -s awesome_busco -a annotate/predict_results/awesome_busco.parameters.json

#########################################################
SUCCESS: funannotate predict BUSCO-mediated training test complete.
#########################################################
Now running predict using all pre-trained ab-initio predictors
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o annotate2 --cpus 20 --species Awesome busco -p annotate/predict_results/awesome_busco.parameters.json
#########################################################

[Dec 07 09:54 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 09:54 AM]: Running funannotate v1.8.7
[Dec 07 09:54 AM]: Ab initio training parameters file passed: annotate/predict_results/awesome_busco.parameters.json
[Dec 07 09:54 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 09:54 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program      Training-Method
augustus     pretrained
genemark     pretrained
glimmerhmm   pretrained
snap         pretrained
[Dec 07 09:54 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 09:54 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 09:54 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 09:54 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 09:54 AM]: Exonerate finished: found 1,437 alignments
[Dec 07 09:54 AM]: Running GeneMark-ES on assembly
[Dec 07 09:57 AM]: 1,565 predictions from GeneMark
[Dec 07 09:57 AM]: Running Augustus gene prediction using awesome_busco parameters
[Dec 07 09:57 AM]: 1,303 predictions from Augustus
[Dec 07 09:57 AM]: Pulling out high quality Augustus predictions
[Dec 07 09:57 AM]: Found 314 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 09:57 AM]: Running SNAP gene prediction, using pre-trained HMM profile
[Dec 07 09:58 AM]: 2 predictions from SNAP
[Dec 07 09:58 AM]: Running GlimmerHMM gene prediction, using pretrained HMM profile
[Dec 07 09:58 AM]: 1,768 predictions from GlimmerHMM
[Dec 07 09:58 AM]: Summary of gene models passed to EVM (weights):
Source         Weight   Count
Augustus       1        989
Augustus HiQ   2        314
GeneMark       1        1565
GlimmerHMM     1        1768
snap           1        2
Total          -        4638
[Dec 07 09:58 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 10:00 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 10:00 AM]: 1,661 total gene models from EVM
[Dec 07 10:00 AM]: Generating protein fasta files from 1,661 EVM models
[Dec 07 10:00 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 10:00 AM]: Found 82 gene models to remove: 0 too short; 0 span gaps; 82 transposable elements
[Dec 07 10:00 AM]: 1,579 gene models remaining
[Dec 07 10:00 AM]: Predicting tRNAs
[Dec 07 10:00 AM]: 112 tRNAscan models are valid (non-overlapping)

CMD: funannotate train -i test.softmasked.fa --single rna-seq.illumina.fastq.gz --nanopore_mrna rna-seq.nanopore.fastq.gz -o rna-seq --cpus 20 --jaccard_clip --species Awesome rna
#########################################################

[Dec 07 10:10 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 10:10 AM]: Running 1.8.7
[Dec 07 10:10 AM]: Adapter and Quality trimming SE reads with Trimmomatic
Traceback (most recent call last):
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/train.py", line 958, in main
    trim_single = runTrimmomaticSE(s_reads, cpus=args.cpus)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/train.py", line 58, in runTrimmomaticSE
    lib.Fzip_inplace(output, cpus)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 404, in Fzip_inplace
    runSubprocess(cmd, '.', log)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 661, in runSubprocess
    proc = subprocess.Popen(cmd, cwd=dir, stdout=subprocess.PIPE,
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'pigz'
#########################################################
Now running funannotate predict using RNA-seq training data
CMD: funannotate predict -i test.softmasked.fa --protein_evidence protein.evidence.fasta -o rna-seq --cpus 20 --min_training_models 150 --species Awesome rna
#########################################################

[Dec 07 10:11 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 10:11 AM]: Running funannotate v1.8.7
[Dec 07 10:11 AM]: Skipping CodingQuarry as no --rna_bam passed
[Dec 07 10:11 AM]: Parsed training data, run ab-initio gene predictors as follows:
Program      Training-Method
augustus     busco
genemark     selftraining
glimmerhmm   busco
snap         busco
[Dec 07 10:11 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Dec 07 10:11 AM]: Genome loaded: 6 scaffolds; 3,776,588 bp; 19.75% repeats masked
[Dec 07 10:11 AM]: Mapping 1,065 proteins to genome using diamond and exonerate
[Dec 07 10:11 AM]: Found 1,784 preliminary alignments --> aligning with exonerate
[Dec 07 10:11 AM]: Exonerate finished: found 1,437 alignments
[Dec 07 10:11 AM]: Running GeneMark-ES on assembly
[Dec 07 10:13 AM]: 1,562 predictions from GeneMark
[Dec 07 10:13 AM]: Running BUSCO to find conserved gene models for training ab-initio predictors
[Dec 07 10:17 AM]: 373 valid BUSCO predictions found, validating protein sequences
[Dec 07 10:19 AM]: 370 BUSCO predictions validated
[Dec 07 10:19 AM]: Training Augustus using BUSCO gene models
[Dec 07 10:19 AM]: Augustus initial training results:
Feature       Specificity   Sensitivity
nucleotides   99.5%         83.8%
exons         71.8%         59.7%
genes         86.5%         59.3%
[Dec 07 10:19 AM]: Running Augustus gene prediction using awesome_rna parameters
[Dec 07 10:20 AM]: 1,303 predictions from Augustus
[Dec 07 10:20 AM]: Pulling out high quality Augustus predictions
[Dec 07 10:20 AM]: Found 314 high quality predictions from Augustus (>90% exon evidence)
[Dec 07 10:20 AM]: Running SNAP gene prediction, using training data: rna-seq/predict_misc/busco.final.gff3
[Dec 07 10:20 AM]: 0 predictions from SNAP
[Dec 07 10:20 AM]: SNAP prediction failed, moving on without result
[Dec 07 10:20 AM]: Running GlimmerHMM gene prediction, using training data: rna-seq/predict_misc/busco.final.gff3
[Dec 07 10:23 AM]: 1,775 predictions from GlimmerHMM
[Dec 07 10:23 AM]: Summary of gene models passed to EVM (weights):
Source         Weight   Count
Augustus       1        989
Augustus HiQ   2        314
GeneMark       1        1562
GlimmerHMM     1        1775
Total          -        4640
[Dec 07 10:23 AM]: EVM: partitioning input to ~ 35 genes per partition using min 1500 bp interval
[Dec 07 10:25 AM]: Converting to GFF3 and collecting all EVM results
[Dec 07 10:25 AM]: 1,682 total gene models from EVM
[Dec 07 10:25 AM]: Generating protein fasta files from 1,682 EVM models
[Dec 07 10:25 AM]: now filtering out bad gene models (< 50 aa in length, transposable elements, etc).
[Dec 07 10:25 AM]: Found 96 gene models to remove: 0 too short; 0 span gaps; 96 transposable elements
[Dec 07 10:25 AM]: 1,586 gene models remaining
[Dec 07 10:25 AM]: Predicting tRNAs
[Dec 07 10:25 AM]: 112 tRNAscan models are valid (non-overlapping)
[Dec 07 10:25 AM]: Generating GenBank tbl annotation file
[Dec 07 10:25 AM]: Converting to final Genbank format
[Dec 07 10:26 AM]: Collecting final annotation files for 1,698 total gene models
[Dec 07 10:26 AM]: Funannotate predict is finished, output files are in the rna-seq/predict_results folder
[Dec 07 10:26 AM]: Your next step might be functional annotation, suggested commands:

Run InterProScan (Docker required): funannotate iprscan -i rna-seq -m docker -c 20

Run antiSMASH: funannotate remote -i rna-seq -m antismash -e youremail@server.edu

Annotate Genome: funannotate annotate -i rna-seq --cpus 20 --sbt yourSBTfile.txt

[Dec 07 10:26 AM]: Training parameters file saved: rna-seq/predict_results/awesome_rna.parameters.json
[Dec 07 10:26 AM]: Add species parameters to database:

funannotate species -s awesome_rna -a rna-seq/predict_results/awesome_rna.parameters.json

#########################################################
Now running funannotate update to run PASA-mediated UTR addition and multiple transcripts
CMD: funannotate update -i rna-seq --cpus 20
#########################################################

[Dec 07 10:26 AM]: OS: CentOS Linux 7, 20 cores, ~ 131 GB RAM. Python: 3.9.7
[Dec 07 10:26 AM]: Running 1.8.7
[Dec 07 10:26 AM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Dec 07 10:26 AM]: Found relevant files in rna-seq/training, will re-use them:
Single reads: rna-seq/training/single.fq.gz
[Dec 07 10:26 AM]: Reannotating Awesome rna, NCBI accession: None
[Dec 07 10:26 AM]: Previous annotation consists of: 1,586 protein coding gene models and 112 non-coding gene models
[Dec 07 10:26 AM]: Adapter and Quality trimming SE reads with Trimmomatic
Traceback (most recent call last):
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/update.py", line 2035, in main
    trim_single = runTrimmomaticSE(s_reads, cpus=args.cpus)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/update.py", line 414, in runTrimmomaticSE
    lib.Fzip_inplace(output, cpus)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 404, in Fzip_inplace
    runSubprocess(cmd, '.', log)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/library.py", line 661, in runSubprocess
    proc = subprocess.Popen(cmd, cwd=dir, stdout=subprocess.PIPE,
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'pigz'
#########################################################
Traceback (most recent call last):
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/funannotate.py", line 705, in main
    mod.main(arguments)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 409, in main
    runRNAseqTest(args)
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 357, in runRNAseqTest
    assert 1630 <= countGFFgenes(os.path.join(
  File "/public/home/liuyuanchao/software/anaconda3/envs/funannotate/lib/python3.9/site-packages/funannotate/test.py", line 45, in countGFFgenes
    with open(input, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test-rna_seq_4903f361-aac1-46a8-bdb2-b407e72b501c/rna-seq/update_results/Awesome_rna.gff3'
(funannotate) [liuyuanchao@login /public/home/liuyuanchao/ceshi] $
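
Both the `funannotate train` and `funannotate update` steps in the log stop at `PermissionError: [Errno 13] Permission denied: 'pigz'`, and pigz is not among the 36 external dependencies listed by `funannotate check`. The final `FileNotFoundError` appears to follow from that, since `update` exited before writing `Awesome_rna.gff3`. When `subprocess.Popen` raises this error for a bare program name, it typically means a file with that name was found on PATH but could not be executed. The sketch below is one way to look for such a file; the helper `find_on_path` is hypothetical and only approximates how the operating system searches PATH.

```python
import os


def find_on_path(program, path=None):
    """Return (candidate, is_executable) for each file named `program` on PATH.

    Illustrative helper only; candidates are listed in PATH search order.
    """
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate):
            # os.access reports whether the current user may execute the file.
            hits.append((candidate, os.access(candidate, os.X_OK)))
    return hits


if __name__ == "__main__":
    hits = find_on_path("pigz")
    if not hits:
        print("pigz: not found on PATH")
    for candidate, executable in hits:
        print(f"{candidate}: {'executable' if executable else 'NOT executable'}")
```

A candidate reported as not executable would be consistent with the error in the log. Restoring its execute permission or reinstalling pigz in the environment are the usual remedies, though the thread does not confirm which applies here.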