nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Using "funannotate update" to update a GFF3 file generated by EVM #773

Open Jiangjiangzhang6 opened 2 years ago

Jiangjiangzhang6 commented 2 years ago

Are you using the latest release? If you are not using the latest release of funannotate, please upgrade, if bug persists then report here.

Describe the bug A clear and concise description of what the bug is.

What command did you issue? Copy/paste the command used.

Logfiles Please provide relevant log files of the error.

OS/Install Information

Jiangjiangzhang6 commented 2 years ago

The command was:

funannotate update -f genome.fasta -g update.gff3 --species cannabis-sativa -o ~/dama/annotation/hap2/evm/update --pacbio_isoseq ../../../index/RNA/hemp_iso_seq.fasta --max_intronlen 50000 --aligners minimap2

and it produced this error:


[Aug 30 10:07 PM]: OS: CentOS Linux 7, 12 cores, ~ 66 GB RAM. Python: 3.8.13
[Aug 30 10:07 PM]: Running 1.8.13
[Aug 30 10:07 PM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
Traceback (most recent call last):
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/update.py", line 1962, in main
    locustag, genenumber, justify = gff2pasa(
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/update.py", line 336, in gff2pasa
    tag, count = lastTag.split('_')
ValueError: too many values to unpack (expected 2)

When I run the command "funannotate check --show-versions", I get:


Checking dependencies for 1.8.13

You are running Python v 3.8.13. Now checking python packages...
biopython: 1.79
goatools: 1.2.3
matplotlib: 3.4.1
natsort: 8.1.0
numpy: 1.22.4
pandas: 1.4.3
psutil: 5.9.1
requests: 2.28.1
scikit-learn: 0.24.1
scipy: 1.6.2
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.27
Getopt::Long: 2.5
Hash::Merge: 0.302
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000029
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/public/home/zhaoli/software/anaconda3/envs/funannotate/bin/
$PASAHOME=/public/home/zhaoli/software/anaconda3/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/public/home/zhaoli/software/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/public/home/zhaoli/software/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/public/home/software/opt/bio/software/augustus/3.3.3//config/
$GENEMARK_PATH=public/home/zhaoli/software/gmes_linux_64_4/
All 6 environmental variables are set

Checking external dependencies...
pigz 2.3.4
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
emapper.py: 2.1.9
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
gmes_petap.pl: 4.68_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
proteinortho: 6.1.0
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.15.1
snap: 2006-07-28
stringtie: 2.1.7
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 39
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: pigz not installed
ERROR: signalp not installed

Jiangjiangzhang6 commented 2 years ago

I then installed pigz and signalp. pigz is now in the funannotate/bin directory, but the check still shows "ERROR: pigz not installed ERROR: signalp not installed".

When I re-run the update command, it shows the same error.

Could you help me solve this problem? Thank you.

nextgenusfs commented 2 years ago

Looks like your locus_tag has more than one underscore - that is not a valid format for NCBI. The error could be clearer but I think you'll need to rename your gene models.
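
To illustrate why a second underscore triggers the ValueError in the traceback above, here is a minimal sketch. The only line taken from funannotate is the two-name unpacking of `lastTag.split('_')` shown in the traceback; the helper `split_locus_tag` and the example tag `my_gene_0001` are hypothetical and are not part of funannotate.

```python
def split_locus_tag(last_tag):
    """Split 'PREFIX_NUMBER' into (prefix, number).

    Hypothetical helper: raises a more descriptive error than a bare
    two-name unpacking when the tag does not contain exactly one underscore.
    """
    parts = last_tag.split("_")
    if len(parts) != 2:
        raise ValueError(
            f"locus tag {last_tag!r} contains {len(parts) - 1} underscore(s); "
            "expected exactly one"
        )
    tag, count = parts
    return tag, count


# A tag with exactly one underscore unpacks cleanly.
print(split_locus_tag("Cannabis_0000001"))

# The bare unpacking seen in the traceback fails once a second underscore
# appears, because split('_') then returns three pieces instead of two.
try:
    tag, count = "my_gene_0001".split("_")
except ValueError as err:
    print("unpacking failed:", err)
```

A check like this only explains the failure; it does not make a multi-underscore tag acceptable, so the gene models still need to be renamed.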

Jiangjiangzhang6 commented 2 years ago

When I run

funannotate annotate --rename "cannabis" --gff update.gff3 -o 11.gff3 --fasta genome.fasta -s cannabis

it finishes, but when I then run update again with

funannotate update -f genome.fasta -g 11.gff3/annotate_results/cannabis.gff3 --species cannabis-sativa -o ~/dama/annotation/hap2/evm/update --pacbio_isoseq ../../../index/RNA/hemp_iso_seq.fasta --max_intronlen 50000 --aligners minimap2

it shows this error:


[Aug 31 03:07 PM]: OS: CentOS Linux 7, 12 cores, ~ 66 GB RAM. Python: 3.8.13
[Aug 31 03:07 PM]: Running 1.8.13
[Aug 31 03:07 PM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Aug 31 03:08 PM]: Previous annotation consists of: 27,769 protein coding gene models and 0 non-coding gene models
[Aug 31 03:08 PM]: Existing annotation: locustag=cannabis_ genenumber=27769
[Aug 31 03:08 PM]: Aligning long reads to genome with minimap2
Traceback (most recent call last):
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/bin/funannotate", line 10, in <module>
    sys.exit(main())
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/funannotate.py", line 716, in main
    mod.main(arguments)
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/update.py", line 2264, in main
    trinity_transcripts, cleanTranscripts = mapTranscripts(
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/update.py", line 914, in mapTranscripts
    mapped = longReadMap(longTuple[0], genome, isoMap, cpus=cpus,
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/update.py", line 881, in longReadMap
    for line in lib.execute(cmd):
  File "/public/home/zhaoli/software/anaconda3/envs/funannotate/lib/python3.8/site-packages/funannotate/library.py", line 555, in execute
    raise subprocess.CalledProcessError(return_code, cmd)
subprocess.CalledProcessError: Command '['minimap2', '-x', 'splice', '-t', '2', '-G', '50000', '-uf', '-C5', '/public/home/zhaoli/dama/annotation/hap2/evm/update/update_misc/genome.fa', '/public/home/zhaoli/dama/annotation/hap2/evm/update/update_misc/iso-seq.fasta']' died with <Signals.SIGKILL: 9>.

hyphaltip commented 2 years ago

It's unclear whether your system is killing the minimap2 run (due to resources?). Is this running under a job manager on an HPC cluster or on a laptop? I'm inferring this from the "died with <Signals.SIGKILL: 9>" message.
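
For context on where that message comes from: on POSIX systems, Python's subprocess module reports a child that was terminated by a signal as a negative return code, and `CalledProcessError` renders it as "died with ...". The sketch below rebuilds such an exception by hand, without running minimap2. The helper `describe_returncode` is hypothetical, and the snippet assumes a POSIX system where signal 9 is SIGKILL.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Explain a subprocess return code (hypothetical helper, POSIX only).

    A negative value -N means the child was terminated by signal N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


# Construct the same exception type as in the traceback, with return code -9.
err = subprocess.CalledProcessError(-9, ["minimap2", "-x", "splice"])
print(err)
print(describe_returncode(err.returncode))
```

SIGKILL cannot be caught by the process itself, so minimap2 has no chance to print its own error. That is why the kernel's out-of-memory killer or a scheduler's memory limit are common suspects, although the log alone does not prove either.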

You can also try running the minimap2 command on its own to see what errors you get. Here it is, if I got the paths correct based on your error message. Maybe there is something misformatted in the iso-seq file?

minimap2 -x splice -t 2 -G 50000 -uf -C5 /public/home/zhaoli/dama/annotation/hap2/evm/update/update_misc/genome.fa /public/home/zhaoli/dama/annotation/hap2/evm/update/update_misc/iso-seq.fasta
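
To follow up on the idea that the iso-seq file might be misformatted, a rough sanity check could look like the sketch below. `check_fasta` is a hypothetical helper, not part of funannotate or minimap2, and the checks it makes (empty headers, duplicate IDs, records without sequence, characters outside the IUPAC nucleotide codes) are assumptions about what "misformatted" might mean here.

```python
import io


def check_fasta(handle):
    """Return (record_count, problems) for a FASTA text stream."""
    valid = set("ACGTUNRYKMSWBDHV")  # IUPAC nucleotide codes
    seen, problems, count = set(), [], 0
    current_id, current_len = None, 0
    for lineno, line in enumerate(handle, 1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None and current_len == 0:
                problems.append(f"record {current_id!r} has no sequence")
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            current_len = 0
            count += 1
            if not current_id:
                problems.append(f"line {lineno}: empty header")
            elif current_id in seen:
                problems.append(f"line {lineno}: duplicate ID {current_id!r}")
            seen.add(current_id)
        elif current_id is None:
            problems.append(f"line {lineno}: sequence before first header")
        else:
            bad = set(line.upper()) - valid
            if bad:
                problems.append(
                    f"line {lineno}: unexpected characters {sorted(bad)}"
                )
            current_len += len(line)
    if current_id is not None and current_len == 0:
        problems.append(f"record {current_id!r} has no sequence")
    return count, problems


# Small in-memory demo with made-up reads; for a real file, pass open(path).
demo = io.StringIO(">read1\nACGTNN\n>read1\nACXT\n>read2\n")
count, problems = check_fasta(demo)
print(count, "records")
for problem in problems:
    print(problem)
```

A clean report would not rule out a resource problem; it would only make a formatting problem less likely.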

nextgenusfs commented 2 years ago

Running annotate to fix the locus_tags in the predict results won't work because update uses the results from predict to update the gene models. After you run update, you would then run annotate on the final gene models to add functional annotation.

[Aug 31 03:08 PM]: Existing annotation: locustag=cannabis_ genenumber=27769

You likely need to re-run your predict command with a different value for --name. Don't use an underscore in this option; one will be appended for you. So you could pass --name Cannabis and that will generate gene model locus tags like Cannabis_0000001, Cannabis_0000002. When you re-run the funannotate predict command you can also pass --keep_evm, which will keep all your existing gene models and re-use all existing data, but simply rename the gene models.
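
Before re-running, it may help to list which gene IDs in a GFF3 file do not have the `Name_number` shape described above. The sketch below is an assumption-laden illustration: it assumes the locus tag is carried in the `ID=` attribute of `gene` features, the pattern is just "alphanumeric prefix, one underscore, digits", the helper `find_bad_gene_ids` is hypothetical, and the two demo records are invented. It does not reproduce how funannotate itself reads the file.

```python
import io
import re

# Assumed shape: alphanumeric prefix, exactly one underscore, then digits.
TAG_PATTERN = re.compile(r"^[A-Za-z0-9]+_\d+$")


def find_bad_gene_ids(handle):
    """Return gene IDs from a GFF3 stream that do not match TAG_PATTERN."""
    bad = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only look at well-formed gene features
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID", "")
        if not TAG_PATTERN.match(gene_id):
            bad.append(gene_id)
    return bad


# Invented two-gene example; for a real file, pass open(path) instead.
demo = io.StringIO(
    "##gff-version 3\n"
    "contig_1\tEVM\tgene\t1\t900\t.\t+\t.\tID=evm.TU.contig_1.1;Name=x\n"
    "contig_1\tfunannotate\tgene\t1000\t1900\t.\t+\t.\tID=Cannabis_0000001\n"
)
print(find_bad_gene_ids(demo))
```

An empty list would suggest the IDs already have the one-underscore shape; any IDs listed are candidates for the renaming that re-running predict with --name and --keep_evm is meant to perform.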