nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Not enough gene models #748

Open Niohuruzh opened 2 years ago

Niohuruzh commented 2 years ago

Hi

Are you using the latest release? Funannotate v1.8.7

Describe the bug

[Jul 14 09:23 PM]: OS: Linux Mint 20.2, 48 cores, ~ 132 GB RAM. Python: 3.7.12
[Jul 14 09:23 PM]: Running funannotate v1.8.7
[Jul 14 09:23 PM]: Found training files, will re-use these files:
  --pasa_gff fun/training/funannotate_train.pasa.gff3
  --stringtie fun/training/funannotate_train.stringtie.gtf
[Jul 14 09:23 PM]: Skipping CodingQuarry as no --rna_bam passed
[Jul 14 09:23 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program      Training-Method
  augustus     pasa           
  genemark     selftraining   
  glimmerhmm   pasa           
  snap         pasa           
[Jul 14 09:23 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Jul 14 09:23 PM]: Genome loaded: 122 scaffolds; 17,430,804 bp; 2.95% repeats masked
[Jul 14 09:24 PM]: Mapping 554,696 proteins to genome using diamond and exonerate
[Jul 14 09:37 PM]: Found 241,445 preliminary alignments --> aligning with exonerate
[Jul 14 09:46 PM]: Exonerate finished: found 1,378 alignments
[Jul 14 09:46 PM]: Filtering PASA data for suitable training set
[Jul 14 09:47 PM]: 198 of 6,193 models pass training parameters
[Jul 14 09:47 PM]: Not enough gene models 198 to train Augustus (200 required), exiting

What command did you issue? funannotate predict -i W3767_clean_sort_mask.fasta --strain W3767 --species W3767 --isolate W3767 -o fun --genemark_gtf genemark.gtf --busco_seed_species cryptococcus_neoformans_gattii --busco_db dikarya --cpus 40 --repeats2evm --no-evm-partitions

Logfiles: Please provide relevant log files of the error.

OS/Install Information

funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.7
-------------------------------------------------------
You are running Python v 3.7.12. Now checking python packages...
biopython: 1.77
goatools: 1.2.3
matplotlib: 3.5.2
natsort: 8.1.0
numpy: 1.21.6
pandas: 1.3.5
psutil: 5.9.1
requests: 2.28.1
scikit-learn: 1.0.2
scipy: 1.7.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.70
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.858
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.27
Getopt::Long: 2.52
Hash::Merge: 0.302
JSON: 4.07
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
threads: 2.25
threads::shared: 1.61
   ERROR: Bio::Perl not installed, install with cpanm Bio::Perl 

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/leigod/funannotate_db
$PASAHOME=/home/leigod/anaconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITYHOME=/home/leigod/anaconda3/envs/funannotate/opt/trinity-2.8.5/
$EVM_HOME=/home/leigod/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
$GENEMARK_PATH=/home/leigod/software/gmes_linux_64_4
All 6 environmental variables are set
-------------------------------------------------------
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
emapper.py: 2.1.8
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
gmes_petap.pl: 4.69_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.15-internal
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
proteinortho: 6.1.0
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.15
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 39
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
    ERROR: signalp not installed

Is there any troubleshooting advice for this? Looking forward to your reply.

hyphaltip commented 2 years ago

Did you try lowering the number of required training models by passing --min_training_models 190 to predict? (The default is 200.)
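A minimal sketch of that suggestion, assuming you want to script the re-run. The hypothetical helper `rerun_command` (not part of funannotate) reads the "Not enough gene models" line from the log, rounds the passing count down to the nearest 10 as a safety margin (198 becomes 190, the value suggested here), and appends `--min_training_models` to the original `funannotate predict` command. The rounding rule is an assumption; the passing count can shift slightly between runs (199 in one log in this thread, 198 in another).

```python
import re
import shlex
from typing import List

# Command from this issue, reused verbatim.
ORIGINAL_CMD = (
    "funannotate predict -i W3767_clean_sort_mask.fasta --strain W3767 "
    "--species W3767 --isolate W3767 -o fun --genemark_gtf genemark.gtf "
    "--busco_seed_species cryptococcus_neoformans_gattii --busco_db dikarya "
    "--cpus 40 --repeats2evm --no-evm-partitions"
)

# Matches the exit message funannotate printed in the log above.
NOT_ENOUGH = re.compile(
    r"Not enough gene models (\d+) to train Augustus \((\d+) required\)"
)


def rerun_command(log_text: str, cmd: str = ORIGINAL_CMD) -> List[str]:
    """Hypothetical helper: return argv for a re-run of `funannotate predict`.

    If the log contains the 'Not enough gene models' message, append
    --min_training_models with the passing count rounded down to the
    nearest 10 (an assumed safety margin, never below 1). Otherwise
    return the original argv unchanged.
    """
    argv = shlex.split(cmd)
    match = NOT_ENOUGH.search(log_text)
    if match is None:
        return argv
    passed = int(match.group(1))
    threshold = max(1, (passed // 10) * 10)
    return argv + ["--min_training_models", str(threshold)]


if __name__ == "__main__":
    log = ("[Jul 14 09:47 PM]: Not enough gene models 198 to train "
           "Augustus (200 required), exiting")
    print(" ".join(shlex.quote(arg) for arg in rerun_command(log)))
```

The returned list can be passed straight to `subprocess.run`; keeping it as a list avoids re-quoting the arguments by hand.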

hyphaltip commented 2 years ago

This is a little disturbing, though:

[Jul 14 09:37 PM]: Found 241,445 preliminary alignments --> aligning with exonerate                      
[Jul 14 09:46 PM]: Exonerate finished: found 1,378 alignments                                             
[Jul 14 09:46 PM]: Filtering PASA data for suitable training set
[Jul 14 09:47 PM]: 198 of 6,193 models pass training parameters

Are you sure your species is close enough to Cryptococcus for this to work well? What is your overall BUSCO score for this genome? If it is low, that might not bode well for the general gene prediction you are going to be doing. Or does this reflect RNA-seq data that isn't matching well when finding ORFs?
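To put numbers on the concern above, here is a minimal sketch that parses the quoted log lines and computes the two pass rates: exonerate alignments kept versus diamond preliminary alignments, and PASA models passing the training filters. The helper names are hypothetical and not part of funannotate, and the sketch deliberately sets no "good" or "bad" threshold, since none is given in this thread.

```python
import re
from typing import Dict, Match

# Log lines copied from the funannotate output quoted in this thread.
LOG = """\
[Jul 14 09:37 PM]: Found 241,445 preliminary alignments --> aligning with exonerate
[Jul 14 09:46 PM]: Exonerate finished: found 1,378 alignments
[Jul 14 09:47 PM]: 198 of 6,193 models pass training parameters
"""


def _find(pattern: str, log: str) -> Match:
    """Search the log and fail loudly if an expected line is missing."""
    match = re.search(pattern, log)
    if match is None:
        raise ValueError(f"pattern not found in log: {pattern}")
    return match


def _num(text: str) -> int:
    """Convert a comma-grouped integer such as '6,193' to 6193."""
    return int(text.replace(",", ""))


def evidence_rates(log: str) -> Dict[str, float]:
    """Return the fraction of alignments and PASA models that survive filtering."""
    prelim = _num(_find(r"Found ([\d,]+) preliminary alignments", log).group(1))
    final = _num(_find(r"Exonerate finished: found ([\d,]+) alignments", log).group(1))
    m = _find(r"([\d,]+) of ([\d,]+) models pass training parameters", log)
    passed, total = _num(m.group(1)), _num(m.group(2))
    return {
        "exonerate_kept": final / prelim,
        "pasa_models_passing": passed / total,
    }


if __name__ == "__main__":
    for name, rate in evidence_rates(LOG).items():
        print(f"{name}: {rate:.2%}")
```

For this log it reports roughly 0.57% of preliminary alignments surviving exonerate and about 3.20% of PASA models passing, which is why the numbers look worth investigating.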

hyphaltip commented 1 year ago

You can lower the minimum number of training models with the --min_training_models parameter.

On Thu, Jul 14, 2022 at 6:48 AM Niohuruzh @.***> wrote:


Describe the bug
[07/14/22 13:04:50]: 4 models fail blast identity threshold
[07/14/22 13:04:51]: 5,992 models will be ignored for training Augustus
[07/14/22 13:04:52]: 199 of 6,193 models pass training parameters
[07/14/22 13:04:52]: Not enough gene models 199 to train Augustus (200 required), exiting




Jason Stajich - @.***