Closed xizhesun closed 1 year ago
Is this a trinity comprehensive GFF3 file? Are you sure the contig/reference GFF3 coordinates are in relation to your actual genome? If the contigs in GFF3 are not found in your genome, those will be ignored.
Yes, it's a Trinity comprehensive GFF3 file. I have checked, and it is the same genome. With the default settings, my funannotate result was missing some important gene models, so I am trying to use "other_gff" to recover them.
Thanks for the reply. I will try passing mydb.sqlite.pasa_assemblies.gff3 to "other_gff" and run it again.
The PASA assemblies GFF3 file also does not map to the genome reference; you need to use the one that has .genome. in the name. You can just look at the first few entries and check whether column 1 of the file contains contig names from your genome assembly.
Yes, column 1 is consistent with the chromosome IDs.
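The column 1 check described above can be scripted rather than done by eye. The following is a minimal sketch in plain Python, not part of funannotate; the function names `fasta_ids`, `gff3_seqids`, and `missing_seqids` are hypothetical helpers introduced only for illustration.

```python
def fasta_ids(fasta_text):
    """Return the set of sequence IDs (first token after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def gff3_seqids(gff_text):
    """Return the set of values found in column 1 of GFF3 text."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        ids.add(line.split("\t")[0])
    return ids


def missing_seqids(gff_text, fasta_text):
    """Sorted list of GFF3 column 1 names that are absent from the genome FASTA."""
    return sorted(gff3_seqids(gff_text) - fasta_ids(fasta_text))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_seqids.py genome.fasta models.gff3
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gff:
        for name in missing_seqids(gff.read(), fa.read()):
            print(name)
```

An empty result means every contig name in the GFF3 also appears in the genome FASTA. Any name that is printed belongs to features that would be ignored.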
Okay. Well it fails because those aren't gene models. They are just alignments.
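One way to tell alignments from gene models is to tally the feature types in column 3 of the GFF3. The sketch below assumes that a gene-model file contains at least `gene`, `mRNA`, and `CDS` features, while an alignment file typically carries only match-style features such as `cDNA_match`. That rule of thumb and the function names are assumptions made for illustration, not funannotate's own validation logic.

```python
from collections import Counter


def feature_type_counts(gff_text):
    """Count how often each feature type (GFF3 column 3) occurs."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


def looks_like_gene_models(gff_text):
    """Heuristic (assumption): gene models have gene, mRNA, and CDS features."""
    counts = feature_type_counts(gff_text)
    return all(counts[t] > 0 for t in ("gene", "mRNA", "CDS"))
```

Calling `looks_like_gene_models(open("compreh_init_build.gff3").read())` on the file passed to `--other_gff` gives a quick indication of whether it carries gene structures at all.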
Thanks for the detailed explanation. So, I need to use the GFF file from "pasa_asmbls_to_training_set.dbi". But after the training-set step, "mydb.sqlite.assemblies.fasta.transdecoder.cds" has lost some important genes, even though these genes are all present in the Trinity genome-guided assembly. Is there a way to convert the Trinity genome-guided assembly into gene models without training? I'm sure these important genes are all in the Trinity assembly, but no ab initio tool annotates all of them. They are pathogenicity-related genes, and some of them may be horizontally transferred genes.
I checked it again. "Trinity-GG.fasta" contains all the important genes. After the PASA step, "mydb.sqlite.assemblies.fasta" also contains all the important genes. But some of the important PASA gene models are filtered out by TransDecoder! I really don't know how to deal with this. Is there any other recommended method that relies more directly on the original transcript evidence, without the filter?
Finally, I solved the problem by using TransDecoder without the filter. Thank you! I will close this issue.
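To see exactly which PASA assemblies lose their coding sequence at the TransDecoder step, the two FASTA files can be compared by ID. This is a minimal sketch that assumes TransDecoder names each CDS record as the transcript ID followed by a `.pN` suffix (for example `asmbl_12.p1`). If your headers follow a different pattern, the regular expression needs adjusting. The function names are hypothetical.

```python
import re


def fasta_ids(fasta_text):
    """Return the set of sequence IDs (first token after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def transcripts_without_cds(transcripts_fasta, cds_fasta):
    """Sorted transcript IDs that have no CDS record in the TransDecoder output.

    Assumption: CDS IDs are '<transcript_id>.p<number>'.
    """
    cds_parents = {re.sub(r"\.p\d+$", "", cid) for cid in fasta_ids(cds_fasta)}
    return sorted(fasta_ids(transcripts_fasta) - cds_parents)
```

Running it on the contents of "mydb.sqlite.assemblies.fasta" and "mydb.sqlite.assemblies.fasta.transdecoder.cds" lists the assemblies that were dropped, which can then be checked against the genes of interest.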
Are you using the latest release? funannotate v1.8.13
Describe the bug I have set "--other_gff compreh_init_build.gff3:10". funannotate recognised it and set its source to other_pred1, but other_pred1 did not show up in the gene models that were passed to EVM.
What command did you issue? funannotate predict --name FOL007 -i Fol007.final.repeatmasker.fasta -o Fol007_anotation_Trinity-GG_compreh --pasa_gff mydb.sqlite.assemblies.fasta.transdecoder.genome.gff3 --transcript_evidence Trinity-GG.fasta --rna_bam fo.sort.merged.bam --cpus 128 -s "fusarium oxysporum" --strain Fol007 --other_gff compreh_init_build.gff3:10 --busco_db sordariomycetes
Logfiles Please provide relevant log files of the error.
output of
funannotate check --show-versions
Checking dependencies for 1.8.13
You are running Python v 3.8.13. Now checking python packages... biopython: 1.79 goatools: 1.2.3 matplotlib: 3.4.3 natsort: 8.2.0 numpy: 1.23.2 pandas: 1.4.4 psutil: 5.9.2 requests: 2.28.1 scikit-learn: 1.1.2 scipy: 1.9.1 seaborn: 0.12.0 All 11 python packages installed
You are running Perl v b'5.032001'. Now checking perl modules... Carp: 1.38 Clone: 0.42 DBD::SQLite: 1.70 DBI: 1.643 DB_File: 1.855 Data::Dumper: 2.183 File::Basename: 2.85 File::Which: 1.24 Getopt::Long: 2.52 Hash::Merge: 0.302 JSON: 4.09 LWP::UserAgent: 6.67 Logger::Simple: 2.0 POSIX: 1.94 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.14 Tie::File: 1.06 URI::Escape: 5.12 YAML: 1.30 local::lib: 2.000029 threads: 2.25 threads::shared: 1.61 ERROR: DBD::mysql not installed, install with cpanm DBD::mysql
Checking Environmental Variables... $FUNANNOTATE_DB=/home/data2/mals/funannotate $PASAHOME=/home/data2/mals/anaconda3/envs/funannotate/opt/pasa-2.5.2 $TRINITY_HOME=/home/data2/mals/anaconda3/envs/funannotate/opt/trinity-2.8.5 $EVM_HOME=/home/data2/mals/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1 $AUGUSTUS_CONFIG_PATH=/home/data2/mals/anaconda3/envs/funannotate/config/ $GENEMARK_PATH=/home/data2/mals/tools/GeneMark-ET/gmes_linux_64 All 6 environmental variables are set
Checking external dependencies... PASA: 2.5.2 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.4.0 bamtools: bamtools 2.5.1 bedtools: bedtools v2.30.0 blat: BLAT v35 diamond: 2.0.15 emapper.py: 2.1.9 ete3: 3.1.2 exonerate: exonerate 2.4.0 fasta: no way to determine glimmerhmm: 3.0.4 gmap: 2021-08-25 gmes_petap.pl: 4.69_lic hisat2: 2.2.1 hmmscan: HMMER 3.3.2 (Nov 2020) hmmsearch: HMMER 3.3.2 (Nov 2020) java: 11.0.13 kallisto: 0.46.1 mafft: v7.505 (2022/Apr/10) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.24-r1122 pigz: pigz 2.6 proteinortho: 6.1.0 pslCDnaFilter: no way to determine salmon: salmon 0.14.1 samtools: samtools 1.15.1 signalp: 5.0b snap: 2006-07-28 stringtie: 2.1.7 tRNAscan-SE: 2.0.9 (July 2021) tantan: tantan 39 tbl2asn: no way to determine, likely 25.X tblastn: tblastn 2.2.31+ trimal: trimAl v1.4.rev15 build[2013-12-17] trimmomatic: 0.39 All 37 external dependencies are installed