Closed songtaogui closed 4 years ago
Haven't seen this one before. Can you try to figure out why minimap2 died? i.e. this is what it is running:
minimap2 -ax splice -t 90 --cs -u b -G 60000 \
PANZ_funannotate_predict/predict_misc/genome.softmasked.fa \
PANZ_funannotate_predict/predict_misc/transcripts.combined.fa \
| samtools sort -@4 -o transcript_alignments.bam -
So is it dying because of the 90 threads? Or are you running out of memory with the index because of the large assembly? Seems like perhaps the latter, since you manually ran --split-prefix. What did you run for the indexing step then?
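One way to narrow this down is to run the minimap2 step on its own (without the pipe into samtools, so the exit status belongs to minimap2) and look at how it exited. The sketch below is only an illustration: `describe_exit` is a hypothetical helper, not part of funannotate or minimap2. It relies on the usual shell convention that a status above 128 means the process was terminated by signal `status - 128`, so 137 usually points to SIGKILL, which is what the kernel OOM killer or a cluster memory limit sends.

```shell
# Hypothetical helper: turn a shell exit status into a short hint.
describe_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        # By shell convention, status = 128 + signal number.
        sig=$((status - 128))
        if [ "$sig" -eq 9 ]; then
            echo "killed by signal 9 (SIGKILL, often the OOM killer or a scheduler memory limit)"
        else
            echo "killed by signal $sig"
        fi
    else
        echo "exited with error $status"
    fi
}

# Example use (not run here; file names are placeholders):
#   minimap2 -ax splice -t 8 --cs -u b -G 60000 genome.fa transcripts.fa \
#       > aln.sam 2> minimap2.log
#   describe_exit $?
```

If GNU time is available, prefixing the minimap2 command with `/usr/bin/time -v` also reports the maximum resident set size, which shows how close the run came to the memory on the node.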
You can pass GFF3 transcript alignments to the --transcript_alignments option. To convert a minimap2 BAM file to GFF3 (note that you must run minimap2 with the --cs flag), you can use the funannotate util bam2gff3 script.
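As a rough sketch of that workflow, the helper below only prints the two commands rather than running them, so they can be checked first. `show_alignment_commands` and all file names are hypothetical, and the `-i`/`-o` arguments to `bam2gff3` are written from memory, so confirm them with `funannotate util bam2gff3 --help` before use.

```shell
# Hypothetical dry-run helper: prints the commands, does not execute them.
# $1 = BAM from minimap2 (must have been produced with --cs)
# $2 = GFF3 file to write
show_alignment_commands() {
    bam=$1
    gff3=$2
    # Assumed arguments for bam2gff3; verify against the --help output.
    echo "funannotate util bam2gff3 -i $bam -o $gff3"
    # genome.fa, outdir and the species name are placeholders.
    echo "funannotate predict -i genome.fa -o outdir -s \"Zea mays\" --transcript_alignments $gff3"
}

# Example use:
#   show_alignment_commands transcripts.minimap2.bam transcripts.minimap2.gff3
```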
Funannotate will re-use any existing data if you give it the same output command. I don't think it will re-use the repeat identification step, but I can look into that. It is multi-threaded.
You might want to consider cleaning your assembly prior to running annotation, ~3 million contigs is kind of crazy. I understand maize is a large genome, but thought it was more like 2.5GB, here you have 5.3GB. You likely won't get any gene annotation in contigs less than 10 kb in size. One of the reasons the repeat detection is taking so long is the 3 million contigs....
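For the length cutoff, something along these lines would work as a first pass. It is a minimal awk sketch, not a funannotate feature: `filter_fasta_by_length` is a made-up name, it reads FASTA on stdin, and it writes each kept sequence unwrapped on a single line.

```shell
# Hypothetical helper: keep only FASTA records of at least $1 bp.
# Reads FASTA from stdin and writes the kept records to stdout.
filter_fasta_by_length() {
    awk -v min="$1" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }  # new record starts
        { seq = seq $0 }                              # join wrapped sequence lines
        END { emit() }                                # do not forget the last record
    '
}

# Example use (file names are placeholders):
#   filter_fasta_by_length 10000 < assembly.fa > assembly.min10kb.fa
```

Dropping the short contigs first also reduces the number of sequences that the repeat detection step has to process.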
@nextgenusfs
Thank you for your help.
I am now trying to clean my inputs and rerun it with more memory.
I noticed that there are two GFF files for transcript alignments in the predict_misc dir: transcript_minimap2.gff3 and transcript_alignments.gff3. Is there any manipulation before converting transcripts.minimap2.bam to transcript_alignments.gff3?
You might want to consider cleaning your assembly prior to running annotation, ~3 million contigs is kind of crazy. I understand maize is a large genome, but thought it was more like 2.5GB, here you have 5.3GB.
What I am trying to annotate are a bunch of non-reference sequences, that's why they are so fragmentary. And yes, I was planning to filter out short sequences and repeat-rich sequences prior to annotation.
You likely won't get any gene annotation in contigs less than 10 kb in size.
Do you have any suggestions on appropriate options for annotating short contigs? A large portion of my input sequences are shorter than 10 kb.
Thank you again for your kind help.
Best wishes,
Songtao Gui
Depending on the settings you run it with, there may or may not be a difference between those two GFF3 files: one is the minimap2 alignments, but you can also have it run gmap/blat alignments as well. The transcript_alignments.gff3 file is the combined results that are eventually passed to EvidenceModeler.
What is the average maize gene length? Basically, the gene predictors need some context to predict genes. If you are using pre-trained Augustus parameters, you could literally just run Augustus on these contigs, but the results aren't likely to be very informative. The goal of funannotate is to generate NCBI submission-ready annotated genomes, i.e. you should be feeding it a cleaned-up, ready-to-publish genome assembly as input. It isn't designed to annotate short contigs/fragments, such as a metagenome. What is your goal in trying to annotate short repetitive contigs?
Are you using the latest release? Yes, I am using version: 1.5.3-21ad095
Describe the bug: minimap2 failed while aligning transcript.fa to the genome; logs are below.
What command did you issue? funannotate predict
Logfiles
OS/Install Information
funannotate check --show-versions
You are running Perl v 5.028001. Now checking perl modules...
Bio::Perl: 1.007002
Carp: 1.50
Clone: 0.41
DBD::SQLite: 1.62
DBI: 1.642
DB_File: 1.84
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.84
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.08
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.22
threads::shared: 1.59
ERROR: DBD::mysql not installed, install with cpanm DBD::mysql
Checking external dependencies...
Traceback (most recent call last):
  File "/public/home/stgui/.linuxbrew/bin/ete3", line 6, in <module>
    from ete3.tools.ete import main
  File "/public/home/stgui/.linuxbrew/opt/python/lib/python3.7/site-packages/ete3/tools/ete.py", line 55, in <module>
    from . import (ete_split, ete_expand, ete_annotate, ete_ncbiquery, ete_view,
  File "/public/home/stgui/.linuxbrew/opt/python/lib/python3.7/site-packages/ete3/tools/ete_view.py", line 48, in <module>
    from .. import (Tree, PhyloTree, TextFace, RectFace, faces, TreeStyle, CircleFace, AttrFace,
ImportError: cannot import name 'TextFace' from 'ete3' (/public/home/stgui/.linuxbrew/opt/python/lib/python3.7/site-packages/ete3/__init__.py)
CodingQuarry: 2.0
RepeatMasker: RepeatMasker 4.0.9
RepeatModeler: RepeatModeler 1.0.8
Trinity: 2.8.3
augustus: 3.3.2
bamtools: bamtools 2.5.1
bedtools: bedtools v2.27.1
blat: BLAT v36
diamond: diamond 0.8.22
emapper.py: emapper-1.0.3
exonerate: exonerate 2.2.0
fasta: no way to determine
gmap: 2015-09-29
gmes_petap.pl: 4.38
hisat2: 2.1.0
hmmscan: HMMER 3.1b2 (February 2015)
hmmsearch: HMMER 3.1b2 (February 2015)
java: 1.8.0_181-ojdkbuild
kallisto: 0.44.0
mafft: v7.407 (2018/Jul/23)
makeblastdb: makeblastdb 2.9.0+
minimap2: 2.17-r941
nucmer: 3.1
pslCDnaFilter: no way to determine
rmblastn: rmblastn 2.9.0+
samtools: samtools 1.9
stringtie: 1.3.4d
tRNAscan-SE: 2.0 (December 2017)
tbl2asn: unknown, likely 25.3
tblastn: tblastn 2.9.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
ERROR: ete3 not installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/public/home/stgui/work/funannotateDB
$PASAHOME=/public/home/stgui/.linuxbrew/Cellar/PASApipeline-v2.3.3
$TRINITYHOME=/public/home/stgui/.linuxbrew/Cellar/trinity/2.8.3
$EVM_HOME=/public/home/stgui/.linuxbrew/Cellar/evidencemodeler/0.1.3
$AUGUSTUS_CONFIG_PATH=/public/home/stgui/.linuxbrew/Cellar/augustus/3.3.2/config
$GENEMARK_PATH=/public/home/stgui/.linuxbrew/Cellar/gm_et_linux_64/gmes_petap
$BAMTOOLS_PATH=/public/home/stgui/.linuxbrew/Cellar/bamtools/2.5.1/bin
All 7 environmental variables are set
/public/home/stgui/.linuxbrew/Cellar/funannotate/util/sam2bam.sh "minimap2 -ax splice -t 36 --split-prefix ./tmp_split_prefixt -c -u b -G 60000 /public/home/stgui/work/PANZ_funannotate/PANZ_funannotate_predict/predict_misc/genome.softmasked.fa /public/home/stgui/work/PANZ_funannotate/PANZ_funannotate_predict/predict_misc/transcripts.combined.fa" 36 ./transcripts.minimap2.bam 1> logs.txt 2>&1