Closed zhangrengang closed 1 year ago
metaeuk runs normally with other genomes, but crashes with a large pine genome (Pinus tabuliformis, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA784915). Does it not support very long chromosomes?
$ head busco_3011229/genome.fasta.fai
chr1 2364278061 6 80 81
chr10 1752849333 2393831550 80 81
chr11 1650012615 4168591507 80 81
chr12 1392452741 5839229287 80 81
chr2 2317450362 7249087694 80 81
chr3 2291775479 9595506192 80 81
chr4 2192534405 11915928871 80 81
chr5 2148190925 14135869963 80 81
chr6 2107674557 16310913281 80 81
chr7 2082167746 18444933776 80 81
$ metaeuk easy-predict busco_3011229/genome.fasta pep.faa tmp tmpDir --max-intron 500000 --threads 16
Create directory tmpDir
easy-predict busco_3011229/genome.fasta pep.faa tmp tmpDir --max-intron 500000 --threads 16

MMseqs Version: f9c166910e2ae85e1e77eaf3e22291505402c1a7
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 100
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Threads 16
Compressed 0
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 4
k-mer length 0
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 15
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Min codons in orf 15
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
maximal combined evalue of an optimal set 0.001
minimal length ratio between combined optimal set and target 0.5
Maximal intron length 500000
Minimal intron length 15
Minimal exon length aa 11
Maximal overlap of exons 10
Gap open penalty -1
Gap extend penalty -1
allow same-strand overlaps 0
translate codons to AAs 0
write target key instead of accession 0
Reverse AA Fragments 0

createdb busco_3011229/genome.fasta tmpDir/15420076123933152342/contigs --dbtype 2 --compressed 0 -v 3

Converting sequences
Time for merging to contigs_h: 0h 0m 0s 32ms
Time for merging to contigs: 0h 0m 0s 0ms
Database type: Nucleotide
The input files have no entry:
 - busco_3011229/genome.fasta
Please check your input files. Only files in fasta/fastq[.gz|bz2] are supported
Error: contigs createdb died
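To make the "very long chromosomes" question concrete, here is a minimal Python sketch that lists which records in the .fai index are longer than a signed 32-bit integer can hold (2147483647). That threshold is only an assumption for illustration: nothing in the log confirms that this is the limit createdb runs into. The function name `long_sequences` is hypothetical, and the sample records are copied from the `head` output above.

```python
# Assumed threshold: largest signed 32-bit integer. This is a guess at a
# possible limit, not a confirmed MetaEuk/MMseqs2 constraint.
INT32_MAX = 2**31 - 1

# First ten records of busco_3011229/genome.fasta.fai, as shown in this report.
# Columns: name, length, byte offset, bases per line, bytes per line.
SAMPLE_FAI = """\
chr1 2364278061 6 80 81
chr10 1752849333 2393831550 80 81
chr11 1650012615 4168591507 80 81
chr12 1392452741 5839229287 80 81
chr2 2317450362 7249087694 80 81
chr3 2291775479 9595506192 80 81
chr4 2192534405 11915928871 80 81
chr5 2148190925 14135869963 80 81
chr6 2107674557 16310913281 80 81
chr7 2082167746 18444933776 80 81
"""


def long_sequences(fai_lines, limit=INT32_MAX):
    """Return (name, length) pairs for .fai records whose length exceeds limit."""
    hits = []
    for line in fai_lines:
        fields = line.split()  # .fai is whitespace-delimited; names have no spaces
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if length > limit:
            hits.append((name, length))
    return hits


flagged = long_sequences(SAMPLE_FAI.splitlines())
for name, length in flagged:
    print(f"{name}\t{length}\texceeds {INT32_MAX}")
```

On the ten records shown, this flags chr1 through chr5 and leaves chr6, chr7, chr10, chr11 and chr12 below the assumed threshold. To check the full index, the same function can be given an open file handle on genome.fasta.fai instead of the sample string.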
Please see my comment on https://github.com/soedinglab/metaeuk/issues/77