Closed vtrinca closed 4 years ago
@vtrinca thank you for reporting this issue. Could you please attach the whole log? Is it possible to share the input with me?
Thanks for replying. Log:
Tmp tmp folder does not exist or is not a directory.
Create dir tmp
dna Trinity.fasta nt.fna.taxidmapping trinity.results tmp --threads 8
MMseqs Version: 3eabfaff83bb77eac5ef342e8905cc4f7d378cb7
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace true
Alignment mode 3
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.9
Min. alignment length 100
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 1000
Compositional bias 0
Realign hits false
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Gap open cost 5
Gap extension cost 2
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 5.7
K-mer size 15
K-score 2147483647
Alphabet size 21
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring false
Exact k-mer matching 1
Mask residues 0
Mask lower case residues 0
Minimum diagonal score 25
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 2
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile e-value threshold 0.001
Use global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Omit consensus false
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1
Reverse frames 1
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Chain overlapping alignments 0
Merge query 1
Search type 0
Number search iterations 1
Start sensitivity 4
Search steps 1
Run a seq-profile search in slice mode false
Strand selection 2
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Database type 0
Shuffle input database true
Createdb mode 0
NCBI tax dump directory
Taxonomical mapping file
Blacklisted taxa 10239,12908,28384,81077,11632,340016,61964,48479,48510
Compare across kingdoms (2||2157),4751,33208,33090,(2759&&!4751&
createdb Trinity.fasta tmp/2908263996980697262/sequencedb
Converting sequences
[=============
Time for merging to sequencedb_h: 0h 0m 0s 92ms
Time for merging to sequencedb: 0h 0m 0s 174ms
Database type: Nucleotide
Time for merging to sequencedb.lookup: 0h 0m 0s 0ms
Time for processing: 0h 0m 1s 450ms
Tmp tmp/2908263996980697262/createtaxdb folder does not exist or is not a directory.
Download taxdump.tar.gz
2020-05-14 09:29:01 URL:https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz [52082793/52082793] -> "-" [1]
Database created
Remove temporary files
tmp/2908263996980697262/createtaxdb/createindex.sh: line 58: [: : integer expression expected
splitsequence tmp/2908263996980697262/sequencedb tmp/2908263996980697262/db_rev_split --max-seq-len 1000 --sequence-overlap 0 --sequence-split-mode 1 --create-lookup 0 --threads 8 --compressed 1 -v 3
Sequence split mode (--sequence-split-mode 0) and compressed (--compressed 1) can not be combined.
Turn compressed to 0
[=================================================================] 131.69K 0s 34ms
Time for merging to db_rev_split_h: 0h 0m 0s 56ms
Time for merging to db_rev_split: 0h 0m 0s 54ms
Time for processing: 0h 0m 0s 251ms
kmermatcher tmp/2908263996980697262/db_rev_split tmp/2908263996980697262/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size 21 --min-seq-id 0.9 --kmer-per-seq 100 --spaced-kmer-mode 1 --kmer-per-seq-scale 0 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 0 -k 24 -c 0 --max-seq-len 1000 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 8 --compressed 0 -v 3
Database size: 189836 type: Nucleotide
Generate k-mers list for 1 split
[=================================================================] 189.84K 1s 389ms
Adjusted k-mer length 24
Sort kmer 0h 0m 1s 148ms
Sort by rep. sequence 0h 0m 0s 691ms
Time for fill: 0h 0m 0s 140ms
Time for merging to pref: 0h 0m 0s 51ms
Time for processing: 0h 0m 3s 917ms
tmp/2908263996980697262/pref exists and will be overwritten.
tmp/2908263996980697262/conterminatordna.sh: line 59: 32478 Segmentation fault (core dumped) $RUNNER "$MMSEQS" crosstaxonfilterorf "$TMP_PATH/sequencedb" "$TMP_PATH/db_rev_split_h" "$TMP_PATH/pref" "$TMP_PATH/pref_cross" ${CROSSTAXONFILTERORF_PAR}
Error: crosstaxonfilterorf step died
@vtrinca Thank you! Could you please check if a subset of the input causes this error? If yes can you please attach the fasta and mapping?
Hi Martin, same error! Attached: Trinity.txt output.txt
For the mapping file, I used the same command as in the README. The file is too big to attach here, so I'm sending the first 200 lines: nt.txt
blastdbcmd -db nt -entry all -outfmt "%a %T" > nt.fna.taxidmapping
Ah, I think the issue is that you need to taxonomically label your Trinity identifiers. Using the mapping from the nt database will not work; it was just an example to demonstrate how to compare the nt database against itself.
The following command extracts the sequence and taxonomic identifier for each nt entry.
blastdbcmd -db nt -entry all -outfmt "%a %T" > nt.fna.taxidmapping
So you need to create your own mapping file that assigns each entry to an NCBI taxonomic identifier. Each line should contain the FASTA header and the taxonomic identifier. Example:
TRINITY_DN20629_c0_g1_i1 9606
TRINITY_DN20629_c0_g2_i1 562
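A quick way to catch mapping problems before running Conterminator is to check that every FASTA header has a taxid entry. This is a hypothetical sanity check, not part of Conterminator; the file names assembly.fa and assembly.taxidmapping are placeholders:

```shell
# Hypothetical sanity check (not part of Conterminator): verify that every
# FASTA header in assembly.fa has a taxid entry in assembly.taxidmapping.
# Toy input files for illustration:
cat > assembly.fa <<'EOF'
>TRINITY_DN20629_c0_g1_i1
ACGT
>TRINITY_DN20629_c0_g2_i1
GGCC
EOF
cat > assembly.taxidmapping <<'EOF'
TRINITY_DN20629_c0_g1_i1 9606
EOF
# Extract headers (without '>') and mapped identifiers, sorted for comm.
grep '^>' assembly.fa | tr -d '>' | cut -d' ' -f1 | sort > headers.txt
cut -d' ' -f1 assembly.taxidmapping | sort > mapped.txt
# Print headers that are missing from the mapping file.
comm -23 headers.txt mapped.txt
```

Any header printed by the last command has no taxid assigned and would need to be added to the mapping file.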
Now that I have made my own mapping file, Conterminator is working. Thank you for the attention.
Hello, I have the same error because I did the same thing as vtrinca. I am so happy to hear there is an answer to the question! However, I am unsure how to create my own mapping file. Can you please shed some light on how this is achieved?
Thanks, Aaron :)
@aaronphillips7493 could you please explain your use case?
I am trying to detect contamination (bacteria, arthropods, fungi) in a plant genome assembly that I have recently finished. Do you need more info?
Hi @aaronphillips7493,
grep ">" mygenome.fa | tr -d '>' | awk '{print $1,"taxid"}' > mygenome.fa.taxidmapping
Just replace "taxid" in the previous command with the actual NCBI taxonomic identifier for your organism. Then concatenate your genome with the nt database and merge the two mapping files:
cat mygenome.fa nt.fna > mydb.fa
cat mygenome.fa.taxidmapping nt.fna.taxidmapping > mydb.fa.taxidmapping
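Putting the steps above together, here is a minimal end-to-end sketch with toy files; 3702 (Arabidopsis thaliana) stands in for your organism's actual NCBI taxid, and the tiny nt.fna files are fabricated stand-ins for the real database:

```shell
# Toy stand-ins for the real inputs (illustration only).
printf '>contig_1\nACGT\n>contig_2\nGGCC\n' > mygenome.fa
printf '>NT_ACC1\nTTTT\n' > nt.fna
printf 'NT_ACC1 562\n' > nt.fna.taxidmapping

# Label every contig of the assembly with one taxid (3702 is an example).
grep ">" mygenome.fa | tr -d '>' | awk '{print $1,"3702"}' > mygenome.fa.taxidmapping

# Combine the assembly with nt, and merge the two mapping files.
cat mygenome.fa nt.fna > mydb.fa
cat mygenome.fa.taxidmapping nt.fna.taxidmapping > mydb.fa.taxidmapping
cat mydb.fa.taxidmapping
```

The resulting mydb.fa and mydb.fa.taxidmapping are then used as the input FASTA and mapping for the conterminator dna run.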
Hey, I am also running into the same error. I have the necessary input files, which I made myself, and the taxids are valid NCBI identifiers (I have checked). The command does not work on a subset either. What can be done about this error?
The command:
conterminator dna Trinity.fasta nt.fna.taxidmapping trinity.results tmp
dies at crosstaxonfilterorf step
tmp/2908263996980697262/conterminatordna.sh: line 59: 10751 Segmentation fault (core dumped) $RUNNER "$MMSEQS" crosstaxonfilterorf "$TMP_PATH/sequencedb" "$TMP_PATH/db_rev_split_h" "$TMP_PATH/pref" "$TMP_PATH/pref_cross" ${CROSSTAXONFILTERORF_PAR}
Error: crosstaxonfilterorf step died
The files mentioned in the output are present in the tmp/ folder, though.