Closed ys117vt closed 3 years ago
Assuming the error is from this step in blastp.sh:
if notExists "$TMP_PATH/pref_$STEP.dbtype"; then
    $RUNNER "$MMSEQS" prefilter "$INPUT" "$TARGET" "$TMP_PATH/pref_$STEP" $PREFILTER_PAR -s "$SENS" \
        || fail "Prefilter died"
fi
Is there anything I could try to avoid this kind of error? Thanks!
Yang
Hey @milot-mirdita @elileka
Any thoughts on this? Thanks!
Sorry for not responding. We are a bit perplexed by the bus error. It can be many things... Are there any more details you can provide us?
Can you generate much smaller datasets (especially for the reference database) and see if they run through? You could use the databases command with UniProtKB/Swiss-Prot, for example, to get a small reference db. If something smaller runs through, it may indicate some resource limitation (memory, disk, etc.).
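The small-database test suggested above could be sketched as follows. This is a minimal sketch, assuming the `databases` module takes the database name `UniProtKB/Swiss-Prot` followed by an output database and a temporary directory. The helper `fetch_small_ref` and the names `swissprot` and `tmp_dl` are illustrative, and `MMSEQS` defaults to `mmseqs` but can point at another binary such as `metaeuk`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MMseqs2/MetaEuk): download a small
# reference database for a smoke test.
# Assumption: "databases <name> <outDB> <tmpDir>" is the argument order.
fetch_small_ref() {
    local out_db="$1" tmp_dir="$2"
    "${MMSEQS:-mmseqs}" databases UniProtKB/Swiss-Prot "$out_db" "$tmp_dir"
}

# Usage (downloads data, so it is left commented out):
# fetch_small_ref swissprot tmp_dl
```

If the run completes against this small database but not against the full one, that points towards a resource limit rather than a problem with the input.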
Thank you @elileka. I did try the UniProtKB/Swiss-Prot reference database and it did work. Thank you again for your direction!
Yang
You probably should not actually use UniProtKB/Swiss-Prot for taxonomic annotation. Its small size is very convenient for testing, but it's highly biased towards the most studied organisms.
I am still quite confused about how the bus error can happen. Are you running multiple jobs on the same machine that are competing for RAM?
Thanks for the reply @milot-mirdita. I actually tried taxtocontig with the UniProtKB reference database and it died again with a similar bus error. I am running the code on an external research supercomputer, so I assumed it would have enough memory, but it looks very likely that the issue is with memory...
I am trying to run it as a batch job on a specified node to see if I can get past the index table building step. Thanks!
Do you roughly know the taxonomic group of your contigs? (Is it, for example, a sample of some algae?) If so, perhaps we could assist with constructing a leaner reference database for the exact taxonomic annotation.
Hi @elileka, my samples are drinking water metagenomic samples. I used Kraken/Bracken to annotate them, but I would like to know more about the eukaryotes in my samples, as Kraken/Bracken didn't give me much information on drinking water amoebae.
If you have a subset of contigs you are certain are eukaryotic, you could try to annotate them against a euk-only reference database (or even only amoebae or any other clade, if it makes sense). This would save the resources "wasted" on the prokaryotic part of the reference database and might make the run feasible on a more limited machine. To do so, you will need to filter your taxonomic reference database as detailed here. Any valid NCBI TAXID can be used for filtering.
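The filtering step could look roughly like this. This is a sketch only, assuming the `filtertaxseqdb` module with its `--taxon-list` option is what the linked instructions describe. The helper `filter_ref_by_taxid` and the database names `refDB` and `refDB_euk` are illustrative; 2759 is the NCBI TAXID for Eukaryota.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MMseqs2/MetaEuk): keep only the entries
# of a taxonomy-annotated sequence database that fall under given TAXIDs.
# Assumption: "filtertaxseqdb <inDB> <outDB> --taxon-list <ids>" is the call.
filter_ref_by_taxid() {
    local in_db="$1" out_db="$2" taxids="$3"
    "${MMSEQS:-mmseqs}" filtertaxseqdb "$in_db" "$out_db" --taxon-list "$taxids"
}

# Usage (needs an existing taxonomy-annotated database, so commented out):
# filter_ref_by_taxid refDB refDB_euk 2759
```

The smaller database then replaces the full one as the taxonomic annotation target.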
Thank you Eli! @elileka @milot-mirdita Yeah, the submitted batch job stopped with the same bus error. I will work with our computation service team to figure out a solution. What's the recommended memory/RAM for this kind of job? Maybe I need to request multiple nodes to run this. In the end, I will try to reduce the reference database and give it another try. Thank you!
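One knob that may be worth trying is the `--split-memory-limit` option, which appears in the logged prefilter call with the value 0. The sketch below derives a limit from the RAM of the allocated node. The helper `split_limit_arg` and the 80% headroom factor are assumptions made for illustration, as is the expectation that the workflow forwards the option to prefilter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a node's RAM (in GB) into a --split-memory-limit
# argument. The 80% factor is an arbitrary assumption that leaves headroom
# for the rest of the pipeline and the operating system.
split_limit_arg() {
    local total_gb="$1"
    echo "--split-memory-limit $(( total_gb * 80 / 100 ))G"
}

# Usage: append the result to the search or taxonomy command line, e.g.
#   <command> ... $(split_limit_arg 128)
```

A lower limit should make the target database be processed in more, smaller splits, trading run time for memory.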
Hi @milot-mirdita @elileka, I was able to run with the UniProtKB reference database after getting an extended allocation of CPU/memory on our remote computer. I think it is all good now. Thank you again for your help!
Yang
prefilter temp_tax/14471945088901788939/preds /work/cascades/.../database/tax/MMETSP_zenodo_3247846_uniclust90_2018_08_seed_valid_taxids temp_tax/14471945088901788939/tmp_taxonomy/4208998901951402961/tmp_hsp1/9481182838681733712/pref_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -k 0 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 24 --compressed 0 -v 3 -s 4.0

Query database size: 142988 type: Aminoacid
Target split mode. Searching through 2 splits
Estimated memory consumption: 149G
Target database size: 88022300 type: Aminoacid
Process prefiltering step 1 of 2

Index table k-mer threshold: 141 at k-mer size 7
Index table: counting k-mers
[==============================================================temp_tax/14471945088901788939/tmp_taxonomy/4208998901951402961/tmp_hsp1/9481182838681733712/blastp.sh: line 99: 16219 Bus error $RUNNER "$MMSEQS" prefilter "$INPUT" "$TARGET" "$TMP_PATH/pref_$STEP" $PREFILTER_PAR -s "$SENS"
Error: Prefilter died
Error: First search died
Error: taxonomy died
Is this related to not having enough memory? Thank you!
Yang
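A quick way to check the memory question is to compare the "Estimated memory consumption" line from the log with what the node actually has free. The following is a rough sketch with hypothetical helper names. It assumes the estimate is printed in whole gigabytes with a `G` suffix, as in the log above, and that `/proc/meminfo` is available (Linux only). A bus error can have other causes, so a passing check does not rule out a resource problem.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a rough memory sanity check.

# Read a log on stdin and print the number of gigabytes from the first
# "Estimated memory consumption: <N>G" occurrence.
estimated_mem_gb() {
    sed -n 's/.*Estimated memory consumption: \([0-9][0-9]*\)G.*/\1/p' | head -n 1
}

# Print the currently available memory in whole gigabytes (Linux only).
available_mem_gb() {
    awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Succeed if the estimate ($1, in GB) fits into the available memory ($2, in GB).
fits_in_memory() {
    [ "$1" -le "$2" ]
}

# Usage, with run.log standing in for a saved copy of the output:
# est=$(estimated_mem_gb < run.log)
# fits_in_memory "$est" "$(available_mem_gb)" || echo "estimate exceeds free RAM"
```

On a shared node the available memory can change while the job runs, so the comparison is only a snapshot.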