Closed nick-youngblut closed 5 years ago
To be clear, as far as I can tell, `mmseqs taxonomy` is completely unusable due to this bug. I'm surprised others have not commented on this earlier. I've reproduced this error multiple times, so it's not stochastic.
@nick-youngblut I have added a taxonomy regression test to our test suite. I could not reproduce your error, but we found a critical error, caused by multithreading, in one of the modules involved in the 2bLCA search. This issue should be fixed in the main branch. Could you try to run the regression?
git clone https://bitbucket.org/martin_steinegger/mmseqs-benchmark
cd mmseqs-benchmark
./run_regression.sh mmseqs resultFolder
@martin-steinegger sorry for the delay. I ran the regression (using `mmseqs2 8.fac81 hf3e9acd_1 bioconda`), and it appears that some tests failed. The end of the test output:
Tmp resultFolder/LINSEARCH_NUCLNUCL_TARNS_SEARCH/tmp folder does not exist or is not a directory.
Created dir resultFolder/LINSEARCH_NUCLNUCL_TARNS_SEARCH/tmp
Program call:
extractorfs resultFolder/LINSEARCH_NUCLNUCL_TARNS_SEARCH/targetannotation_nucl resultFolder/LINSEARCH_NUCLNUCL_TARNS_SEARCH/tmp/4434917762398107271/orfs --min-length 30 --max-length 98202 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --use-all-table-starts 0 --id-offset 0 --threads 80 --compressed 0 -v 3
No datafile could be found for resultFolder/LINSEARCH_NUCLNUCL_TARNS_SEARCH/targetannotation_nucl_h!
Error: extractorfs died
Command exited with non-zero status 1
40.25user 1.33system 0:02.64elapsed 1570%CPU (0avgtext+0avgdata 178744maxresident)k
154744inputs+244552outputs (605major+33470minor)pagefaults 0swaps
LINSEARCH_NUCLNUCL_TARNS_SEARCH
TEST FAILED (NO REPORT)
DBPROFILE_INDEX
TEST FAILED (NO REPORT)
NUCLPROTTAX_SEARCH
TEST FAILED (NO REPORT)
PROTNUCL_SEARCH
TEST FAILED (NO REPORT)
EASY_LINCLUST
TEST SUCCESS
GOOD
Expected: 26523
Actual: 26523
LINCLUST
TEST SUCCESS
GOOD
Expected: 26523
Actual: 26523
EASY_CLUSTER
TEST SUCCESS
GOOD
Expected: 15682
Actual: 15682
CLUSTER
TEST SUCCESS
GOOD
Expected: 15682
Actual: 15682
NUCLNUCL_TRANS_SEARCH
TEST FAILED (NO REPORT)
NUCLNUCL_SEARCH
TEST FAILED (NO REPORT)
NUCLPROT_SEARCH
TEST FAILED (NO REPORT)
DBPROFILE
TEST SUCCESS
GOOD
Expected: 0.142
Actual: 0.182019
SLICEPROFILE
TEST SUCCESS
GOOD
Expected: 0.140
Actual: 0.147729
EASY_PROFILE
TEST SUCCESS
GOOD
Expected: 0.334
Actual: 0.338768
PROFILE
TEST FAILED
BAD
Expected: 0.367
Actual: 0.324652
EASY_SEARCH
TEST SUCCESS
GOOD
Expected: 0.235
Actual: 0.238355
SEARCH
TEST SUCCESS
GOOD
Expected: 0.235
Actual: 0.238355
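For anyone comparing runs, the summary printed at the end of the regression can be condensed into a list of failing tests. The following is a minimal sketch that assumes the report layout shown above (a test name, then a `TEST ...` status line, then optional `Expected:` and `Actual:` lines); `parse_report` is a hypothetical helper and not part of mmseqs-benchmark.

```python
import re


def parse_report(text):
    """Map each test name to its status and optional expected/actual values."""
    results = {}
    current = None   # name of the test whose block is being read
    previous = None  # last non-empty line seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("TEST ") and previous is not None:
            # Assumption: the test name is the line directly above the status line.
            current = previous
            results[current] = {
                "status": line[len("TEST "):],
                "expected": None,
                "actual": None,
            }
        elif current is not None:
            match = re.fullmatch(r"(Expected|Actual):\s*(\S+)", line)
            if match:
                results[current][match.group(1).lower()] = match.group(2)
        previous = line
    return results


# Excerpt copied from the output above.
sample = """\
NUCLPROT_SEARCH
TEST FAILED (NO REPORT)
PROFILE
TEST FAILED
BAD
Expected: 0.367
Actual: 0.324652
SEARCH
TEST SUCCESS
GOOD
Expected: 0.235
Actual: 0.238355
"""

report = parse_report(sample)
failed = [name for name, r in report.items() if r["status"].startswith("FAILED")]
print(failed)
```

Values are kept as strings because the report mixes integer cluster counts with floating-point scores.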
Ah yes, the bioconda version has some known issues. We have added quite a lot of testing in recent days and fixed many issues. Could you please try the most recent version? We will make a new release soon.
OK, I cloned from the master branch (MMseqs2 Version: d990a0fb4bba9193b8aadc699a614303a57792f2) and re-ran the tests. During the testing, the following warning/error kept appearing: `No datafile could be found for resultFolder/NUCLPROTTAX_SEARCH/query_nucl_h!`. Here's the tail of the output:
No datafile could be found for resultFolder/LINSEARCH_NUCLNUCL_TARNS_SEARCH/targetannotation_nucl_h!
Error: extractorfs died
Command exited with non-zero status 1
37.62user 1.04system 0:02.30elapsed 1676%CPU (0avgtext+0avgdata 57204maxresident)k
156904inputs+244464outputs (603major+36363minor)pagefaults 0swaps
LINSEARCH_NUCLNUCL_TARNS_SEARCH
TEST FAILED (NO REPORT)
DBPROFILE_INDEX
TEST SUCCESS
GOOD
Expected: 0.142
Actual: 0.197554
NUCLPROTTAX_SEARCH
TEST FAILED (NO REPORT)
PROTNUCL_SEARCH
TEST FAILED (NO REPORT)
EASY_LINCLUST
TEST SUCCESS
GOOD
Expected: 26523
Actual: 26523
LINCLUST
TEST SUCCESS
GOOD
Expected: 26523
Actual: 26523
EASY_CLUSTER
TEST FAILED
BAD
Expected: 15682
Actual: 15675
CLUSTER
TEST FAILED
BAD
Expected: 15682
Actual: 15675
NUCLNUCL_TRANS_SEARCH
TEST FAILED (NO REPORT)
NUCLNUCL_SEARCH
TEST FAILED (NO REPORT)
NUCLPROT_SEARCH
TEST FAILED (NO REPORT)
DBPROFILE
TEST SUCCESS
GOOD
Expected: 0.142
Actual: 0.182019
SLICEPROFILE
TEST SUCCESS
GOOD
Expected: 0.140
Actual: 0.147729
EASY_PROFILE
TEST SUCCESS
GOOD
Expected: 0.334
Actual: 0.338757
PROFILE
TEST SUCCESS
GOOD
Expected: 0.367
Actual: 0.367423
EASY_SEARCH
TEST SUCCESS
GOOD
Expected: 0.235
Actual: 0.238355
SEARCH
TEST SUCCESS
GOOD
Expected: 0.235
Actual: 0.238355
@nick-youngblut do you still encounter these self-directed symlinks?
@martin-steinegger I haven't encountered the problem recently, but I also haven't used mmseqs2 much lately. I am planning to use it more soon, so I can let you know. Is mmseqs2 updated on bioconda?
When I run `mmseqs taxonomy`, it converts the `_h` file for the input sequence db from a standard file to a symlink that points at itself. So the symlink is then broken, and `mmseqs taxonomy` fails. I'm using a different temporary directory for `mmseqs taxonomy` than where the `_h` file is, so that shouldn't be the problem.
mmseqs version: 8.fac81
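A quick way to confirm the symptom described above is to check whether the header file has become a symlink whose target is itself. The sketch below uses only the Python standard library. `describe_header_file` and the `queryDB_h` file name are hypothetical and chosen for illustration; the demo reproduces a self-pointing link in a temporary directory and does not call mmseqs.

```python
import os
import tempfile


def describe_header_file(path):
    """Classify path as 'ok', 'missing', 'self-symlink', or 'broken-symlink'."""
    absolute = os.path.abspath(path)
    if os.path.islink(absolute):
        target = os.readlink(absolute)
        # A relative link target is resolved against the link's own directory.
        resolved = os.path.normpath(os.path.join(os.path.dirname(absolute), target))
        if resolved == absolute:
            return "self-symlink"
        if not os.path.exists(absolute):
            return "broken-symlink"
        return "ok"
    return "ok" if os.path.exists(absolute) else "missing"


with tempfile.TemporaryDirectory() as tmp:
    header = os.path.join(tmp, "queryDB_h")  # hypothetical database header name
    os.symlink("queryDB_h", header)          # a link that points at itself
    status = describe_header_file(header)
    absent = describe_header_file(os.path.join(tmp, "otherDB_h"))

print(status, absent)
```

Running the same function on the real `_h` file before and after `mmseqs taxonomy` would show whether the file was a regular file to begin with and at which step it gets replaced.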
conda env
```
# Name            Version        Build           Channel
bzip2             1.0.6          h14c3975_1002   conda-forge
ca-certificates   2019.3.9       hecc5488_0      conda-forge
curl              7.64.1         hf8cf82a_0      conda-forge
gawk              4.2.1          h14c3975_1001   conda-forge
krb5              1.16.3         h05b26f9_1001   conda-forge
libcurl           7.64.1         hda55be3_0      conda-forge
libdeflate        1.0            h14c3975_1      bioconda
libedit           3.1.20170329   hf8c457e_1001   conda-forge
libgcc-ng         8.2.0          hdf63c60_1
libssh2           1.8.2          h22169c7_2      conda-forge
libstdcxx-ng      8.2.0          hdf63c60_1
llvm-openmp       8.0.0          hc9558a2_0      conda-forge
mmseqs2           8.fac81        hf3e9acd_1      bioconda
ncurses           6.1            hf484d3e_1002   conda-forge
openmp            8.0.0          0               conda-forge
openssl           1.1.1b         h14c3975_1      conda-forge
pigz              2.3.4          0               conda-forge
plass             2.c7e35        h21aa3a5_1      bioconda
samtools          1.9            h8571acd_11     bioconda
seqtk             1.3            h84994c4_1      bioconda
tk                8.6.9          h84994c4_1001   conda-forge
xz                5.2.4          h14c3975_1001   conda-forge
zlib              1.2.11         h14c3975_1004   conda-forge
```
conda info
```
     active environment : /ebio/abt3_projects/software/dev/llmgag/.snakemake/conda/6345f887
    active env location : /ebio/abt3_projects/software/dev/llmgag/.snakemake/conda/6345f887
            shell level : 2
       user config file : /ebio/abt3/nyoungblut/.condarc
 populated config files : /ebio/abt3_projects/software/dev/miniconda3_dev/.condarc
                          /ebio/abt3/nyoungblut/.condarc
          conda version : 4.6.11
    conda-build version : 3.11.0
         python version : 3.6.7.final.0
       base environment : /ebio/abt3_projects/software/dev/miniconda3_dev (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/linux-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://conda.anaconda.org/leylabmpi/linux-64
                          https://conda.anaconda.org/leylabmpi/noarch
                          https://conda.anaconda.org/r/linux-64
                          https://conda.anaconda.org/r/noarch
                          https://conda.anaconda.org/qiime2/linux-64
                          https://conda.anaconda.org/qiime2/noarch
          package cache : /ebio/abt3_projects/software/dev/miniconda3_dev/pkgs
                          /ebio/abt3/nyoungblut/.conda/pkgs
       envs directories : /ebio/abt3_projects/software/dev/miniconda3_dev/envs
                          /ebio/abt3/nyoungblut/.conda/envs
               platform : linux-64
             user-agent : conda/4.6.11 requests/2.18.4 CPython/3.6.7 Linux/4.9.127 ubuntu/18.04.1 glibc/2.27
                UID:GID : 6354:350
             netrc file : None
           offline mode : False
```