michaelkyu / PlasX

PlasX, a machine learning classifier for identifying plasmid sequences based on genetic architecture
GNU General Public License v3.0

Error while running mmseqs2 #7

Open dsamoht opened 1 year ago

dsamoht commented 1 year ago

Hi,

When running the following command (with either the test data or our own data), we get the error below, which seems to be related to mmseqs2.

Note: the values of --threads and --splits do not seem to matter, as we get the same error regardless of these parameters. Besides the following command, we also tried --threads in {0, 16, 32} and --splits in {0, 16, 32}.

We also verified the md5sums of the files downloaded by the plasx setup command.

plasx search_de_novo_families \
    -g $PREFIX-gene-calls.txt \
    -o $PREFIX-de-novo-families.txt \
    --threads $THREADS \
    --splits 32 \
    --overwrite

The error:

Database /lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/data/PlasX_mmseqs_profiles/clu90.profile needs header information
poll: 1
Deleting temporary directory: /localscratch/luhuizho.37393814.0/tmpyp01fb16
Traceback (most recent call last):
  File "/lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/mmseqs.py", line 1674, in mmseqs_search
    utils.run_cmd(cmd, verbose=True)
  File "/lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/utils.py", line 220, in run_cmd
    assert poll==0, 'Did not successfully run command: {}'.format(cmd)
AssertionError: Did not successfully run command: mmseqs convertalis /localscratch/luhuizho.37393814.0/tmpyp01fb16/mmseqs/source_db /lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/data/PlasX_mmseqs_profiles/clu90.profile /localscratch/luhuizho.37393814.0/tmpyp01fb16/mmseqs/clu90.search /localscratch/luhuizho.37393814.0/tmpyp01fb16/mmseqs/clu90.m8 --format-output query,target,pident,alnlen,mismatch,gapopen,qstart,qend,qlen,tstart,tend,tlen,evalue,bits

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre04/scratch/luhuizho/MYENV/bin/plasx", line 8, in <module>
    sys.exit(run())
  File "/lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/plasx_script.py", line 140, in run
    args.func(args)
  File "/lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/plasx_script.py", line 46, in search
    clean_tmp=not args.save_tmp)
  File "/lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/mmseqs.py", line 1944, in annotate_de_novo_families
    mmseqs_merge_search(mmseqs_source_db, target_db_dir, mmseqs_dir, ident_list, threads=threads, splits=splits, clean_tmp=clean_tmp)
  File "/lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/mmseqs.py", line 1719, in mmseqs_merge_search
    clean_tmp=clean_tmp)
  File "/lustre04/scratch/luhuizho/MYENV/lib/python3.7/site-packages/plasx/mmseqs.py", line 1678, in mmseqs_search
    raise FileNotFoundError(f"The file {output}.m8 was supposed to be created, but it doesn't exist. This might be because the search using mmseqs2 ran out of system RAM. Consider setting the -S flag to reduce the maximum RAM usage. E.g., if you only have ~8Gb RAM, we recommend setting -S to 32 or higher.")
FileNotFoundError: The file /localscratch/luhuizho.37393814.0/tmpyp01fb16/mmseqs/clu90.m8 was supposed to be created, but it doesn't exist. This might be because the search using mmseqs2 ran out of system RAM. Consider setting the -S flag to reduce the maximum RAM usage. E.g., if you only have ~8Gb RAM, we recommend setting -S to 32 or higher.
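
The first line of the log ("needs header information") comes from mmseqs itself, before the RAM hint in the FileNotFoundError, so it may be worth checking whether the profile database has its companion files on disk. The following is only a diagnostic sketch: the suffix list (`.index`, `.dbtype`, `_h`, `_h.index`) is an assumption about how MMseqs2 lays out a database, and `missing_db_files` is a hypothetical helper, not part of PlasX or MMseqs2.

```python
from pathlib import Path

# ASSUMPTION: companion-file suffixes of an MMseqs2 database. Adjust this
# list if your MMseqs2 version lays out databases differently.
ASSUMED_SUFFIXES = ["", ".index", ".dbtype", "_h", "_h.index"]


def missing_db_files(db_path, suffixes=ASSUMED_SUFFIXES):
    """Return the names of assumed companion files of `db_path` that do not exist."""
    missing = []
    for suffix in suffixes:
        candidate = Path(str(db_path) + suffix)
        if not candidate.is_file():
            missing.append(candidate.name)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_db.py /path/to/PlasX_mmseqs_profiles/clu90.profile
    for db in sys.argv[1:]:
        print(db, "->", missing_db_files(db) or "all assumed files present")
```

If the `_h` entries show up as missing, that would point to an incomplete database directory rather than a memory limit, although this has not been confirmed as the cause here.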

We are using an HPC cluster running Linux. We are not allowed to use conda on the HPC. All dependencies (except mmseqs2) were installed with pip inside a virtual environment. mmseqs2 is already installed on the HPC. We are using the same version as the tutorial: mmseqs2/10-6d92c.
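
Because mmseqs2 comes from the HPC installation rather than the virtual environment, it can help to confirm which binary is actually picked up from inside the environment. This is a minimal sketch, assuming the executable is named `mmseqs` and that it supports the `version` subcommand; `mmseqs_on_path` is a hypothetical helper name.

```python
import shutil
import subprocess


def mmseqs_on_path(executable="mmseqs"):
    """Return (path, version_output) for `executable`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # ASSUMPTION: the binary accepts a `version` subcommand and prints its version.
    result = subprocess.run(
        [path, "version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # spelled this way for Python 3.7 compatibility
    )
    return path, result.stdout.strip()


if __name__ == "__main__":
    print(mmseqs_on_path())
```

Run from the same shell session (or job script) that calls plasx, so that the result reflects the PATH that plasx sees.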

Thank you for your help!

JiabaoYuuuuu commented 3 months ago

I ran into the same error. Have you solved the problem?