Open JiabaoYuuuuu opened 6 months ago
Hey @JiabaoYuuuuu,
I am glad you managed to solve the download issue. We should change the setup function so it dynamically determines whether the user provided a URL online or a file path on their system for these files.
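A minimal sketch of what such a check could look like, using only the Python standard library. The helper name `classify_source` is hypothetical and not part of PlasX; the set of schemes treated as remote is also an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def classify_source(source: str) -> tuple[str, str]:
    """Decide whether `source` is a remote URL or a local file.

    Hypothetical helper for illustration only (not part of PlasX).
    Returns ("url", source) for remote sources and
    ("path", local_path) for anything on the local filesystem.
    """
    parsed = urlparse(source)

    # Assumed list of schemes that should be downloaded over the network.
    if parsed.scheme in ("http", "https", "ftp"):
        return "url", source

    # file:// URLs point at the local filesystem; convert them to a path.
    if parsed.scheme == "file":
        return "path", url2pathname(parsed.path)

    # No recognised scheme: treat the argument as a plain filesystem path.
    return "path", str(Path(source).expanduser())
```

With something like this, setup could download when the first element is "url" and copy or open the file directly when it is "path", so users would not have to write file:/// URLs by hand.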
The memory issue is a weird one. You shouldn't need to install additional software; the error is due to missing mmseqs2 files that were somehow not generated :( Are you submitting your jobs to the server via slurm? Or are you running it interactively?
Hi, meren, I submitted the task to the server. And I couldn't solve this issue through the manual installation method. So, I manually downloaded the files PlasX_mmseqs_profiles.tar.gz and PlasX_coefficients_and_gene_enrichments.txt.gz from Zenodo, then uploaded them to another website and downloaded them again (I have already deleted these files from the other website). After that, I ran:

    plasx search_de_novo_families \
        -g $PREFIX-gene-calls.txt \
        -o $PREFIX-de-novo-families.txt \
        --threads $THREADS \
        --splits 32 \
        --overwrite

This is the result I generated using the test files you provided.

    gene_callers_id  contig                start  stop  direction  rev_compd  length  e_value    accession
    1                AST0002_000000019451  1152   1908  r          True       756     0.0        mmseqs_40_33078316
    1                AST0002_000000019451  1152   1908  r          True       756     0.0        mmseqs_30_43406241
    1                AST0002_000000019451  1152   1908  r          True       756     0.0        mmseqs_25_49900063
    1                AST0002_000000019451  1152   1908  r          True       756     0.0        mmseqs_20_50193611
    2                AST0002_000000009188  754    1807  f          False      1053    0.0        mmseqs_70_18699477
    2                AST0002_000000009188  754    1807  f          False      1053    0.0        mmseqs_30_48665148
    2                AST0002_000000009188  754    1807  f          False      1053    0.0        mmseqs_30_44498373
    2                AST0002_000000009188  754    1807  f          False      1053    0.0        mmseqs_25_41046439
    2                AST0002_000000009188  754    1807  f          False      1053    4.37e-43   mmseqs_25_35867596
    2                AST0002_000000009188  754    1807  f          False      1053    0.0        mmseqs_20_42904105
    2                AST0002_000000009188  754    1807  f          False      1053    1.358e-37  mmseqs_20_38845624

It looks somewhat different from the provided template.
Then, when I ran the next step, plasx predict, a new issue occurred:

    Loading model from /mnt/sdb/weizhonglab/yujiabao/lib/anaconda3/envs/plasx/lib/python3.10/site-packages/plasx/data/PlasX_coefficients_and_gene_enrichments.txt (11:11:11)
    Traceback (most recent call last):
      File "/mnt/sdb/weizhonglab/yujiabao/lib/anaconda3/envs/plasx/lib/python3.10/site-packages/plasx/pd_utils.py", line 1036, in read_table
        C = utils.unpickle(A)
      File "/mnt/sdb/weizhonglab/yujiabao/lib/anaconda3/envs/plasx/lib/python3.10/site-packages/plasx/compress_utils.py", line 288, in unpickle
        ret = blosc_decompress(path_or_buf, stream=stream, obj_type='pickle', verbose=verbose)
      File "/mnt/sdb/weizhonglab/yujiabao/lib/anaconda3/envs/plasx/lib/python3.10/site-packages/plasx/compress_utils.py", line 268, in blosc_decompress
        return pkl.loads(b"".join(arr))
    EOFError: Ran out of input

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/mnt/sdb/weizhonglab/yujiabao/lib/anaconda3/envs/plasx/bin/plasx", line 8, in
So I reran the plasx predict step using the test-contigs-de-novo-families.txt file from your test files and got the same error message. Does this mean I still haven't installed it successfully? meren, could you provide a method for manual installation? Thank you very much for your reply.
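Since the files were re-uploaded and re-downloaded by hand, one thing that may be worth ruling out is a truncated or empty download. The sketch below is only an illustration: it assumes the two Zenodo files are ordinary gzip archives (as the .gz extension suggests), and the helper name `gzip_is_intact` is hypothetical, not something PlasX provides.

```python
import gzip
import zlib
from pathlib import Path


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if `path` is a non-empty gzip file that decompresses fully.

    Hypothetical helper for illustration only (not part of PlasX).
    Assumes the file is a standard gzip archive.
    """
    p = Path(path)
    # A missing or zero-byte file cannot be a valid archive.
    if not p.is_file() or p.stat().st_size == 0:
        return False
    try:
        # Read the whole stream in chunks; a truncated or corrupted archive
        # raises EOFError, OSError (including gzip.BadGzipFile) or zlib.error.
        with gzip.open(p, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

Running it on the downloaded PlasX_mmseqs_profiles.tar.gz and PlasX_coefficients_and_gene_enrichments.txt.gz before plasx setup would show whether either file was cut short in transit. A True result only says the gzip layer is readable, not that the contents are what PlasX expects.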
For the previous problem, I modified the mmseq.py file and changed it to:

    if mmseqs_profiles_url is None:
        mmseqs_profiles_url = 'file:///xxxx/PlasX_mmseqs_profiles.tar.gz'
    if coefficients_url is None:
        coefficients_url = 'file:///xxx/PlasX_coefficients_and_gene_enrichments.txt.gz'

After that, I ran:

    plasx setup \
        --de-novo-families 'file:///xxx/PlasX_mmseqs_profiles.tar.gz' \
        --coefficients 'file:///xxx/PlasX_coefficients_and_gene_enrichments.txt.gz'
Then I ran the next step:

    plasx search_de_novo_families \
        -g $PREFIX-gene-calls.txt \
        -o $PREFIX-de-novo-families.txt \
        --threads $THREADS \
        --splits 32 \
        --overwrite

The error message is:

    FileNotFoundError: The file /tmp/tmpienmre36/mmseqs/clu90.m8 was supposed to be created, but it doesn't exist. This might be because the search using mmseqs2 ran out of system RAM. Consider setting the -S flag to reduce the maximum RAM usage. E.g., if you only have ~8Gb RAM, we recommend setting -S to 32 or higher.
My confusion is: do I need to download additional software, such as diamond, to generate .m8 files? My server has a lot of memory, so the error should not be caused by mmseqs2 using too much memory.