iqbal-lab-org / pling

Plasmid analysis using rearrangement distances
MIT License

Snakemake? error? regarding batch file #74

Open karinlag opened 2 weeks ago

karinlag commented 2 weeks ago

Hi!

I am trying out this tool now. I installed it via conda and am running it on a Slurm cluster with srun, with 10 CPUs requested. I have a file list containing 8 plasmid FASTA sequences. The plasmids come from hybrid assemblies. The error I get is:


    (pling) [karinlag@c2-26.SAGA /cluster/projects/nn9305k/active/karinlag/2024-iconic]$ pling filelist.txt testout align --cores 10 --sourmash --batch_size 2
    Batching...

    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 10
    Rules claiming more threads will be scaled down.
    Job stats:
    job            count
    -----------  -------
    all                1
    get_batches        1
    total              2

    Select jobs to execute...

    [Tue Aug 27 17:42:14 2024]
    rule get_batches:
        output: testout/batches
        jobid: 1
        reason: Missing output files: testout/batches
        resources: tmpdir=/tmp, mem_mb=10000, mem_mib=9537

    Traceback (most recent call last):
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 103, in <module>
        main()
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 89, in main
        run_smash(args.genomes_list, sig_path, matrixpath)
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 63, in run_smash
        raise e
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py", line 59, in run_smash
        subprocess.run(f"sourmash sketch dna --from-file {genome_list} -o {sig_path}", shell=True, check=True, capture_output=True)
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/subprocess.py", line 571, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command 'sourmash sketch dna --from-file filelist.txt -o testout/sourmash/all_plasmids.sig' returned non-zero exit status 1.
    [Tue Aug 27 17:42:24 2024]
    Error in rule get_batches:
        jobid: 1
        output: testout/batches
        shell:

    PYTHONPATH=/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages python /cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/get_batches.py             --genomes_list filelist.txt             --batch_size 2             --outputpath testout             --sourmash             --smash_threshold 0.85             --containmentpath testout/containment/not_pairs_containment_distance.tsv

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message
    Complete log: .snakemake/log/2024-08-27T174211.327052.snakemake.log

    Command 'snakemake --snakefile /cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/Snakefile --configfile testout/tmp_files/config.yaml --cores 10 --rerun-incomplete --nolock ' returned non-zero exit status 1.
    Traceback (most recent call last):
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/bin/pling", line 10, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/run_pling.py", line 183, in main
        pling(args)
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/run_pling.py", line 141, in pling
        raise e
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/run_pling.py", line 136, in pling
        subprocess.run(f"snakemake --snakefile {get_pling_path()}/batching/Snakefile {snakemake_args}", shell=True, check=True, capture_output=True)
      File "/cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/subprocess.py", line 571, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command 'snakemake --snakefile /cluster/projects/nn9305k/src/miniconda/envs/pling/lib/python3.11/site-packages/pling/batching/Snakefile --configfile testout/tmp_files/config.yaml --cores 10 --rerun-incomplete --nolock ' returned non-zero exit status 1.
    (pling) [karinlag@c2-26.SAGA /cluster/projects/nn9305k/active/karinlag/2024-iconic]$

I am running your test dataset in the same manner, and that has not failed so far (it is still running, but it got past batching).

Any idea what is wrong? I am not very familiar with snakemake, so sorry if I am making obvious mistakes of one kind or another.

karinlag commented 2 weeks ago

I am a bit hesitant to download stuff onto the cluster from the internet. Can you tell me what this is?

Also, impressive response time!

iqbal-lab commented 2 weeks ago

Closing this issue for now until Daria and I can talk.

babayagaofficial commented 2 weeks ago

Hi Karin!

In your file list, are the paths relative or absolute? It looks like pling is erroring out when trying to run sourmash, and I vaguely remember it being finicky about file paths at some point.

If it's not that, can you please either run the command sourmash sketch dna --from-file filelist.txt -o testout/sourmash/all_plasmids.sig and tell me what happens, or send me the plasmids you're testing on so I can try to debug?
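The traceback earlier in the thread shows the sourmash command being run with capture_output=True and check=True, so whatever sourmash wrote to stderr never reached the terminal. A minimal sketch for rerunning a command by hand and keeping its stderr visible is below. The helper name run_and_report is hypothetical and is not part of pling; it only uses the Python standard library.

```python
import subprocess


def run_and_report(args):
    """Run a command given as a list of arguments (no shell).

    Returns (exit_code, stdout, stderr) so the caller can inspect
    stderr instead of only seeing a CalledProcessError.
    """
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=False,          # do not raise on a non-zero exit status
    )
    return result.returncode, result.stdout, result.stderr


# Example call, using the command copied from the error message in this thread:
# code, out, err = run_and_report([
#     "sourmash", "sketch", "dna",
#     "--from-file", "filelist.txt",
#     "-o", "testout/sourmash/all_plasmids.sig",
# ])
# print("exit status:", code)
# print(err)
```

Running this from the same working directory and the same conda environment as the failing pling call should make the comparison as close as possible.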

karinlag commented 2 weeks ago

The files in the filelist have absolute paths, and I can ls them on the command line.
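For anyone wanting to script the same check rather than using ls, here is a rough sketch. It assumes the file list holds one path per line, as in this thread; check_file_list is a hypothetical helper name, not something pling provides.

```python
import os


def check_file_list(list_path):
    """Return (path, problem) pairs for questionable entries in a file list."""
    problems = []
    with open(list_path) as handle:
        for line in handle:
            path = line.strip()
            if not path:
                continue  # ignore blank lines
            if not os.path.isabs(path):
                problems.append((path, "not an absolute path"))
            if not os.path.isfile(path):
                problems.append((path, "file not found"))
            elif os.path.getsize(path) == 0:
                problems.append((path, "file is empty"))
    return problems


# Example call:
# for path, problem in check_file_list("filelist.txt"):
#     print(f"{problem}: {path}")
```

An empty result only means the paths resolve to non-empty files; it says nothing about whether the FASTA content itself is valid.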

FYI I have run through your example file set and that ran without issues.

Sourmash ran well. Here is the output:

(pling) [karinlag@c2-47.SAGA /cluster/projects/nn9305k/active/karinlag/2024-iconic/pling]$ less testout/sourmash/all_plasmids.sig (pling) [karinlag@c2-47.SAGA /cluster/projects/nn9305k/active/karinlag/2024-iconic/pling]$ cat testout/sourmash/all_plasmids.sig {"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2011-01-3991-4_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"38340f5022f1b4b06235b3706677d0cd","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2011-01-4277-6_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227
020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"8fd92e18f88a3c353e3858fb27d03f39","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2002-01-856_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1876382433387916,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"6d6e6e9efbf88bb71c6322a0331b1cab","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2013-01-3776_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5321635288652507,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8913670508770807,9922754642423950,10141409509109256,10604418598382448,10685131983240125,11297382768904819,11619965257402279,12775684893959947,13071799269209422,13422641720227020,13561309383030245,144684520072035
72,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17585142582113419,17795910564485931,17876852118436816,18060306135304036,18081646073001083],"md5sum":"c7cdbd12a571c07668a30de1d3c952f8","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2004-01-295_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,410160685533855,1280411105032480,1548659820748513,1600759877581307,1809502862701755,3338939268924438,3625589219571383,4260766948629996,4948886978307551,5321635288652507,5775701240304176,7250362828267870,8405070370237353,10141409509109256,10685131983240125,11215808376261839,11297382768904819,11619965257402279,12495754348681416,12775684893959947,13071799269209422,13422641720227020,13561309383030245,14086504856082619,14468452007203572,14502510590605305,16722650050088156,17223053851400242,17365235757707201,17585142582113419,17876852118436816,18260973711158596],"md5sum":"e005938c091674fb429acf735c3b1df5","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2002-01-750_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4121414412210452,4260766948629996,5321635288652507,5775701240304176,6759409368061580,7250362828267870,8209140697418008,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,10685131983240125,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13422641720227020,135613093830302
45,14468452007203572,16119940313576173,17223053851400242,17365235757707201,17585142582113419,17795910564485931,17876852118436816,18081646073001083],"md5sum":"31884faa66759a5348f80e1a5d47a791","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2004-01-1570_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1600759877581307,1809502862701755,1908644393763148,2637738332492691,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8209140697418008,8405070370237353,8410686004908645,8913670508770807,10141409509109256,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13422641720227020,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"c98ad460af789c431d457066fba694b0","molecule":"DNA"}],"version":0.4},{"class":"sourmash_signature","email":"","hash_function":"0.murmur64","filename":"/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2009-01-1808-4_filtered.fasta/plasmid_AB172.fasta","license":"CC0","signatures":[{"num":0,"ksize":31,"seed":42,"max_hash":18446744073709552,"mins":[28120151681885,1280411105032480,1548659820748513,1600759877581307,1809502862701755,1908644393763148,2637738332492691,2793697803061775,3338939268924438,4260766948629996,5775701240304176,6759409368061580,7250362828267870,8243783598597327,8405070370237353,8410686004908645,8913670508770807,9922754642423950,10141409509109256,10604418598382448,11297382768904819,11619965257402279,12495754348681416,12497891601490898,12775684893959947,13071799269209422,13422641720227020,14502510590605305,16119940313576173,16722650050088156,17223053851400242,17365235757707201,17795910564485931,17876852118436816,18081646073001083],"md5sum":"38340f5022f1b4b0
6235b3706677d0cd","molecule":"DNA"}],"version":0.4}

Hope some of it makes sense for you!
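One thing visible in the output above is that all eight "filename" entries end in plasmid_AB172.fasta and differ only in their parent directory. Whether pling or sourmash is sensitive to repeated file names is not established in this thread, so treat the following purely as a diagnostic sketch. The helper name duplicate_basenames is hypothetical.

```python
import os
from collections import defaultdict


def duplicate_basenames(paths):
    """Group paths by file name and keep only names that occur more than once."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.basename(path)].append(path)
    return {name: group for name, group in groups.items() if len(group) > 1}


# Two of the paths reported in the signature output above:
paths = [
    "/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2011-01-3991-4_filtered.fasta/plasmid_AB172.fasta",
    "/cluster/projects/nn9305k/active/rikkiff/231218_hybrid_assembly_mob_suite/2011-01-4277-6_filtered.fasta/plasmid_AB172.fasta",
]
for name, group in duplicate_basenames(paths).items():
    print(f"{name} appears {len(group)} times")
```

If repeated names do turn out to matter, copying or symlinking the files under unique names would be one way to test that.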

babayagaofficial commented 2 weeks ago

That's pretty weird. Looking at your error message, it seems almost certain that pling errored out while trying to run sourmash, but then it's fine when you run it separately. I'm very sorry about the faff, but I'm afraid I'm going to need some more help/information from you to debug this. Did you run sourmash in the same environment as pling? If not, can you check which version you ran it with? Can you also please check which version is installed in your pling environment?
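A quick way to answer the version questions is to run something like the sketch below once inside the pling environment and once in whatever environment sourmash was run from by hand. It assumes the tool accepts a --version flag, which I believe sourmash does; tool_version is a hypothetical helper name.

```python
import shutil
import subprocess


def tool_version(tool):
    """Return (resolved_path, version_text) for a tool, or (None, None) if it is not found."""
    path = shutil.which(tool)
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr, so fall back to it.
    text = (result.stdout or result.stderr).strip()
    return path, text


# Example call:
# print(tool_version("sourmash"))
```

The resolved path is worth reporting too, since it shows which environment's binary is actually being picked up.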

Also, can you try running on the 8 plasmids without the sourmash flag, and let me know if those run okay?

If it's alright to share, is there any chance you can send me the fasta files for your 8 plasmids?