Closed by jhawkey 5 months ago
Oh and I'll add - I'm using python v3.12 and snakemake v7.32.4
Have been meaning to call you! Will let @babayagaofficial reply
Hi, which version of pling are you using? This ought to have been fixed in v1.0.1...
I can't seem to easily get the version - I ran:
python run_pling.py --version
Traceback (most recent call last):
File "run_pling.py", line 14, in <module>
import yaml
ModuleNotFoundError: No module named 'yaml'
But when I navigate into the pling folder and run git status
it says I'm up to date with the main branch. So presumably I'm running v1.0.1?
Edited to add, also tried:
PYTHONPATH=/mnt/nectar/analyses/plasmid_comparison_dev/pling/ python /mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/run_pling.py --version
Traceback (most recent call last):
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/run_pling.py", line 14, in <module>
import yaml
ModuleNotFoundError: No module named 'yaml'
Kind of weird that it's erroring out without finding the yaml module; that should've been installed along with snakemake (it's one of its dependencies), especially since it made it to the batching step without any problems before. But anyway, the fact that run_pling.py imports the yaml module tells me that you're on the right version.
Does the tmp directory with a config.yaml file exist in the pling output directory? If so, can you please send it to me?
For context, we create a config.yaml at the beginning to feed in filepaths and other inputs into snakemake later on. I've had a similar bug reported to me before, and then the problem was that extra whitespace was being added in the creation of the config file, but it was fixed when I started creating the config file through the yaml module. This looks a lot like that bug, but unfortunately it seems using the yaml module wasn't enough of a fix.
Yes, the config.yaml file exists inside a tmp_files directory, inside the output directory. These are the contents of config.yaml:
cat config.yaml
bakta_db: None
bakta_mem: 15000
bakta_threads: 1
batch_size: 50
bh_connectivity: 10
bh_neighbours_edge_density: 0.2
blocks_mem: 4000
build_DCJ_graph_mem: 8000
communities: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/containment/containment_communities
consolidation_mem: 4000
dcj_dist_mem: 4000
dcj_dist_threshold: 4
dcj_matrix_mem: 4000
deduplication_mem: 10000
deduplication_threads: 1
genomes_list: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_plasmid_inputs/small_test
get_communities_mem: 4000
identity_threshold: 80.0
ilp_mem: 10000
ilp_solver: GLPK
ilp_threads: 1
integerisation: align
length_threshold: 200
make_unimogs_mem: 10000
make_unimogs_threads: 1
metadata: None
minimap_mem: 4000
minimap_threads: 1
output_dir: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output
pairwise_seq_containment_mem: 10000
pairwise_seq_containment_threads: 1
panaroo_mem: 15000
panaroo_threads: 1
prefix: all_plasmids
seq_containment_distance: 0.5
small_subcommunity_size_threshold: 4
sourmash_mem: 10000
sourmash_threads: 1
timelimit: None
unimog_to_ilp_mem: 4000
Thank you!
So I was able to replicate the error when running on python 3.12 and snakemake 7.32.4, but when I dropped to python version 3.11 it was fine. Can you please try downgrading python to 3.11 and let me know if you still get the same error?
Ah, I just found out this is a bug in Snakemake: https://github.com/snakemake/snakemake/issues/2480
The solution is downgrading python to version 3.11, so hopefully that solves the matter and I just need to update the documentation.
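Until the documentation is updated, a version check along these lines could flag the problem early. This is only a sketch: `python_is_supported` and `MAX_SUPPORTED` are hypothetical names, not part of pling or snakemake, and the 3.11 ceiling is taken solely from what was reported in this thread for snakemake 7.32.4.

```python
import sys

# Hypothetical ceiling, based only on this thread: the error appeared on
# Python 3.12 with snakemake 7.32.4 and went away on Python 3.11.
MAX_SUPPORTED = (3, 11)


def python_is_supported(version_info=sys.version_info, max_supported=MAX_SUPPORTED):
    """Return True if the (major, minor) version is at most max_supported."""
    return tuple(version_info[:2]) <= tuple(max_supported)


if not python_is_supported():
    # Warn rather than abort, since newer snakemake releases may behave differently.
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "this setup was reported to fail on 3.12, consider using 3.11.",
        file=sys.stderr,
    )
```

The check only warns, so it would not block anyone whose snakemake version is unaffected.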
Jesus christ
Hey Daria,
Thanks, downgrading to python 3.11 fixed that error.
Sadly I'm getting another one (sorry!!). This time it seems to not like the fact that the location where my fasta files are is a directory?
PYTHONPATH=/mnt/nectar/analyses/plasmid_comparison_dev/pling/ python /mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/run_pling.py /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_plasmid_inputs/small_test /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output align
Batching...
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count
----------- -------
all 1
get_batches 1
total 2
Select jobs to execute...
[Sat Jun 1 00:15:04 2024]
rule get_batches:
output: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/batches
jobid: 1
reason: Missing output files: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/batches
resources: tmpdir=/tmp, mem_mb=4000, mem_mib=3815
Activating conda environment: .snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_
Traceback (most recent call last):
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/batching/get_batches.py", line 107, in <module>
main()
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/batching/get_batches.py", line 87, in main
genomes, genome_index = get_labels(args.genomes_list)
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/batching/get_batches.py", line 16, in get_labels
fastafiles, fastaext, fastapath = get_fasta_file_info(filepath)
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/utils.py", line 22, in get_fasta_file_info
FASTAFILES_LIST = [el[0] for el in pd.read_csv(genomes_list, header=None).values]
File "/mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "/mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "/mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
self.handles = get_handle(
File "/mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_/lib/python3.10/site-packages/pandas/io/common.py", line 856, in get_handle
handle = open(
IsADirectoryError: [Errno 21] Is a directory: '/mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_plasmid_inputs/small_test'
[Sat Jun 1 00:15:05 2024]
Error in rule get_batches:
jobid: 1
output: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/batches
conda-env: /mnt/nectar/analyses/plasmid_comparison_dev/.snakemake/conda/50d37ffbaf41b1426e2ae3d8c4fe3997_
shell:
PYTHONPATH=/mnt/nectar/analyses/plasmid_comparison_dev/pling python /mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/batching/get_batches.py --genomes_list /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_plasmid_inputs/small_test --batch_size 50 --outputpath /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output --smash_threshold 1 --containmentpath /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/tmp_files/containment_batchwise/not_pairs_containment_distance.tsv
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-06-01T001503.452318.snakemake.log
Command 'snakemake --snakefile /mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/batching/Snakefile --configfile /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/tmp_files/config.yaml --cores 1 --use-conda --rerun-incomplete --nolock ' returned non-zero exit status 1.
Traceback (most recent call last):
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/run_pling.py", line 182, in <module>
main()
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/run_pling.py", line 179, in main
pling(args)
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/run_pling.py", line 130, in pling
raise e
File "/mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/run_pling.py", line 125, in pling
subprocess.run(f"snakemake --snakefile {get_pling_path()}/batching/Snakefile {snakemake_args}", shell=True, check=True, capture_output=True)
File "/mnt/nectar/conda_envs/pling/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'snakemake --snakefile /mnt/nectar/analyses/plasmid_comparison_dev/pling/pling/batching/Snakefile --configfile /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/tmp_files/config.yaml --cores 1 --use-conda --rerun-incomplete --nolock ' returned non-zero exit status 1.
The contents of my input directory look like this:
INF344_INF344_plasmid_1.fasta INF355_INF355_plasmid_1.fasta INF361_INF361_plasmid_1.fasta
It made the output directory, and this is the content of config.yaml:
bakta_db: None
bakta_mem: 15000
bakta_threads: 1
batch_size: 50
bh_connectivity: 10
bh_neighbours_edge_density: 0.2
blocks_mem: 4000
build_DCJ_graph_mem: 8000
communities: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output/containment/containment_communities
consolidation_mem: 4000
dcj_dist_mem: 4000
dcj_dist_threshold: 4
dcj_matrix_mem: 4000
deduplication_mem: 10000
deduplication_threads: 1
genomes_list: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_plasmid_inputs/small_test
get_communities_mem: 4000
identity_threshold: 80.0
ilp_mem: 10000
ilp_solver: GLPK
ilp_threads: 1
integerisation: align
length_threshold: 200
make_unimogs_mem: 10000
make_unimogs_threads: 1
metadata: None
minimap_mem: 4000
minimap_threads: 1
output_dir: /mnt/nectar/analyses/plasmid_comparison_dev/pling_ctxm15_smallTest_output
pairwise_seq_containment_mem: 10000
pairwise_seq_containment_threads: 1
panaroo_mem: 15000
panaroo_threads: 1
prefix: all_plasmids
seq_containment_distance: 0.5
small_subcommunity_size_threshold: 4
sourmash_mem: 10000
sourmash_threads: 1
timelimit: None
unimog_to_ilp_mem: 4000
Currently running with python v3.11.6 and snakemake v7.32.4. Pling says it's v1.0.3.
If I've understood correctly, you've passed the directory path as the genomes_list input, which isn't the right input -- Pling needs a text file with a list of paths to each individual fasta file. If you run the command
ls -d -1 $PWD/*.fasta > input.txt
and feed in the path to input.txt for genomes_list, it should work!
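For anyone who would rather build the list from Python than from the shell, here is a rough equivalent of the `ls` command above. It is a sketch only: `write_genomes_list` is a hypothetical helper, not something pling provides, and it assumes the FASTA files share a single extension (`.fasta` by default).

```python
from pathlib import Path


def write_genomes_list(fasta_dir, list_path, pattern="*.fasta"):
    """Write the absolute path of each matching file in fasta_dir, one per line."""
    fasta_dir = Path(fasta_dir).resolve()
    # Sort so the output order is stable, similar to what ls prints.
    paths = sorted(p for p in fasta_dir.glob(pattern) if p.is_file())
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {fasta_dir}")
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling `write_genomes_list("small_test", "input.txt")` would then produce a file that can be passed as genomes_list in place of the directory.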
Ah, thanks! That wasn't clear to me from the docs, you may want to consider updating the readme with an example command that demonstrates that the input is a text file.
I just finished running a test set and it's worked wonderfully. Looking forward to trying it out on some data where I don't know what's going on!
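A clearer message than the raw IsADirectoryError could come from a small up-front check on the input. The following is a sketch under assumptions: `check_genomes_list` is a hypothetical function, not pling's actual loader (which, per the traceback above, reads the list with `pd.read_csv`), and it assumes one FASTA path per line.

```python
from pathlib import Path


def check_genomes_list(genomes_list):
    """Return the FASTA paths listed in genomes_list, or raise with a clear message."""
    list_file = Path(genomes_list)
    if list_file.is_dir():
        # The situation hit in this thread: a folder was given instead of a list file.
        raise ValueError(
            f"{list_file} is a directory; expected a text file "
            "with one FASTA path per line"
        )
    if not list_file.is_file():
        raise FileNotFoundError(f"{list_file} does not exist")
    # Ignore blank lines and surrounding whitespace.
    entries = [ln.strip() for ln in list_file.read_text().splitlines() if ln.strip()]
    missing = [e for e in entries if not Path(e).is_file()]
    if missing:
        raise FileNotFoundError(f"listed FASTA files not found: {missing}")
    return entries
```

Running a check like this before launching snakemake would report the mistake immediately, without first building the conda environment for the batching step.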
Hi Daria,
I saw Zam's talk about pling and it looks amazing! I am excited to give it a try on the data we have down here in Melbourne.
Unfortunately I'm getting a snakemake error, complaining about whitespaces in my filepaths? I can't see any whitespaces, so I'm not entirely sure what's going on. I've never used snakemake before (we are nextflow users over here) and so not certain how to troubleshoot.
This is my command (no whitespaces???):
Based on the instructions, I don't think I need to provide *.fasta or anything, right? Just the location of the folder?
Anyway, this is what pling returns as an error:
Any suggestions? I assume I'm just providing the command incorrectly/doing something dumb.