iqbal-lab-org / pling

Plasmid analysis using rearrangement distances
MIT License

Handling effects of setting containment_distance to 1 #71

eri-lim closed this issue 2 months ago

eri-lim commented 2 months ago

Hi there! Great tool. I set containment_distance to 1 to make sure as few plasmids as possible were filtered out, including highly dissimilar ones, since I also wanted to see the nuances of the DCJ-Indel distance between such plasmid pairs. However, with this setting the Snakemake workflow failed partway through the run.

I understand that, in theory, a pair with a containment distance of 1 would have no calculable DCJ-Indel distance, since the number of operations needed to transform one plasmid into the other could not be computed.

Would it therefore be useful to handle such problematic cases internally, or to restrict the maximum allowed containment_distance value, so that the pipeline does not fail completely?

Thanks so much!

babayagaofficial commented 2 months ago

Hi, can you please send me the error message you received?

babayagaofficial commented 2 months ago

Also, just a note on the theory: the DCJ-Indel distance between two completely different plasmids, say plasmid A and plasmid B, would be 2, because you'd end up with the following integer sequence representations for the two plasmids:

A: 1
B: 2

You can then go from A to B in two operations: delete 1 from A, then insert 2.

So the distance is still mathematically defined, just biologically meaningless, which is the motivation for the containment distance threshold.
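The two-operation argument above can be written out as a small sketch. It only covers the case described in the comment, where the two integer sequences share no markers. The function name `disjoint_dcj_indel_ops` is hypothetical and not part of pling, and the sketch assumes that each plasmid's unshared content is removed or added by a single indel, as in the A: 1 / B: 2 example.

```python
def disjoint_dcj_indel_ops(genome_a, genome_b):
    """List the indel operations turning genome_a into genome_b.

    Hypothetical helper (not part of pling). Handles only the case from
    the comment above: integer sequences with no markers in common.
    Assumes one indel removes all of A and one indel adds all of B.
    """
    # Markers may be signed to encode orientation, so compare absolute values.
    shared = {abs(m) for m in genome_a} & {abs(m) for m in genome_b}
    if shared:
        raise ValueError("genomes share markers; a full DCJ-Indel solver is needed")

    ops = []
    if genome_a:
        ops.append(("delete", list(genome_a)))
    if genome_b:
        ops.append(("insert", list(genome_b)))
    return ops


ops = disjoint_dcj_indel_ops([1], [2])
distance = len(ops)  # one deletion plus one insertion
```

For A: 1 and B: 2 this yields a deletion followed by an insertion, matching the distance of 2 stated above.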

I suspect I know why the pipeline is failing in this case, but it will be easier to pin down if you can send me the error message you received.

eri-lim commented 2 months ago

Thank you for the insight on the theory!

This is a segment of the error message in the Pling output; the message repeats itself for various batches/jobids.

[Thu Aug 15 01:38:01 2024]
Error in rule glpk_and_ding:
    jobid: 0
    input: WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv
    output: WORKING_DIR/output/tmp_files/dists_batchwise/batch_2029_dcj.tsv
    conda-env: WORKING_DIR/.snakemake/conda/e56ad01a23178879ae50d78c3c74b859_
    shell:

                PYTHONPATH=PLING_DIR python PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py                         --batch 2029                         --containment_tsv WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv                         --containment_distance 1.0                         --outputpath WORKING_DIR/output                         --communitypath WORKING_DIR/output/containment/containment_communities/objects/communities.txt                         --integerisation align                                                  --threads 1                         --snakefile_dir PLING_DIR/pling/dcj_snakemake

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
WorkflowError:
At least one job did not complete successfully.
[Thu Aug 15 01:38:01 2024]
Error in rule glpk_and_ding:
    jobid: 2032
    input: WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv
    output: WORKING_DIR/output/tmp_files/dists_batchwise/batch_2029_dcj.tsv
    conda-env: WORKING_DIR/.snakemake/conda/e56ad01a23178879ae50d78c3c74b859_
    shell:

                PYTHONPATH=PLING_DIR python PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py                         --batch 2029                         --containment_tsv WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv                         --containment_distance 1.0                         --outputpath WORKING_DIR/output                         --communitypath WORKING_DIR/output/containment/containment_communities/objects/communities.txt                         --integerisation align                                                  --threads 1                         --snakefile_dir PLING_DIR/pling/dcj_snakemake

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Traceback (most recent call last):
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 98, in <module>
    main()
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 95, in main
    batchwise_ding(pairs, float(args.containment_distance), containments, args.integerisation, args.outputpath, args.batch, timelimit, args.snakefile_dir, plasmid_to_community)
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 52, in batchwise_ding
    unimog_to_ilp(unimog, lp, entry1, entry2)
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 15, in unimog_to_ilp
    raise e
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 10, in unimog_to_ilp
    subprocess.run(f"dingII generate {unimog} -mm --writeilp {lp} -p {genome1} {genome2}", shell=True, check=True, capture_output=True)
  File "WORKING_DIR/.snakemake/conda/e56ad01a23178879ae50d78c3c74b859_/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'dingII generate WORKING_DIR/output/unimogs/batch_2623_align.unimog -mm --writeilp ding/ilp/PLASMID1~PLASMID2.lp -p PLASMID1~PLASMID2:PLASMID1 PLASMID1~PLASMID2:PLASMID2' returned non-zero exit status 1.
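The traceback as posted reports only the exit status of the dingII command, not dingII's own error output. Since the call uses `capture_output=True`, that output should be available on the raised `CalledProcessError`. A minimal sketch of printing it before re-raising is below; `run_and_report` is a hypothetical helper, not pling's code, and it takes the command as a list rather than using `shell=True` as in the traceback.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and show its stderr if it fails.

    Hypothetical helper, not part of pling.
    """
    try:
        return subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        # With capture_output=True the child's stderr is stored on the
        # exception instead of being written to the terminal.
        print(f"command failed with exit status {e.returncode}: {e.cmd}", file=sys.stderr)
        print(e.stderr, file=sys.stderr)
        raise
```

Wrapping the failing dingII call this way would be one way to see the underlying error message alongside the Snakemake log.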