snakemake / snakefmt

The uncompromising Snakemake code formatter
MIT License
153 stars 29 forks source link

snakefmt complains on its own formatting on multiple if statements #157

Closed weber8thomas closed 1 year ago

weber8thomas commented 2 years ago

I'm experiencing an issue on a multiple conditions if statement using snakefmt

3 if statements on a single line

Screenshot 2022-11-08 at 10 37 41

snakefmt works fine

Screenshot 2022-11-08 at 10 38 23

New formatted code looks like this

Screenshot 2022-11-08 at 10 38 54

snakefmt now complains on its own formatting

Screenshot 2022-11-08 at 10 39 14

mbhall88 commented 2 years ago

Could you paste the original code in a code fence here so I can reproduce and debug this?

corneliusroemer commented 2 years ago

I can reproduce, this is how it cycles in 0.7.0 and also in 0.6.1:

Start:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"
    # Only include rules for trigger if uploading files since the trigger
    # rules depend on the outputs from upload.
    include: "workflow/snakemake_rules/trigger.smk"

After formatting:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"

# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"

But this is still not ok per Snakefmt, if I save again:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"
# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"

@weber8thomas which version are you on?

weber8thomas commented 2 years ago

Could you paste the original code in a code fence here so I can reproduce and debug this?

@mbhall88 Sure, here is the code:

Before formatting:

if config["window"] in [50000, 100000, 200000] and config["reference"] == "hg38" and config["normalized_counts"] == True:

    rule merge_blacklist_bins:
        input:
            norm="workflow/data/normalization/HGSVC.{window}.txt",
            whitelist="workflow/data/normalization/inversion-whitelist.tsv",
        output:
            merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
        log:
            "{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
        conda:
            "../envs/mc_base.yaml"
        shell:
            """
            workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
            """

After formatting

if (
    config["window"] in [50000, 100000, 200000]
    and config["reference"] == "hg38"
    and config["normalized_counts"] == True
):

    rule merge_blacklist_bins:
        input:
            norm="workflow/data/normalization/HGSVC.{window}.txt",
            whitelist="workflow/data/normalization/inversion-whitelist.tsv",
        output:
            merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
        log:
            "{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
        conda:
            "../envs/mc_base.yaml"
        shell:
            """
            workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
            """

@corneliusroemer I also experienced the same issue regarding comments block (""") or comments lines (#)

snakefmt version used: 0.6.1

mbhall88 commented 2 years ago

As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.

Thank you very much for the bug reports!

weber8thomas commented 1 year ago

As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.

Thank you very much for the bug reports!

No urgence on my side, just wanted to report it

Good luck to @bricoletc for his PhD defence and enjoy you holidays @mbhall88 !

johanneskoester commented 1 year ago

I think #159 is quite critical indeed. I have no experience with the snakefmt codebase myself, but if you give me a pointer where to start, I can try to help here.

bricoletc commented 1 year ago

Defence went well thanks @weber8thomas !

@johanneskoester this week is very busy for me. Happy to give you a pointer in about one week - and I will also have a look at this issue and see if I can fix, though more in next two weeks.

@mbhall88 @johanneskoester I'd like to start a discussion about a possible overhaul of snakefmt. Currently the parsing code relies on a lot of heuristics - especially when trying to figure out if we are inside a block of python code or not. Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility. This issue is not the place for such a discussion btw, not sure where would be best ;)

corneliusroemer commented 1 year ago

I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.

The tests should be a useful guide along the way helping with a reimplementation.

Here are test cases for #159:

    def test_if_statement_with_snakecode_comment_snakecode_inside(self):
        """https://github.com/snakemake/snakefmt/issues/159"""
        snakecode = (
            "if True:\n\n"
            f"{TAB * 1}ruleorder: a > b\n"
            "\n"
            f"# comment\n"
            f"{TAB * 1}ruleorder: c > d\n"
        )
        formatter = setup_formatter(snakecode)
        assert formatter.get_formatted() == snakecode

    def test_if_statement_with_snakecode_2comments_snakecode_inside(self):
        """https://github.com/snakemake/snakefmt/issues/159"""
        snakecode = (
            "if True:\n\n"
            f"{TAB * 1}ruleorder: a > b\n"
            "\n"
            f"# comment\n"
            f"# comment\n"
            f"{TAB * 1}ruleorder: c > d\n"
        )
        formatter = setup_formatter(snakecode)
        assert formatter.get_formatted() == snakecode

I got the first to pass by tweaking line 66 in parser.py - but the second still fails:


            if self.vocab.recognises(keyword):
                if status.indent > self.target_indent:
-                   if self.syntax.from_python or status.pythonable:
+                   if self.from_python or status.pythonable:
                        self.from_python = True
                    else:  # Over-indented context gets reset
                        self.syntax.cur_indent = max(self.target_indent - 1, 0)
                elif self.from_python:
mbhall88 commented 1 year ago

Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility. This issue is not the place for such a discussion btw, not sure where would be best ;)

Agreed. This is what I wanted to do in the first place 😛

But that would require a significant amount of work that I don't have and I doubt @bricoletc does either. Maybe @johanneskoester has someone in his team that could do this?

Once I get #163 and #164 fixed, I will look at this. It is quite possible #164 could impact this issue

corneliusroemer commented 1 year ago

The challenge is one would need a CST instead of an AST, as the AST throws away things like comments.

So if IUC, one would need to: a) write a Snakemake grammar extension of https://docs.python.org/3/library/ast.html b) Get an AST based on that grammar c) Then implement rewrites based on that, maybe with something like https://github.com/asottile/tokenize-rt

Does that sound right @mbhall88?

bricoletc commented 1 year ago

I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.

A valiant effort @corneliusroemer !

I'd be up to having a go at a CST/AST-based rewrite, if others, like @corneliusroemer /someone from @johanneskoester 's team can join in.

(And @mbhall88 as ever if you want to - I know you were talking about ASTs when we first started :sweat_smile:)

johanneskoester commented 1 year ago

The rewrite plans sound good, but also as if that will need a lot of work. So, if we can fix this before, I think that would be good. I might be able to dedicate some person hours to something like that later in 2023, but as there are many other construction sites, it would be highly appreciated if you would want to start before.

mbhall88 commented 1 year ago

The examples in this issue are resolved in v0.8.0