Closed weber8thomas closed 1 year ago
Could you paste the original code in a code fence here so I can reproduce and debug this?
I can reproduce, this is how it cycles in 0.7.0 and also in 0.6.1:
Start:
if config.get("s3_dst"):
include: "workflow/snakemake_rules/upload.smk"
include: "workflow/snakemake_rules/trigger.smk"
# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"
After formatting:
if config.get("s3_dst"):
include: "workflow/snakemake_rules/upload.smk"
include: "workflow/snakemake_rules/trigger.smk"
# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"
But this is still not ok per Snakefmt, if I save again:
if config.get("s3_dst"):
include: "workflow/snakemake_rules/upload.smk"
include: "workflow/snakemake_rules/trigger.smk"
# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"
@weber8thomas which version are you on?
Could you paste the original code in a code fence here so I can reproduce and debug this?
@mbhall88 Sure, here is the code:
Before formatting:
if config["window"] in [50000, 100000, 200000] and config["reference"] == "hg38" and config["normalized_counts"] == True:
rule merge_blacklist_bins:
input:
norm="workflow/data/normalization/HGSVC.{window}.txt",
whitelist="workflow/data/normalization/inversion-whitelist.tsv",
output:
merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
log:
"{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
conda:
"../envs/mc_base.yaml"
shell:
"""
workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
"""
After formatting
if (
config["window"] in [50000, 100000, 200000]
and config["reference"] == "hg38"
and config["normalized_counts"] == True
):
rule merge_blacklist_bins:
input:
norm="workflow/data/normalization/HGSVC.{window}.txt",
whitelist="workflow/data/normalization/inversion-whitelist.tsv",
output:
merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
log:
"{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
conda:
"../envs/mc_base.yaml"
shell:
"""
workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
"""
@corneliusroemer I also experienced the same issue regarding comments block (""") or comments lines (#)
snakefmt version used: 0.6.1
As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.
Thank you very much for the bug reports!
As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.
Thank you very much for the bug reports!
No urgence on my side, just wanted to report it
Good luck to @bricoletc for his PhD defence and enjoy you holidays @mbhall88 !
I think #159 is quite critical indeed. I have no experience with the snakefmt codebase myself, but if you give me a pointer where to start, I can try to help here.
Defence went well thanks @weber8thomas !
@johanneskoester this week is very busy for me. Happy to give you a pointer in about one week - and I will also have a look at this issue and see if I can fix, though more in next two weeks.
@mbhall88 @johanneskoester I'd like to start a discussion about a possible overhaul of snakefmt. Currently the parsing code relies on a lot of heuristics - especially when trying to figure out if we are inside a block of python code or not. Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility. This issue is not the place for such a discussion btw, not sure where would be best ;)
I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.
The tests should be a useful guide along the way helping with a reimplementation.
Here are test cases for #159:
def test_if_statement_with_snakecode_comment_snakecode_inside(self):
"""https://github.com/snakemake/snakefmt/issues/159"""
snakecode = (
"if True:\n\n"
f"{TAB * 1}ruleorder: a > b\n"
"\n"
f"# comment\n"
f"{TAB * 1}ruleorder: c > d\n"
)
formatter = setup_formatter(snakecode)
assert formatter.get_formatted() == snakecode
def test_if_statement_with_snakecode_2comments_snakecode_inside(self):
"""https://github.com/snakemake/snakefmt/issues/159"""
snakecode = (
"if True:\n\n"
f"{TAB * 1}ruleorder: a > b\n"
"\n"
f"# comment\n"
f"# comment\n"
f"{TAB * 1}ruleorder: c > d\n"
)
formatter = setup_formatter(snakecode)
assert formatter.get_formatted() == snakecode
I got the first to pass by tweaking line 66 in parser.py - but the second still fails:
if self.vocab.recognises(keyword):
if status.indent > self.target_indent:
- if self.syntax.from_python or status.pythonable:
+ if self.from_python or status.pythonable:
self.from_python = True
else: # Over-indented context gets reset
self.syntax.cur_indent = max(self.target_indent - 1, 0)
elif self.from_python:
Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility. This issue is not the place for such a discussion btw, not sure where would be best ;)
Agreed. This is what I wanted to do in the first place 😛
But that would require a significant amount of work that I don't have and I doubt @bricoletc does either. Maybe @johanneskoester has someone in his team that could do this?
Once I get #163 and #164 fixed, I will look at this. It is quite possible #164 could impact this issue
The challenge is one would need a CST instead of an AST, as the AST throws away things like comments.
So if IUC, one would need to: a) write a Snakemake grammar extension of https://docs.python.org/3/library/ast.html b) Get an AST based on that grammar c) Then implement rewrites based on that, maybe with something like https://github.com/asottile/tokenize-rt
Does that sound right @mbhall88?
I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.
A valiant effort @corneliusroemer !
I'd be up to having a go at a CST/AST-based rewrite, if others, like @corneliusroemer /someone from @johanneskoester 's team can join in.
(And @mbhall88 as ever if you want to - I know you were talking about ASTs when we first started :sweat_smile:)
The rewrite plans sound good, but also as if that will need a lot of work. So, if we can fix this before, I think that would be good. I might be able to dedicate some person hours to something like that later in 2023, but as there are many other construction sites, it would be highly appreciated if you would want to start before.
The examples in this issue are resolved in v0.8.0
I'm experiencing an issue on a multiple conditions if statement using snakefmt
3 if statements on a single line
snakefmt works fine
New formatted code looks like this
snakefmt now complains on its own formatting