snakemake / snakefmt

The uncompromising Snakemake code formatter
MIT License
151 stars 29 forks source link

snakefmt appears to require explicit inclusion of subdirectories #202

Closed corneliusroemer closed 1 year ago

corneliusroemer commented 1 year ago

I was surprised that snakemake files in subdirectories weren't included by default when running snakefmt .

In particular, the following directory and files were ignored when running from the project root:

root/
|-- workflow/
|   |-- snakemake_rules
|        |-- trigger_rebuild.smk
|        |-- upload.smk

I would expect that these files worfklow/snakemake_rules/trigger_rebuild.smk etc are included, because they match the default include regex .smk$

victorlin commented 1 year ago

(context: I'm working on the same repo as @corneliusroemer)

The behavior makes more sense with --verbose:

[DEBUG] Included: ingest/workflow/snakemake_rules/slack_notifications.smk matched the --include regular expression
[DEBUG] Excluded: ingest/workflow/snakemake_rules/trigger_rebuild.smk matched the --exclude regular expression
[DEBUG] Included: ingest/workflow/snakemake_rules/nextclade.smk matched the --include regular expression

This is because the default --exclude pattern checks for anything with build in the name.

We can customize on our repo to prevent this false-positive matching.

victorlin commented 1 year ago

Is there a reason why the includes are overridden by excludes? I'd prefer the opposite behavior, which would prevent this issue.

corneliusroemer commented 1 year ago

I would suggest to make the default exclude much less tight. Especially since you already respect .gitgnore. As you only match things that are explicitly called Snakefile and .smk$ by default, I don't think there's a big need to have any excludes.

However, there are definitely false positives by the very lax excludes, anything that contains the substring build is currently not formatted - quite unexpectedly really.

mbhall88 commented 1 year ago

Thanks for identifying this. The excludes are all basically supposed to be directories and we use the re library's search method, which will match is that substring exists anywhere in the string. I've made the deafults more explicit now so they should only match on directories with that name.

If you could try this out it (#204 ) would be most appreciated.