snakemake / snakemake

This is the development home of the workflow management system Snakemake. For general information, see
https://snakemake.github.io
MIT License

`module:` files in the `resources/` folder are not delivered alongside a remote module called via `github()` #3025

irm-codebase closed this issue 2 months ago

irm-codebase commented 2 months ago

This bug is based on the following documentation text about modules:

Snakemake version 8.18

Describe the bug

Files in the resources/ folder are not adequately delivered alongside remote modules called using the github() function.

Following the recommended file structure, assume I call a file in resources/ within my workflow.

├── .gitignore
├── README.md
├── LICENSE.md
├── workflow
│   ├── rules
│   │   ├── module1.smk
│   │   └── module2.smk
│   ├── envs
│   │   ├── tool1.yaml
│   │   └── tool2.yaml
│   ├── scripts
│   │   ├── script1.py
│   │   └── script2.R
│   ├── notebooks
│   │   ├── notebook1.py.ipynb
│   │   └── notebook2.r.ipynb
│   ├── report
│   │   ├── plot1.rst
│   │   └── plot2.rst
│   └── Snakefile
├── config
│   ├── config.yaml
│   └── some-sheet.tsv
├── results
└── resources          <<------------------------------------- The culprit of this issue!

rule create_controlled_road_transport_annual_demand_and_installed_capacities:
    message: "Create annual demand for controlled charging and corresponding charging potentials at a given resolution"
    input:
        # some inputs above...
        populations = "resources/population/population_national.csv"  # FIXME: I do not work :(
    params:
        # some params...
    conda: "../envs/default.yaml"
    output:
        main = "results/electrified-transport.csv",
    script: "../scripts/road_transport_controlled_charging.py"

I interpret the documentation as saying that small files located in the resources/ folder should be readable when a workflow is imported via `module:`.

This is currently not possible: reading a file from the resources/ folder of this workflow fails when the workflow is called through github(). Snakemake does not fetch these files from the git repository, so either the documentation should be made clearer or there is a bug in the code.

Logs

The following is returned. Essentially, the file does not exist.

Building DAG of jobs...
MissingInputException in rule module_transport_create_controlled_road_transport_annual_demand_and_installed_capacities in file https://raw.githubusercontent.com/calliope-project/ec_modules/feature-transport-fixes/modules/transport_road/workflow/rules/transport.smk, line 21:
Missing input files for rule module_transport_create_controlled_road_transport_annual_demand_and_installed_capacities:
    output: module-transport/results/electrified-transport.csv
    affected files:
        module-transport/resources/population/population_national.csv

Minimal example

Create a Snakefile with the following:

module my_failing_module:
    snakefile:
        github(
            "calliope-project/ec_modules",
            path="modules/transport_road/workflow/Snakefile",
            branch="feature-transport-fixes"
        )
    prefix: "module-transport"

use rule * from my_failing_module as module_transport_*

Then attempt snakemake --use-conda -c 1 module-transport/results/electrified-transport.csv

Please use Python >= 3.12, as Snakemake has other issues with Python 3.11 (unrelated to this issue).

Additional context

I just want my workflow to work :')

irm-codebase commented 2 months ago

I'm going to close this issue, since it was caused by not understanding how snakemake handles remote files (although admittedly it's quite hard to find this in the documentation). The docs are in need of some heavy rework.

https://snakemake.readthedocs.io/en/stable/project_info/faq.html#how-does-snakemake-interpret-relative-paths

Basically, you need to call workflow.source_path("resources/some-file.txt"), which resolves the path relative to the Snakefile or .smk file that contains the call, rather than relative to the working directory.

The Snakemake developers recommended placing small resource files that your workflow needs in this location:

.
├── config
├── LICENSE
├── README.md
├── results
└── workflow
    ├── envs
    ├── profiles
    ├── resources    <-------- here!
    ├── rules
    ├── schemas
    ├── scripts
    └── Snakefile
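
To make the difference between the two lookups concrete, here is a small plain-Python sketch of the path arithmetic only. It is not Snakemake's implementation: both helper functions are hypothetical names made up for illustration, and the `../resources/...` path assumes the population file has been moved into `workflow/resources/` as in the layout above. The rule file URL and the `module-transport` prefix are taken from the error log earlier in this thread.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# URL of the rule file, copied from the MissingInputException message.
RULE_FILE = (
    "https://raw.githubusercontent.com/calliope-project/ec_modules/"
    "feature-transport-fixes/modules/transport_road/workflow/rules/transport.smk"
)


def plain_input_lookup(prefix: str, declared_path: str) -> str:
    """Hypothetical helper: a plain string input is looked up locally,
    under the module prefix, so nothing is fetched from the repository."""
    return str(PurePosixPath(prefix) / declared_path)


def rule_file_relative_lookup(rule_file_url: str, relative_path: str) -> str:
    """Hypothetical helper: resolve a path against the location of the
    file that references it, which is the idea behind source_path."""
    return urljoin(rule_file_url, relative_path)


# What the failing rule effectively asked for (matches the log output).
print(plain_input_lookup(
    "module-transport", "resources/population/population_national.csv"
))

# What a rule-file-relative lookup points at instead (assumed new location).
print(rule_file_relative_lookup(
    RULE_FILE, "../resources/population/population_national.csv"
))
```

The first path only exists if someone created it in the local working directory, which is why the remote module failed; the second points into the repository next to the rule file, which is where a file referenced through workflow.source_path would be looked for.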