snakemake / snakemake-executor-plugin-googlebatch

Snakemake executor plugin for Google Batch

Deployment of workflow sources #5

Closed · johanneskoester closed 10 months ago

johanneskoester commented 11 months ago

@vsoch just so you know: Snakemake now automatically deploys the workflow sources before a job executes if the executor implies that there is no shared FS (https://github.com/snakemake/snakemake-interface-executor-plugins/blob/fc37f38f5723c522e7b3e8854d03645e16f53b91/snakemake_interface_executor_plugins/settings.py#L48) or if the user sets --no-shared-fs.

I hope this means that you don't need a helper script anymore. Nor do you need any specific code for source collection and deployment in your executor. This is already battle-tested in the Kubernetes executor plugin.
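
For concreteness, a minimal sketch of what this looks like on the plugin side, using the CommonSettings field from the settings.py linked above (exact fields may differ between interface releases):

from snakemake_interface_executor_plugins.settings import CommonSettings

# Declaring implied_no_shared_fs=True tells Snakemake that this executor
# cannot rely on a shared filesystem, so it packages and deploys the
# workflow sources automatically before each job runs.
common_settings = CommonSettings(
    non_local_exec=True,
    implied_no_shared_fs=True,
)

With that flag set, the plugin itself needs no source-collection logic at all.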

johanneskoester commented 10 months ago

Fixed in the main branch.

vsoch commented 10 months ago

That makes sense. So would an example command look like this?

$ snakemake --jobs 1 --executor googlebatch --googlebatch-region us-central1 --googlebatch-project llnl-flux --no-shared-fs
Error: If no shared filesystem is assumed, a default storage provider has to be set.

I know you've mentioned this before: shouldn't there be some default for s3/minio?

$ snakemake --jobs 1 --executor googlebatch --googlebatch-region us-central1 --googlebatch-project llnl-flux --no-shared-fs --default-storage-provider s3
WorkflowError:
StorageQueryValidationResult: query hello/world.txt is invalid: must start with s3 (s3://...)
  File "<string>", line 6, in __init__
  File "/home/vanessa/Desktop/Code/snek/snakemake-executor-plugin-googlebatch/example/hello-world/Snakefile", line 3, in <module>

So I tried prefixing the paths with s3:// directly:

# By convention, the first pseudorule should be called "all"
# We're using the expand() function to create multiple targets
rule all:
    input:
        expand(
            "s3://{greeting}/world.txt",
            greeting = ['hello', 'hola'],
        ),

# First real rule, this is using a wildcard called "greeting"
rule multilingual_hello_world:
    output:
        "s3://{greeting}/world.txt",
    shell:
        """
        mkdir -p "{wildcards.greeting}"
        sleep 5
        echo "{wildcards.greeting}, World!" > {output}
        """
$ snakemake --jobs 1 --executor googlebatch --googlebatch-region us-central1 --googlebatch-project llnl-flux --no-shared-fs --default-storage-provider s3
Building DAG of jobs...
Uploading source archive to storage provider...
WorkflowError:
Failed to store output in storage snakemake-workflow-sources.3d24779cdba7cb1d00d0d15beeaffeb44fea8228e1872249b4707b6541148321.tar.xz
AttributeError: 'StorageObject' object has no attribute 'bucket'
  File "/home/vanessa/anaconda3/lib/python3.11/asyncio/runners.py", line 190, in run
  File "/home/vanessa/anaconda3/lib/python3.11/asyncio/runners.py", line 118, in run
  File "/home/vanessa/anaconda3/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete

I've updated my local checkout to that branch (and the Python tests pass), but I'm having trouble getting a basic Snakefile-derived "hello world" working, given that I no longer have control of storage.
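
For reference, here is the shape I would have expected to work, assuming plain relative paths in the Snakefile and the bucket supplied via --default-storage-prefix (the bucket name below is just a placeholder):

# Untested sketch: no s3:// prefixes in the rules; the default storage
# provider is expected to rewrite these plain paths into storage queries.
rule all:
    input:
        expand("{greeting}/world.txt", greeting=["hello", "hola"]),

rule multilingual_hello_world:
    output:
        "{greeting}/world.txt",
    shell:
        """
        sleep 5
        echo "{wildcards.greeting}, World!" > {output}
        """

$ snakemake --jobs 1 --executor googlebatch --googlebatch-region us-central1 --googlebatch-project llnl-flux --default-storage-provider s3 --default-storage-prefix s3://my-placeholder-bucket

But given the upload error above, I haven't been able to verify that yet.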