snakemake / snakemake

This is the development home of the workflow management system Snakemake. For general information, see
https://snakemake.github.io

default remote prefix duplicated for checkpoint output on cloud #574

Closed · aryarm closed this issue 2 years ago

aryarm commented 4 years ago

Snakemake version 5.22.1

Describe the bug
Using the --default-remote-prefix parameter causes MissingInputException errors to arise for the outputs of checkpoint rules.

Steps to Reproduce
Follow the Google Life Sciences Executor Tutorial, but convert the bwa_map rule into a checkpoint like this:

diff --git Snakefile Snakefile
index 68974d1..d519d4f 100644
--- Snakefile
+++ Snakefile
@@ -4,7 +4,7 @@ rule all:
     input:
         "plots/quals.svg"

-rule bwa_map:
+checkpoint bwa_map:
     input:
         fastq="samples/{sample}.fastq",
         idx=multiext("genome.fa", ".amb", ".ann", ".bwt", ".pac", ".sa")
@@ -19,7 +19,7 @@ rule bwa_map:

 rule samtools_sort:
     input:
-        "mapped_reads/{sample}.bam"
+        lambda wildcards: checkpoints.bwa_map.get(sample=wildcards.sample).output[0]
     output:
         "sorted_reads/{sample}.bam"
     conda:

and then you should see the following when running the pipeline:

Building DAG of jobs...
MissingInputException in line 20 of Snakefile:
Missing input files for rule samtools_sort:
snakemake-testing-data/snakemake-testing-data/mapped_reads/A.bam

Notice how snakemake-testing-data is prepended twice?
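
For context on why samtools_sort's input must be a function: checkpoints.bwa_map.get(...) cannot be resolved until the checkpoint has actually run, so Snakemake evaluates the input lazily and re-evaluates the DAG once the checkpoint finishes. A minimal sketch of an equivalent named input function (illustrative only; the lambda in the diff above does the same thing):

def samtools_sort_input(wildcards):
    # Raises IncompleteCheckpointException until bwa_map has finished for
    # this sample; Snakemake catches that, defers this branch of the DAG,
    # and calls the function again after the checkpoint completes.
    return checkpoints.bwa_map.get(sample=wildcards.sample).output[0]

It is during that re-evaluation that the prefix is evidently applied a second time.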

Bug Hypothesis
I think rules.apply_default_remote() is being applied to the output more than once. It might help to check whether incomplete is true on this line before executing rules.apply_default_remote()?

This bug might extend beyond the Life Sciences executor (to other cloud environments), but I haven't tested that yet.

vsoch commented 4 years ago

@aryarm I've reproduced your error! One thing we could do is check whether the prefix is already applied in the workflow function, e.g., in self.workflow.apply_default_remote(value). The worry would be a filepath that coincidentally matches the prefix name (unlikely, but it could happen!). What did you have in mind for using incomplete? In this case, for example, incomplete is True. I'm not yet familiar with how checkpoints work, so I don't have insight into whether this extends beyond the Life Sciences executor.
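
A minimal sketch of that check, with apply_default_remote reduced to its essence (the real method's signature and surroundings differ), including the pitfall described above:

def apply_default_remote(value, prefix):
    # Guard: skip if the prefix already seems to have been applied.
    if value.startswith(prefix + "/"):
        # Pitfall: this also skips a genuinely local path that merely
        # happens to begin with the same string as the prefix.
        return value
    return prefix + "/" + value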

aryarm commented 4 years ago

Great! Yes, that would be a quick solution! I'm also worried about the case where the filepath matches the prefix name; that's why I've been exploring other options.

My idea for using incomplete is that if rules.apply_default_remote() is being called more than once for checkpoints, then incomplete might be false the first time but true the second. I haven't tested this yet, but if it's true, then the problem may extend beyond the Life Sciences executor (to any situation in which the default remote is applied?)

Update: the exception occurs before incomplete is set to false on the second run, so that can't be it.

austinkeller commented 4 years ago

Any updates on this? I'm running into a similar issue in my workflow with checkpoints when I try to use the --kubernetes executor. I may dig into this example tomorrow to try to understand what the underlying issue is...

vsoch commented 4 years ago

@johanneskoester thoughts on this?

aryarm commented 4 years ago

OK, I may have found something of a fix: I deleted lines 722 and 723 from rules.py, and everything started working again.

I don't really understand the underlying reason, but I'm guessing something about checkpoints causes apply_default_remote() in rules.py to be run more times than usual for a rule. So my approach was to figure out where that code is, and lines 722 and 723 seemed like good candidates.

I'm planning on spending some more time on it tomorrow. Once I feel like I understand things better, I can try to explain here and maybe write up a PR!

austinkeller commented 4 years ago

@aryarm I tried applying that fix but am still getting an error when testing it on minikube.

The following runs the tutorial successfully:

# Start minikube
minikube start

mkdir ~/snakemake-checkpoint-issue
cd ~/snakemake-checkpoint-issue

#
# Build latest snakemake
#
git clone https://github.com/snakemake/snakemake
cd ~/snakemake-checkpoint-issue/snakemake

# Checkout v5.25.0
DEV_COMMIT_SHA=a9d155ec

git checkout $DEV_COMMIT_SHA

DEV_IMAGE=snakemake:dev0_${DEV_COMMIT_SHA}

docker build . -t $DEV_IMAGE

#
# Create local runner
#
rm -f local.Dockerfile

cat <<EOF > local.Dockerfile
FROM $DEV_IMAGE
RUN /bin/bash -c "pip install kubernetes"
EOF

LOCAL_IMAGE=snakemake_local:dev0_${DEV_COMMIT_SHA}
docker build -f local.Dockerfile -t $LOCAL_IMAGE .

cd ~/snakemake-checkpoint-issue

#
# Setup tutorial data
#
git clone https://github.com/snakemake/snakemake-tutorial-data
cd ~/snakemake-checkpoint-issue/snakemake-tutorial-data

# Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for our s3 bucket
$(aws-okta env dev-admin-ml-okta)
export AWS_DEFAULT_PROFILE=$AWS_OKTA_PROFILE

# Set S3_REMOTE_PREFIX to our remote prefix
export S3_REMOTE_PREFIX=my.s3.bucket.boop/minikube_snakemake_artifacts/

aws s3 sync data/ s3://$S3_REMOTE_PREFIX

# Un-gitify the directory so that snakemake doesn't try to upload data/
mv .git .git.bak

# Add missing plot-quals.py
cat <<EOF > plot-quals.py
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
from pysam import VariantFile

quals = [record.qual for record in VariantFile(snakemake.input[0])]
plt.hist(quals)

plt.savefig(snakemake.output[0])
EOF

# Run workflow
docker run -it --rm \
    -v $HOME/.kube:/root/.kube \
    -v $HOME/.minikube:$HOME/.minikube \
    -v $PWD:/data \
    --workdir /data \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    -e AWS_SECURITY_TOKEN \
    -e AWS_SESSION_TOKEN \
    -e AWS_DEFAULT_REGION \
    $LOCAL_IMAGE \
    snakemake \
        --verbose \
        --debug \
        --use-conda \
        --kubernetes \
        --default-remote-provider S3 \
        --default-remote-prefix $S3_REMOTE_PREFIX \
        --envvars \
            AWS_ACCESS_KEY_ID \
            AWS_SECRET_ACCESS_KEY \
            AWS_SECURITY_TOKEN \
            AWS_SESSION_TOKEN \
            AWS_DEFAULT_REGION

This completes successfully.

Next, I try and reproduce the original issue:

#
# Convert tutorial to checkpoint
#
mv .git.bak .git

rm -f 1.patch

# Apply patch
cat <<EOF > 1.patch
index 68974d1..d519d4f 100644
--- Snakefile
+++ Snakefile
@@ -4,7 +4,7 @@ rule all:
     input:
         "plots/quals.svg"

-rule bwa_map:
+checkpoint bwa_map:
     input:
         fastq="samples/{sample}.fastq",
         idx=multiext("genome.fa", ".amb", ".ann", ".bwt", ".pac", ".sa")
@@ -19,7 +19,7 @@ rule bwa_map:

 rule samtools_sort:
     input:
-        "mapped_reads/{sample}.bam"
+        lambda wildcards: checkpoints.bwa_map.get(sample=wildcards.sample).output[0]
     output:
         "sorted_reads/{sample}.bam"
     conda:
EOF

git apply 1.patch

mv .git .git.bak

#
# Run with tutorial as checkpoint
#

# Clear out previous results and resync data
aws s3 rm s3://$S3_REMOTE_PREFIX --recursive --include "*"
aws s3 sync data/ s3://$S3_REMOTE_PREFIX

# Run workflow
docker run -it --rm \
    -v $HOME/.kube:/root/.kube \
    -v $HOME/.minikube:$HOME/.minikube \
    -v $PWD:/data \
    --workdir /data \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    -e AWS_SECURITY_TOKEN \
    -e AWS_SESSION_TOKEN \
    -e AWS_DEFAULT_REGION \
    $LOCAL_IMAGE \
    snakemake \
        --verbose \
        --debug \
        --use-conda \
        --kubernetes \
        --default-remote-provider S3 \
        --default-remote-prefix $S3_REMOTE_PREFIX \
        --envvars \
            AWS_ACCESS_KEY_ID \
            AWS_SECRET_ACCESS_KEY \
            AWS_SECURITY_TOKEN \
            AWS_SESSION_TOKEN \
            AWS_DEFAULT_REGION

This gives me the original error:

Building DAG of jobs...
Full Traceback (most recent call last):
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/__init__.py", line 658, in snakemake
    success = workflow.execute(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/workflow.py", line 684, in execute
    dag.init()
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 173, in init
    job = self.update([job], progress=progress)
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update
    raise exceptions[0]
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update
    self.update_(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 819, in update_
    raise ex
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 803, in update_
    selected_job = self.update(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update
    raise exceptions[0]
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update
    self.update_(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 819, in update_
    raise ex
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 803, in update_
    selected_job = self.update(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update
    raise exceptions[0]
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update
    self.update_(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 819, in update_
    raise ex
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 803, in update_
    selected_job = self.update(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update
    raise exceptions[0]
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update
    self.update_(
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 848, in update_
    raise MissingInputException(job.rule, missing_input)
snakemake.exceptions.MissingInputException: Missing input files for rule samtools_sort:
my.s3.bucket.boop/minikube_snakemake_artifacts/my.s3.bucket.boop/minikube_snakemake_artifacts/mapped_reads/A.bam

MissingInputException in line 20 of /data/Snakefile:
Missing input files for rule samtools_sort:
my.s3.bucket.boop/minikube_snakemake_artifacts/my.s3.bucket.boop/minikube_snakemake_artifacts/mapped_reads/A.bam
unlocking
removed all locks

Next, I try to apply your recommended fix:

#
# Try fix removing snakemake/rules.py:L722-723
#

cd ~/snakemake-checkpoint-issue/snakemake

cat <<EOF > fix1.patch
index 913019e5..3a260386 100644
--- a/snakemake/rules.py
+++ b/snakemake/rules.py
@@ -719,8 +719,6 @@ class Rule:
                     is_unpack=is_unpack,
                     **aux_params
                 )
-                if apply_default_remote:
-                    item = self.apply_default_remote(item)

             if is_unpack and not incomplete:
                 if not allow_unpack:
EOF

git apply fix1.patch

DEV_IMAGE_FIX_1=snakemake:dev1_${DEV_COMMIT_SHA}
LOCAL_IMAGE_FIX_1=snakemake_local:dev1_${DEV_COMMIT_SHA}

# Build local runner
rm -f local.Dockerfile

cat <<EOF > local.Dockerfile
FROM $DEV_IMAGE_FIX_1
RUN /bin/bash -c "pip install kubernetes"
EOF

docker build . -t $DEV_IMAGE_FIX_1
docker build -f local.Dockerfile -t $LOCAL_IMAGE_FIX_1 .

#
# Run with "fix removing snakemake/rules.py:L722-723"
#

cd ~/snakemake-checkpoint-issue/snakemake-tutorial-data

# Run workflow
docker run -it --rm \
    -v $HOME/.kube:/root/.kube \
    -v $HOME/.minikube:$HOME/.minikube \
    -v $PWD:/data \
    --workdir /data \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    -e AWS_SECURITY_TOKEN \
    -e AWS_SESSION_TOKEN \
    -e AWS_DEFAULT_REGION \
    $LOCAL_IMAGE_FIX_1 \
    snakemake \
        --verbose \
        --debug \
        --use-conda \
        --kubernetes \
        --default-remote-provider S3 \
        --default-remote-prefix $S3_REMOTE_PREFIX \
        --envvars \
            AWS_ACCESS_KEY_ID \
            AWS_SECRET_ACCESS_KEY \
            AWS_SECURITY_TOKEN \
            AWS_SESSION_TOKEN \
            AWS_DEFAULT_REGION

but I get this error :disappointed:

Building DAG of jobs...                                                                                               
Full Traceback (most recent call last):                                                                               
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/__init__.py", line 658, in snakemake          
    success = workflow.execute(                                                                                       
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/workflow.py", line 684, in execute            
    dag.init()                                                                                                        
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 173, in init                    
    job = self.update([job], progress=progress)                                                                       
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update                  
    raise exceptions[0]                                                                                               
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update                  
    self.update_(                                                                                                     
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 819, in update_                 
    raise ex                                                                                                          
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 803, in update_                 
    selected_job = self.update(                                                                                       
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update                  
    raise exceptions[0]                                                                                               
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update                  
    self.update_(                                                                                                     
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 819, in update_                 
    raise ex                                                                                                          
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 803, in update_                 
    selected_job = self.update(                                                                                       
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update                  
    raise exceptions[0]                                                                                               
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update                  
    self.update_(                                                                                                     
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 819, in update_                 
    raise ex                                                                                                          
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 803, in update_                 
    selected_job = self.update(                                                                                       
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update                  
    raise exceptions[0]                                                                                               
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update                  
    self.update_(                                                                                                     
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 819, in update_                 
    raise ex                                                                                                          
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 803, in update_                 
    selected_job = self.update(                                                                                       
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 753, in update                  
    raise exceptions[0]                                                                                               
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 711, in update                  
    self.update_(                                                                                                     
  File "/opt/conda/envs/snakemake/lib/python3.8/site-packages/snakemake/dag.py", line 848, in update_                 
    raise MissingInputException(job.rule, missing_input)                                                              
snakemake.exceptions.MissingInputException: Missing input files for rule bwa_map:                                     
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.ann                                         
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.sa                                          
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.amb                                         
my.s3.bucket.boop/minikube_snakemake_artifacts/samples/A.fastq
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.pac
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.bwt

MissingInputException in line 7 of /data/Snakefile:
Missing input files for rule bwa_map:
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.ann
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.sa
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.amb
my.s3.bucket.boop/minikube_snakemake_artifacts/samples/A.fastq
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.pac
my.s3.bucket.boop/minikube_snakemake_artifacts/genome.fa.bwt
unlocking
removed all locks

austinkeller commented 4 years ago

Interestingly, if I revert the Snakefile from using a checkpoint back to using a rule (but keep the deletion of lines 722-723), the run succeeds:

mv .git.bak .git
git stash
mv .git .git.bak

docker run -it --rm \
    -v $HOME/.kube:/root/.kube \
    -v $HOME/.minikube:$HOME/.minikube \
    -v $PWD:/data \
    --workdir /data \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    -e AWS_SECURITY_TOKEN \
    -e AWS_SESSION_TOKEN \
    -e AWS_DEFAULT_REGION \
    $LOCAL_IMAGE_FIX_1 \
    snakemake \
        --verbose \
        --debug \
        --use-conda \
        --kubernetes \
        --default-remote-provider S3 \
        --default-remote-prefix $S3_REMOTE_PREFIX \
        --envvars \
            AWS_ACCESS_KEY_ID \
            AWS_SECRET_ACCESS_KEY \
            AWS_SECURITY_TOKEN \
            AWS_SESSION_TOKEN \
            AWS_DEFAULT_REGION

vsoch commented 4 years ago

One detail to keep in mind if you are testing with a container orchestrator (e.g., Kubernetes or GLS): the version of Snakemake you run from your host is not necessarily the same as the version the remote builder uses (or the version in your cluster); you have to set the container image flag to determine this - https://github.com/snakemake/snakemake/blob/be2d72caac18cd025480521f16a3a7202b482e27/snakemake/executors/__init__.py#L1447. With a release of Snakemake, the current release Docker image would typically be used, but in development I've had to do a git fetch to retrieve all tags and derive an updated version. If you've built a custom local image, it's likely using the tag from the last release and not your changed code. To have the worker use your changes, you would want to push the container with your changes somewhere and then specify it with the container image flag.

aryarm commented 4 years ago

That's a good point, @vsoch. Thank you! At the moment, I've just been trying to fix these exceptions for when the DAG gets evaluated locally (i.e., before any Snakemake Docker images are used on the cloud). Now I realize that this bug also causes exceptions within jobs that get submitted to the cloud. I'll try creating a custom container image, like you suggested. But how would I get the CI test to work in the PR?

@austinkeller, it sounds like the fix isn't working for you because of the issue that @vsoch mentioned. We won't be able to properly test whether the fix works unless we use a custom container image.

aryarm commented 4 years ago

Also, as promised, here's an explanation of what I think is happening.

When Snakemake parses a Snakefile, one of its very first steps is to set the input and output of every rule based on the user-provided input/output directives. While this is happening, it also calls rules.apply_default_remote() on every input and output string in the Snakefile. So after it's done parsing the Snakefile, each rule object will have prefixed input and output strings.

Checkpoints, however, trigger the DAG to be reevaluated. During that process, the wildcards for every input are injected into the input and output of each rule via _apply_wildcards(). Unfortunately, one of the steps in _apply_wildcards() is to call rules.apply_default_remote() again.
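
To make the double application concrete, here is a toy reproduction (apply_default_remote is reduced to its essence; the real function in rules.py did more):

prefix = "snakemake-testing-data"

def apply_default_remote(path):
    # Unconditionally prepends the prefix, so a second call duplicates it.
    return prefix + "/" + path

path = "mapped_reads/A.bam"
path = apply_default_remote(path)  # once, during Snakefile parsing
path = apply_default_remote(path)  # again, via _apply_wildcards() on DAG re-evaluation
print(path)  # snakemake-testing-data/snakemake-testing-data/mapped_reads/A.bam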

I'm not sure why rules.apply_default_remote() is called from _apply_wildcards() if every rule will already have the prefix from earlier. So instead of deleting the lines that call rules.apply_default_remote() in _apply_wildcards(), I think it might be best to check that incomplete is false before calling rules.apply_default_remote(), since incomplete will be true only if the input is the output of a checkpoint (and will therefore already carry the prefix).

So here's my proposed solution:

-                if apply_default_remote:
+                # incomplete items will already have the default remote prefix
+                if apply_default_remote and not incomplete:

I created this test to minimally reproduce the issue. The fix that I'm recommending seems to work for that test, at least until the run uses the older code on the cloud because of the container-image issue @vsoch mentioned.

vsoch commented 4 years ago

@aryarm I think we are working on this in #691; it's just been a bit slow because the GLS testing is currently down. But we did see an issue there that led to the double parsing of the bucket name.

aryarm commented 4 years ago

Ah, ok! Thank you for pinging me here to let me know, @vsoch! If you find yourself in need of a test to reproduce the double parsing, here's a commit that might be helpful.

I haven't been able to work on this issue much recently because of other commitments, but I may have more time later if you need me.

vsoch commented 4 years ago

Thanks @aryarm I really appreciate that! I'm hoping we will get our testing running again soon, and I'll take a look at the commit to see if it can help the current test.

vsoch commented 3 years ago

hey @aryarm ! We just merged a PR that can at least handle the redundant name for standard runs. Would you care to test it, and also check the checkpoint case? If you find the issue persists for checkpoints, please provide a simple example to reproduce with the current master.

aryarm commented 3 years ago

yay! thank you so much, @vsoch! I'll check it out once a new release with the fix is published

Being able to test the GLS code via CI will also be really helpful for me! If this issue persists, I should be able to finally test that idea for a fix that I had before.

vsoch commented 3 years ago

I agree! @johanneskoester was on fire this morning :) :fire:

klai5 commented 3 years ago

The issue still exists using Tibanna (AWS) with checkpoints. The default-remote-prefix is prepended twice.

aryarm commented 3 years ago

@klai5, so the fix that I tried in aryarm@7e6c2d5 didn't help on tibanna?

klai5 commented 3 years ago

@aryarm, I tested 5.22.1 with your solution, and also version 6.0.5 (the latest). I receive the following error with your fix:

  File "snakemake/executors/__init__.py", line 1827, in handle_remote
    if isinstance(target, _IOFile) and target.remote_object.provider.is_default:
AttributeError: 'NoneType' object has no attribute 'provider'
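
That traceback suggests target.remote_object is None for some targets once the patch is applied, so a defensive check there would need to look something like this (illustrative only; handle_remote's real logic is more involved):

if (
    isinstance(target, _IOFile)
    and target.remote_object is not None
    and target.remote_object.provider.is_default
):
    ...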

johanneskoester commented 3 years ago

Hi folks, I'd like to work on a fix for this issue. @aryarm, as requested here, could you first provide a PR with your test case?

aryarm commented 3 years ago

could you first provide a PR with your test case?

done!

Evidently, the issue does still occur in the latest version of Snakemake (v6.9.0). It reared back up when I ran nosetests locally.

Thanks in advance for resolving this, @johanneskoester 😄

cademirch commented 2 years ago

Any update on this bug? It seems to still be present on 6.10.0.

aryarm commented 2 years ago

@cademirch, we're working on reproducing it in #1199. It's been difficult to verify that the test case actually fails properly, because the CI needs to be instructed to run the test case on GCP. At the moment, the challenge seems to be some sort of bug with passing secrets in the CI.

Once that's done, we can finally try the original solution I proposed, or come up with another if that doesn't work. Unfortunately, we can't properly test any solution until we can reproduce the issue in the CI, because of the way that Snakemake submits jobs to GCP through the GLS: even if you implement a solution on your local computer, Snakemake will use a different version of its own code within the GCP instance. See this comment for details. We would need to upload a custom container and instruct Snakemake to use that instead.

cademirch commented 2 years ago

@aryarm Okay, I see - thanks for the update. Unfortunately I'm not very familiar with the Snakemake code, but I've taken a quick look, and it seems the logic for applying the default remote prefix has moved here: https://github.com/snakemake/snakemake/blob/01d6102795c96ce695d6d7201f7e4655a1d5cac8/snakemake/path_modifier.py#L14
I'm not sure how much I can help with this, but I'll take a deeper look sometime this week, as I'm also trying to figure out what is causing #1260.