snakemake / snakemake-storage-plugin-gcs

A Snakemake storage plugin for Google Cloud Storage
MIT License

Snakemake storage plugin: gcs: WorkflowError #25

Closed Fadwa7 closed 2 months ago

Fadwa7 commented 4 months ago

Hi everyone, I'm working on a GCP VM, and I'm trying to write my output files to a GCS (Google Cloud Storage) bucket. I've written this:

#!/bin/python3 
configfile: "config.json"
import re 
import csv 
import json
import subprocess
import os

# register shared settings

storage:
    provider="gcs",
    max_requests_per_second=None,
    project="arctic-carving-413109",
    keep_local=False,
    stay_on_remote=False,
    retries=5,
with open('config.json', 'r') as config_file:
    config = json.load(config_file)

fichier_csv = config.get("SRA_LIST")
if "SRA_LIST" in config:
    fichier_csv = config["SRA_LIST"]

SRA_LIST = []
with open(fichier_csv, 'rt') as f:
    for line in f:
        line = line.split()[0].strip()
        if re.match(r'[SED]RR\d+$', line):
            SRA_LIST.append(line)

rule all:
    input:
        storage.gcs(expand("Fastq_Files/{sra}.fastq.gz", sra=SRA_LIST)),

include: "rules/fastq.smk"

and when I run snakemake I get this error:

WorkflowError:
Error applying storage provider gcs (see https://snakemake.github.io/snakemake-plugin-catalog/plugins/storage/{provider}.html). {query_validity}

I can't find much explanation about these new properties, and I hope someone can help me.

Thank you in advance

Best Regards

johanneskoester commented 2 months ago

The error message was fixed a while ago and should now be more informative. In general, all storage providers (except the fs one) nowadays expect a scheme prefix when specifying queries. In the case of gcs it is gcs://. So in your case, it should be storage.gcs(expand("gcs://Fastq_Files/{sra}.fastq.gz", sra=SRA_LIST)).
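For reference, a minimal sketch of the corrected rule all under that suggestion; the first path component after gcs:// is treated as the bucket name, so "Fastq_Files" here only stands in for the actual bucket (substitute your own bucket and object prefix):

rule all:
    input:
        # gcs:// queries take the form gcs://<bucket>/<path>;
        # "Fastq_Files" is assumed to be the bucket name here, adjust as needed
        storage.gcs(expand("gcs://Fastq_Files/{sra}.fastq.gz", sra=SRA_LIST)),

The storage: block registering the gcs provider can stay as in the original Snakefile; only the queries passed to storage.gcs(...) need the gcs:// prefix.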