Do you see it being installed in earlier steps?
Looks like the wheel gets built: Building wheel for snakemake (pyproject.toml): finished with status 'done'
later, Successfully built snakemake
But building datrie fails: ERROR: Failed building wheel for datrie.
That's what I suspected! I ran into the same issue, and this is the fix: https://github.com/snakemake/snakemake-executor-plugin-googlebatch/pull/29/files#diff-60c1cedcc14ba3f27ae6adfa98fb846b44859d8715209706799ba892aa01ac33R69-R70, but nothing for the container setup was working, so it was never merged. Let me do a quick PR that adds just that fix for you to test.
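For context, the gist of that fix (which also shows up in the generated setup command later in this thread) appears to be installing datrie as a prebuilt conda package so pip doesn't have to compile its wheel on the VM, roughly:

```bash
# install datrie from conda so the snakemake install doesn't need to build its wheel
conda install datrie --yes

# then install snakemake itself (install-snek.sh is the helper the setup command already uses)
./install-snek.sh https://github.com/snakemake/snakemake
```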
@cademirch if you are able, can you install that branch and test it out?
Hooray! Thanks for such a quick response and fix. What's the best way to go about testing this? Clone this repo and install it with pip install -e .?
Development mode (with -e) actually doesn't work for the plugins - do pip uninstall to make sure the broken version is entirely removed, then clone the branch and pip install . (without the -e).
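Something like this, with the branch name as a placeholder for whatever the fix PR's branch ends up being called:

```bash
# remove the broken release entirely
pip uninstall snakemake-executor-plugin-googlebatch

# clone the repo, check out the fix branch (placeholder name), and install without -e
git clone https://github.com/snakemake/snakemake-executor-plugin-googlebatch.git
cd snakemake-executor-plugin-googlebatch
git checkout <fix-branch>
pip install .
```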
Looks like that worked, thanks!!
Out of curiosity, what is the rationale for using the setup command over Snakemake's docker image?
These are running on bare VMs - unless you use the Container-Optimized OS, there is no Docker image.
Ah I see, thanks!
Just ran into this trying to run my actual workflow: FileNotFoundError: [Errno 2] No such file or directory: '/rules/common.smk'
So likely something to do with the workflow sources... I can open a new issue though
The files that appear there need to be handled by the storage client - so likely this is an issue with the google storage provider. I've only used aws to test.
Hmm okay, will look into it! I see the workflow sources tar in my bucket - I guess it's not making it to the VM, or isn't in the expected place.
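For reference, one way to check both ends (the bucket and prefix here are taken from the command below; the tarball name and hash are just examples): confirm the tarball is in the bucket, and that the deploy step the generated Snakemake command runs can fetch it:

```bash
# check the sources tarball exists in the bucket
gsutil ls gs://cade_testing/snakemake-workflow-sources.*.tar.xz

# this is the step the generated command runs on the VM to unpack the sources
# (flags copied from the Snakemake command shown below; <hash> is a placeholder)
python -m snakemake --deploy-sources \
    gs://cade_testing/snakemake-workflow-sources.<hash>.tar.xz <hash> \
    --default-storage-provider gcs --default-storage-prefix gs://cade_testing \
    --storage-gcs-retries 5
```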
Also, as a follow-up to the docker question, I used the example COS command:
snakemake --jobs 1 --executor googlebatch --googlebatch-image-family batch-centos-7-official --googlebatch-region us-central1 --googlebatch-image-project batch-custom-image --googlebatch-project ccgp-ucsc --default-storage-provider gcs --default-storage-prefix gs://cade_testing --storage-gcs-project ccgp-ucsc
But the printed setup and Snakemake commands don't seem to mention using the Snakemake Docker image - is this expected?
🌟️ Setup Command:
export HOME=/root
export PATH=/opt/conda/bin:${PATH}
export LANG=C.UTF-8
export SHELL=/bin/bash
sudo yum update -y
sudo yum install -y wget bzip2 ca-certificates gnupg2 squashfs-tools git
cat <<EOF > ./Snakefile
include: "rules/hello.smk"
# By convention, the first pseudorule should be called "all"
# We're using the expand() function to create multiple targets
rule all:
    input:
        expand(
            "{greeting}/world.txt",
            greeting=["hello", "hola"],
        ),
EOF
cat ./Snakefile
echo "I am batch index ${BATCH_TASK_INDEX}"
export PATH=/opt/conda/bin:${PATH}
repo=https://raw.githubusercontent.com/snakemake/snakemake-executor-plugin-googlebatch
path=main/scripts/install-snek.sh
wget ${repo}/${path}
chmod +x ./install-snek.sh
workdir=$(pwd)
url=https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget ${url} -O ./miniconda.sh
chmod +x ./miniconda.sh
bash ./miniconda.sh -b -u -p /opt/conda
rm -rf ./miniconda.sh
conda install datrie --yes
which python
/opt/conda/bin/python --version
./install-snek.sh https://github.com/snakemake/snakemake-storage-plugin-gcs
./install-snek.sh https://github.com/snakemake/snakemake
cd ${workdir}
🐍️ Snakemake Command:
export HOME=/root
export PATH=/opt/conda/bin:${PATH}
export LANG=C.UTF-8
export SHELL=/bin/bash
echo $(pwd)
ls
which snakemake || whereis snakemake
pip install --target '.snakemake/pip-deployments' snakemake-storage-plugin-gcs && python -m snakemake --deploy-sources gs://cade_testing/snakemake-workflow-sources.f75c427fbb9def69ab133f5b9fde75faa382f529edbef210980b8a02212954ea.tar.xz f75c427fbb9def69ab133f5b9fde75faa382f529edbef210980b8a02212954ea --default-storage-prefix gs://cade_testing --default-storage-provider gcs --storage-gcs-retries 5 && python -m snakemake --snakefile Snakefile --target-jobs 'multilingual_hello_world:greeting=hola' --allowed-rules 'multilingual_hello_world' --cores all --attempt 1 --force-use-threads --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --rerun-triggers input software-env code mtime params --conda-frontend mamba --shared-fs-usage none --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --latency-wait 5 --scheduler ilp --local-storage-prefix .snakemake/storage --storage-gcs-retries 5 --default-storage-prefix gs://cade_testing --default-storage-provider gcs --default-resources base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= --mode remote
For family I think it's batch-cos, and then you have to provide a container: https://github.com/snakemake/snakemake-executor-plugin-googlebatch/blob/cacc9fec50903acf69d8d7cfa6339809e3b97b41/snakemake_executor_plugin_googlebatch/executor.py#L108
I wouldn't use batch-cos right now - the current implementation runs docker on the VM and doesn't use their container backend. This was a PR I opened back in February, https://github.com/snakemake/snakemake-executor-plugin-googlebatch/pull/29, to use their container directives, but we've been having trouble with logging, so we haven't been able to make progress on it. Ping @johanneskoester
Ahhh got it. Sorry for all the questions, and thank you for all the info! I've got a better understanding of how this works now :)
I'm looking into the issue with the rules/common.smk not being found - will open an issue with an MRE and any findings soon.
Thank you!
Hi @vsoch, sorry to bother you again! I am finally looking to migrate to Batch from GLS, but am having trouble getting Batch to run. Here is a basic workflow I tried to run:
All that is in the env is:
With the reads in a bucket. The command line run was:
snakemake --executor googlebatch --googlebatch-project ccgp-ucsc --googlebatch-region us-central1 --storage-gcs-project ccgp-ucsc --jobs 2 --default-storage-prefix gs://cade_testing_reads --default-storage-provider gcs --use-conda
The Batch logs are quite long, but the one that sticks out is:
So it seems like something is wrong with how Snakemake is installed in the Batch VM/container?
Appreciate any insights. Also let me know if you need more info!