googlegenomics / gcp-deepvariant-runner

This repository contains a docker container that runs DeepVariant on the Google Cloud Platform.
Apache License 2.0

Running multiple samples with the same staging folder name #29

Closed saliksyed closed 5 years ago

saliksyed commented 5 years ago

Hi,

I started runs on about 15 samples with the DeepVariant sample script (pasted below). I use the same staging folder for all the samples. Will this cause problems, or will the samples be queued somehow so that they do not overwrite each other's shared disk space?

#!/bin/bash
set -euo pipefail
# Set common settings.
PATIENT_NAME=${1}
PROJECT_ID=valis-194104
OUTPUT_BUCKET=gs://valis-private/deep_variant_family_output
STAGING_FOLDER_NAME=deep_variant_staging
OUTPUT_FILE_NAME=${PATIENT_NAME}_deep_variant.vcf
OUTPUT_GVCF_FILE_NAME=${PATIENT_NAME}_deep_variant.gvcf
# Model for calling whole genome sequencing data.
MODEL=gs://deepvariant/models/DeepVariant/0.8.0/DeepVariant-inception_v3-0.8.0+data-wgs_standard
IMAGE_VERSION=0.8.0
DOCKER_IMAGE=gcr.io/deepvariant-docker/deepvariant:"${IMAGE_VERSION}"
COMMAND="/opt/deepvariant_runner/bin/gcp_deepvariant_runner \
  --project ${PROJECT_ID} \
  --zones us-west1-b \
  --docker_image ${DOCKER_IMAGE} \
  --gvcf_outfile ${OUTPUT_BUCKET}/${OUTPUT_GVCF_FILE_NAME} \
  --outfile ${OUTPUT_BUCKET}/${OUTPUT_FILE_NAME} \
  --staging ${OUTPUT_BUCKET}/${STAGING_FOLDER_NAME} \
  --model ${MODEL} \
  --bam gs://valis-private/${PATIENT_NAME}.final.bam \
  --ref gs://valis-private/human_g1k_v37_decoy.fasta   \
  --gcsfuse"
# Run the pipeline.
gcloud alpha genomics pipelines run \
    --project "${PROJECT_ID}" \
    --service-account-scopes="https://www.googleapis.com/auth/cloud-platform" \
    --logging "${OUTPUT_BUCKET}/${STAGING_FOLDER_NAME}/runner_logs_$(date +%Y%m%d_%H%M%S).log" \
    --zones us-west1-b \
    --docker-image gcr.io/cloud-lifesciences/gcp-deepvariant-runner \
    --command-line "${COMMAND}"
samanvp commented 5 years ago

All intermediate files, such as the outputs of make_examples() and call_variants(), are written to directories inside the staging folder (examples/ and called_variants/, respectively). So each run needs its own dedicated staging directory when you have multiple parallel runs, for example: STAGING_FOLDER_NAME="deep_variant_staging/${PATIENT_NAME}"
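
A minimal sketch of how the per-sample staging path could be derived, assuming the bucket and folder names from the script above. The `staging_path_for` helper and the sample names `sample_a` and `sample_b` are hypothetical and used only for illustration. The loop prints the `--staging` value each run would receive and does not launch any pipeline.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of gcp_deepvariant_runner): builds a staging
# path that is unique per sample, so parallel runs do not share examples/ and
# called_variants/ directories.
staging_path_for() {
  local output_bucket="$1"
  local patient_name="$2"
  printf '%s/deep_variant_staging/%s\n' "${output_bucket}" "${patient_name}"
}

OUTPUT_BUCKET=gs://valis-private/deep_variant_family_output

# Placeholder sample names, for illustration only.
for PATIENT_NAME in sample_a sample_b; do
  STAGING_PATH="$(staging_path_for "${OUTPUT_BUCKET}" "${PATIENT_NAME}")"
  echo "--staging ${STAGING_PATH}"
done
```

In the original script the same effect comes from changing only the STAGING_FOLDER_NAME line as suggested above. Since the `--logging` path is built from the same variable, the runner logs would then also land in the per-sample folder.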

Thanks for raising this point, I will update our documentation soon.