Running Worker Machines Without External IP Address

obsh commented 5 years ago

Hi,

I wonder if there is an option to create worker machines without external IP addresses? I'm trying to run a large number of pipelines in GCP and am stuck on the IP address quota.

Regards.

samanvp commented 5 years ago

Unfortunately, each worker needs an external IP to communicate back with the main runner. What I suggest is:

Use larger workers to reduce the number of workers, and thus the number of external IPs needed. For example, for the make_examples stage you can use 2 workers, each with 16 cores and 4*16=64GB of memory. Larger workers will cost you more than smaller ones, which are the more cost-optimized way to run DeepVariant.
Serialize your runs instead of parallelizing them. This way you can keep the cost of each run as low as possible (by using smaller workers), but your overall running time will be longer.

Unfortunately, there is no perfect solution; you need to compromise on either cost or time.
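
To make the trade-off concrete, here is a rough shell sketch of the quota arithmetic. It assumes each worker VM consumes exactly one external IP and that all workers of a run are up at the same time. The function name and the quota of 64 addresses are made up for illustration, so substitute your project's actual quota.

```shell
# Hypothetical helper, not part of gcp_deepvariant_runner.
# Assumes one external IP per worker VM.
# Usage: max_concurrent_runs IP_QUOTA WORKERS_PER_RUN
max_concurrent_runs() {
  quota=$1
  workers=$2
  if [ "$workers" -le 0 ]; then
    echo "workers per run must be positive" >&2
    return 1
  fi
  # Integer division: leftover IPs cannot host a full run.
  echo $(( quota / workers ))
}

# Example with a made-up quota of 64 addresses:
max_concurrent_runs 64 16   # 16 workers per run -> prints 4
max_concurrent_runs 64 2    # 2 larger workers per run -> prints 32
```

With fewer, larger workers per run, the same quota accommodates more runs side by side, at the higher per-run cost mentioned above.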

Please let me know if you need help setting the input arguments to optimize the cost based on the size of the BAM file and the type of analysis.

obsh commented 5 years ago

Thank you for the recommendations! I’ll try to run with larger worker machines.

Sure, I would appreciate any suggestions on the run configuration. I’m working on a cannabis variants project with a Googler, @allenday, and I think the goal is to minimize overall running time. We have 16,000 BAM files ranging in size from 60MB to 17GB, and reference fa files from 300MB to 1.2GB. We need to produce VCF files. After running a couple of pipelines, we settled on make_examples worker machines with 60GB of RAM and 10 CPUs, because VMs were failing with an "out of memory" error when given less RAM.

All arguments to the runner:

cmd: |
  ./opt/deepvariant_runner/bin/gcp_deepvariant_runner \
    --project "${PROJECT_ID}" \
    --zones "${ZONES}" \
    --docker_image "${DOCKER_IMAGE}" \
    --docker_image_gpu "${DOCKER_IMAGE_GPU}" \
    --gpu \
    --outfile "${OUTPUT_BUCKET}"/"${OUTPUT_FILE_NAME}" \
    --staging "${OUTPUT_BUCKET}"/"${STAGING_FOLDER_NAME}" \
    --model "${MODEL}" \
    --ref "${INPUT_REF}" \
    --bam "${INPUT_BAM}" \
    --shards 512 \
    --make_examples_workers 16 \
    --make_examples_cores_per_worker 10 \
    --make_examples_ram_per_worker_gb 60 \
    --make_examples_disk_per_worker_gb 200 \
    --call_variants_workers 16 \
    --call_variants_cores_per_worker 8 \
    --call_variants_ram_per_worker_gb 30 \
    --call_variants_disk_per_worker_gb 50

obsh commented 5 years ago

With the following model and images:

MODEL=gs://deepvariant/models/DeepVariant/0.6.0/DeepVariant-inception_v3-0.6.0+cl-191676894.data-wgs_standard
IMAGE_VERSION=0.6.1
DOCKER_IMAGE=gcr.io/deepvariant-docker/deepvariant:"${IMAGE_VERSION}"
DOCKER_IMAGE_GPU=gcr.io/deepvariant-docker/deepvariant_gpu:"${IMAGE_VERSION}"

samanvp commented 5 years ago

Here are a couple of small changes that should make your runs more efficient:

It is better to set the number of shards equal to make_examples_workers times make_examples_cores_per_worker, basically one shard per core.
Since your BAM files vary in size, I'd put them into 2-3 buckets; say less than 1GB, between 1-10GB, and larger than 10GB. I'd set --make_examples_workers 1 for all 3 groups (to save on external IPs) and --make_examples_cores_per_worker to 4, 8, and 16 respectively for the three buckets.
In all my previous tests it was enough to set 4GB of RAM per core, for both the make_examples and call_variants steps. However, it seems this was not enough in your case and you ended up with 6GB per core.
For the call_variants step you are wasting far too many resources. What we recommend in our automatic flag values (pending PR #11) for BAM files up to 200GB is 2 workers equipped with GPUs. Here I recommend 1 worker with a GPU for all BAM sizes.
When you are using a GPU for call_variants you don't need many cores, because the GPU will be doing all the heavy lifting. What we recommend for this stage is workers with 4 cores, 4*4=16GB of RAM, and a GPU.
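
Putting the suggestions above together, here is a rough sketch of how the resource flags could be derived per BAM file. The helper name dv_flags, the exact size cut-offs in bytes, and the default of 4GB per core are assumptions for illustration; only the flag names come from the command earlier in this thread. The helper only prints flags and does not call the runner, and the GPU, image, and input/output flags still have to be supplied separately.

```shell
# Hypothetical helper: prints suggested resource flags for one BAM file.
# Usage: dv_flags BAM_SIZE_BYTES [RAM_GB_PER_CORE]
dv_flags() {
  size=$1
  gb_per_core=${2:-4}            # 4 GB/core as a starting point; raise on OOM
  gib=$(( 1024 * 1024 * 1024 ))

  # Three size buckets: under 1 GB, 1-10 GB, over 10 GB (cut-offs assumed).
  if [ "$size" -lt "$gib" ]; then
    cores=4
  elif [ "$size" -le $(( 10 * gib )) ]; then
    cores=8
  else
    cores=16
  fi

  workers=1                      # one make_examples worker to save external IPs
  shards=$(( workers * cores ))  # one shard per core

  printf '%s\n' \
    "--shards ${shards}" \
    "--make_examples_workers ${workers}" \
    "--make_examples_cores_per_worker ${cores}" \
    "--make_examples_ram_per_worker_gb $(( cores * gb_per_core ))" \
    "--call_variants_workers 1" \
    "--call_variants_cores_per_worker 4" \
    "--call_variants_ram_per_worker_gb 16"
}

# Example: a 5 GB BAM with 6 GB of RAM per core.
dv_flags $(( 5 * 1024 * 1024 * 1024 )) 6
```

The second argument is there because 4GB per core was not enough in your runs; passing 6 reproduces the ratio you ended up with.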

I just want to mention that all my experience optimizing these flags is with human sample BAM files. I am not sure what the density of variants is in cannabis, so you might want to apply some fine-tuning on top of what I suggested.

Please let me know if there is anything else I can help with.

obsh commented 5 years ago

Thank you very much for the recommendations and for explaining the logic behind them! I'll try to run a new setup this week.

googlegenomics / gcp-deepvariant-runner

Running Worker Machines Without External IP Address #26