HamiltonG closed 3 years ago
Hi @HamiltonG
The one-step script whose usage is shown in https://github.com/google/deepvariant#how-to-run-deepvariant will work on a cluster. Note that giving it many threads (something like 64) will help it run faster. Our case study metrics are from runs with 64 CPU cores and no GPU, so those numbers should give you an idea of whether that works for your purposes.
For the simple run_deepvariant case, if Docker isn't available to you on your cluster, the same container and commands can be used with Singularity. I think this is what most people do when running on a cluster.
If you really want to optimize a process to run DeepVariant many times, it can be worth running the 3 stages separately and giving them different resources because make_examples wants many CPUs, call_variants runs faster on GPUs, and postprocess_variants really just needs 1 CPU. The external solutions do variations of this plus their own special sauce.
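As a rough illustration of giving the three stages different resources, here is a minimal sketch of chaining three PBS jobs. It assumes PBS Pro-style `select` syntax and an `ngpus` resource, both of which vary by site. The three `.pbs` script names are hypothetical wrappers around each stage, and the CPU/GPU counts are placeholders. By default it only prints the `qsub` commands; set `RUN=1` to submit for real.

```shell
#!/bin/sh
# Sketch only: chain the three DeepVariant stages as dependent PBS jobs.
# Assumptions (not from the thread): PBS Pro "select" syntax, an "ngpus"
# resource, and three hypothetical wrapper scripts, one per stage.
set -eu

# Submit a job, or in dry-run mode print the command and a fake job ID.
submit() {
  if [ "${RUN:-0}" = "1" ]; then
    qsub "$@"               # qsub prints the new job ID on stdout
  else
    echo "qsub $*" >&2      # show what would be submitted
    echo "DRYRUN"           # stand-in job ID
  fi
}

# make_examples wants many CPUs.
job1="$(submit -l select=1:ncpus=64 make_examples.pbs)"

# call_variants runs faster on a GPU; it starts only if job1 succeeded.
job2="$(submit -l select=1:ncpus=8:ngpus=1 -W depend=afterok:"${job1}" call_variants.pbs)"

# postprocess_variants needs a single CPU; it starts only if job2 succeeded.
job3="$(submit -l select=1:ncpus=1 -W depend=afterok:"${job2}" postprocess_variants.pbs)"

echo "Last job in the chain: ${job3}"
```

The `afterok` dependency keeps a later stage from starting unless the previous one exited successfully, so a failed make_examples run does not hold on to GPU time.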
I hope that helps answer your question, Maria
Hi Maria,
Thank you for your suggestions. I am getting closer to running but I have not quite succeeded yet.
Here is where I am at the moment:
Let's say my 'deepvariant_v1.0.0.sif' file is sitting in path_a, my reference sequence in path_b, and my PacBio BAM in path_c.
Could you advise on the singularity command to execute this job?
module load chpc/singularity
Thank you for your insights.
Kind regards,
It looks like you're not mounting any directories in your Singularity command.
See the FAQ entry "Why can't it find one of the input files? E.g., 'Could not open'" for how to debug that: https://github.com/google/deepvariant/blob/r1.1/docs/FAQ.md#why-cant-it-find-one-of-the-input-files-eg-could-not-open
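For illustration, here is a minimal sketch of a Singularity invocation that bind-mounts the three directories from the question. `/path_a`, `/path_b` and `/path_c` stand in for the real absolute paths, and the reference, BAM and output file names are hypothetical. By default the script only prints the command; set `RUN=1` to execute it, after loading Singularity on the cluster (e.g. `module load chpc/singularity`).

```shell
#!/bin/sh
# Sketch only: run the one-step script from a local .sif image with the
# input directories bind-mounted into the container.
# /path_a, /path_b, /path_c stand in for the question's placeholder paths;
# the file names below are hypothetical.
set -eu

DIR_A="/path_a"   # holds deepvariant_v1.0.0.sif
DIR_B="/path_b"   # holds the reference sequence
DIR_C="/path_c"   # holds the PacBio BAM; also used for output here

# Execute the given command, or just print it in dry-run mode.
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

deepvariant_singularity() {
  # --bind takes a comma-separated list of host directories to mount.
  run_or_print singularity exec \
    --bind "${DIR_A},${DIR_B},${DIR_C}" \
    "${DIR_A}/deepvariant_v1.0.0.sif" \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=PACBIO \
    --ref="${DIR_B}/reference.fasta" \
    --reads="${DIR_C}/reads.bam" \
    --output_vcf="${DIR_C}/output/output.vcf.gz" \
    --num_shards=64
}

deepvariant_singularity
```

Without the `--bind` list, the container may not be able to see files outside the directories Singularity mounts by default, which is a common cause of "Could not open" errors.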
If that doesn't work, can you include the error messages too?
I'll close this since we're continuing the conversation over email.
Hi there,
I've been trying to figure out how to actually run DeepVariant in a cluster environment, but thus far the instructions seem a little cryptic to me. Is there perhaps a step-by-step guide to running DeepVariant on a cluster with a PBS scheduler, for instance?