maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

Submitting pbs-torque scheduler #101

Closed gadepallivs closed 6 years ago

gadepallivs commented 6 years ago

Hi, we submit jobs via PBS scripts for our NGS pipeline jobs. I came across these snakemake pipelines and am interested in trying them on our cluster, but submitting jobs via PBS rather than Slurm. How should I do this? Do you have any support in this regard? So far, I tried to set up this Snakemake-Profiles, but ran into errors while submitting (I opened an issue there).

Just wondering if you have any documentation/example to run your snakemake pipelines via pbs-torque scheduler. Appreciate any help on this. Thank you

dpryan79 commented 6 years ago

At the moment our SlurmEasy script is hard-coded in the various wrapper scripts (e.g., RNA-seq). What you'll need to do is remove that and add qsub or whatever command is appropriate for your cluster. You'll also need to change the options for it (and modify the tool paths).
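A minimal sketch of that swap, assuming a helper that builds the `--cluster` string passed to snakemake (the function name, flags, and log directory here are illustrative, not snakePipes' actual code):

```python
# Hypothetical sketch: where a wrapper script currently builds a
# SlurmEasy-based submission command, substitute an equivalent qsub
# invocation. The {threads}/{rule} placeholders are filled in by
# Snakemake per job; everything else here is illustrative.
def build_cluster_cmd(logs_dir):
    return ("qsub -l nodes=1:ppn={threads}"
            " -o " + logs_dir +
            " -N {rule}.snakemake")

print(build_cluster_cmd("cluster_logs"))
```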

vivekbhr commented 6 years ago

we only have experience with SGE and Slurm (although Slurm is hard-coded in snakePipes). Curious to see if it could be adapted to pbs-torque :thinking:

dpryan79 commented 6 years ago

At the end of the day the scheduler should be an essentially replaceable component, presuming it works with Snakemake.

gadepallivs commented 6 years ago

Thank you for pointing me in the right direction. My experience with Python is minimal. I modified the part where SlurmEasy is called. Threads should probably be OK, even though I generally specify nodes and processes per node. Which paths did you suggest modifying?

snakemake_cmd += ["--cluster-config",
                  os.path.join(this_script_dir, "cluster.yaml"),
                  "--cluster 'qsub"
                  " -l nodes={cluster.nodes}:ppn={cluster.ppn}"
                  " -l mem={cluster.memory}"
                  " -l walltime={cluster.walltime}"
                  " -A {cluster.account}"
                  " -o " + args.cluster_logs_dir +
                  " -N {rule}.snakemake'"]

Modified this part: snakemake_module_load = "module load snakemake/3.12.0 slurm &&".split() to snakemake_module_load = "source activate snakemake &&".split()

We use the conda installer for Python modules, hence the above is what we do instead of module load. I'm not sure what the slurm &&".split() part does; should I change it to "qsub &&".split()?

Maybe an unrelated question here. The RNA-seq snakemake pipeline basically runs from alignment through differential gene expression automatically. We generally run step by step, verifying how the previous run went and checking certain parameters before starting the next step. Do you automate any "checks" in the pipelines? For instance, if the alignment score is high enough, proceed to the next step; if not, stop?

vivekbhr commented 6 years ago

As long as the final concatenated string is fine you can remove the split statement.
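For reference, what that .split() call actually does to the string:

```python
# .split() just tokenizes the command string on whitespace, turning it
# into the list form that subprocess-style APIs expect.
cmd = "module load snakemake/3.12.0 slurm &&".split()
print(cmd)  # ['module', 'load', 'snakemake/3.12.0', 'slurm', '&&']

# A conda-based replacement tokenizes the same way:
conda_cmd = "source activate snakemake &&".split()
print(conda_cmd)  # ['source', 'activate', 'snakemake', '&&']
```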

We are not checking the data quality to conditionally run the rule (pipeline terminates only on errors), but we have many QC outputs which the user can manually check and re-run the workflow if necessary.
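If you do want a hard gate when running step by step, a hand-rolled check along these lines is easy to bolt on. This is purely illustrative; it is not part of snakePipes, and the threshold is an assumption you would tune yourself:

```python
# Illustrative only: snakePipes does not gate rules on QC metrics.
# A manual check that stops a step-by-step run when the overall
# alignment rate (taken from your aligner's summary) is too low.
def check_alignment_rate(rate_percent, threshold=70.0):
    """Raise if the alignment rate (percent) is below threshold."""
    if rate_percent < threshold:
        raise RuntimeError(
            f"Alignment rate {rate_percent:.1f}% is below {threshold}%; "
            "stopping before downstream steps.")
    return True

check_alignment_rate(92.4)  # passes silently
```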

gadepallivs commented 6 years ago

Thank you, Vivek. Ryan mentioned modifying paths; I did not understand where this needs to be changed. Secondly, how are the submitted jobs configured? I assume the pipeline submits a chain of jobs of various sizes, in parallel. For instance, alignment may require more computing power, while another step in the pipeline would not. I don't know if this variability can be controlled in snakemake workflows, e.g., certain wall times and nodes for alignment, and a different wall time and nodes/CPUs for feature counts, etc.

dpryan79 commented 6 years ago

The paths are in shared/paths.yaml. If you need to module load stuff on your cluster then you can use module load whatever && there.
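A fragment of what such an entry might look like (the tool names and exact keys here are examples; check the shipped shared/paths.yaml for the real ones):

```yaml
# Illustrative fragment; actual keys in shared/paths.yaml may differ.
samtools: "module load samtools && samtools"
STAR: "module load star && STAR"
```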

vivekbhr commented 6 years ago

@gadepallivs the number of jobs can be specified during snakemake execution using the --jobs parameter. The number of cores per job can be specified using the cluster config (although we currently have it hard-coded in the rules, we plan to move this to the cluster config in the near future)
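To illustrate the per-rule variability asked about above, a cluster config passed via --cluster-config can give each rule its own resources, reachable in the submission string as {cluster.memory} etc. Rule names and values here are examples, not snakePipes' shipped defaults:

```yaml
# Illustrative cluster.yaml fragment. __default__ applies to any rule
# without its own entry; heavier rules override it.
__default__:
    memory: "4G"
    walltime: "02:00:00"
    ppn: 4
STAR_align:          # alignment gets more cores, memory, and time
    memory: "32G"
    walltime: "12:00:00"
    ppn: 16
featureCounts:       # counting needs far less
    memory: "8G"
    walltime: "04:00:00"
    ppn: 8
```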

mictadlo commented 6 years ago

I would like to run the HI-C pipeline and just wonder whether there is any documentation for PBS pro?

dpryan79 commented 6 years ago

@mictadlo I assume your command would then be something like qsub -N "{rule}" -lncpus {threads} -lmem {cluster.memory} rather than our SlurmEasy command. You can change this in the cluster.yaml file mentioned in the output of snakePipes info. I should note that the cluster.memory parameter will be something like "3G" or "3200M", since that's what Slurm understands. If PBS Pro needs per-core memory requirements in a different format, then I suggest writing a wrapper script (ours does a bunch of additional things, but for Slurm rather than PBS Pro).
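The memory-format conversion such a wrapper would do can be sketched like this. The function and its unit handling are assumptions for illustration; adapt the output units and flags to whatever your PBS Pro installation expects:

```python
# Sketch of one wrapper step: convert a Slurm-style per-core memory
# string ("3G", "3200M") into a total-memory request in megabytes,
# scaled by the number of requested cores. Purely illustrative.
def slurm_mem_to_total_mb(mem, ncpus):
    units = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}
    value, suffix = mem[:-1], mem[-1].upper()
    return int(float(value) * units[suffix] * ncpus)

print(slurm_mem_to_total_mb("3G", 8))     # 24576
print(slurm_mem_to_total_mb("3200M", 4))  # 12800
```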