refgenie / refgenconf

A Python object for standardized reference genome assets.
http://refgenie.databio.org
BSD 2-Clause "Simplified" License

Looper plugin #129

Closed nsheff closed 3 years ago

nsheff commented 3 years ago

Provides a new refgenie namespace, so pipeline interfaces no longer have to declare the refgenie assets of interest; they can just use them directly. Like this:

pipeline_name: demo
pipeline_type: sample
command_template: >
  python pipeline.py 
  --index {refgenie.bowtie2_index}
  --fasta-file {refgenie.fasta}
  --sample-name {sample.sample_name}
  --anno-name {refgenie.bwa_index}
var_templates:
  refgenie_config: "$REFGENIE"
pre_submit:
  python_functions:
  - refgenconf.looper_refgenie_populate

Notice the {refgenie.bowtie2_index} in there. It uses the new refgenie namespace, which this modification to the looper plugin supplies. That saves you from having to do:

var_templates:
  bowtie2_index: "refgenie://{sample.genome}/bowtie2_index.dir"

and then refer to it with {pipeline.var_templates.bowtie2_index}.
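For illustration, here is a minimal sketch of what resolving such a `refgenie://` string could look like. It is not the plugin's actual code: the `ASSETS` table, the paths in it, the `resolve` helper, and the fallback to a seek key named after the asset are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for a refgenie config, keyed by
# (genome, asset, seek_key). All paths here are made up.
ASSETS = {
    ("hg38", "bowtie2_index", "dir"): "/genomes/hg38/bowtie2_index/default",
    ("hg38", "fasta", "fasta"): "/genomes/hg38/fasta/default/hg38.fa",
}

# Matches refgenie://<genome>/<asset>[.<seek_key>]
REGISTRY_PATH = re.compile(
    r"refgenie://(?P<genome>[^/\s]+)/(?P<asset>[\w-]+)(?:\.(?P<seek_key>[\w-]+))?"
)


def resolve(text, assets=ASSETS):
    """Replace every refgenie:// reference in `text` with a local path."""

    def _lookup(match):
        genome, asset = match["genome"], match["asset"]
        # Assumption: with no seek key given, use one named after the asset.
        seek_key = match["seek_key"] or asset
        return assets[(genome, asset, seek_key)]

    return REGISTRY_PATH.sub(_lookup, text)


print(resolve("bowtie2 -x refgenie://hg38/bowtie2_index.dir"))
```

An unknown genome, asset, or seek key raises `KeyError` in this sketch; a real implementation would presumably report that more gracefully.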

The only limitation here is the one raised in #128: we need a way to provide all the different seek keys. As initially programmed, it only provides access to the default seek keys.

See also: https://github.com/refgenie/refgenie_looper_demo (the demo currently uses the old way, which this change would simplify).

codecov[bot] commented 3 years ago

Codecov Report

Merging #129 (1f06bd7) into dev (b9e6f49) will decrease coverage by 1.20%. The diff coverage is 4.44%.

@@            Coverage Diff             @@
##              dev     #129      +/-   ##
==========================================
- Coverage   81.00%   79.80%   -1.21%     
==========================================
  Files          36       36              
  Lines        2812     2857      +45     
==========================================
+ Hits         2278     2280       +2     
- Misses        534      577      +43     
Impacted Files             Coverage Δ
refgenconf/refgenconf.py   70.88% <4.00%> (-1.30%)
refgenconf/populator.py    21.21% <5.00%> (-24.95%)

nsheff commented 3 years ago

Thanks Michal, that function worked great! I had to restructure its output a bit to make it accessible in the pipeline interface, since the genome is provided by the project rather than by the pipeline. I also made the tags customizable in the PEP.

Take a look at this revision.

Both ways to do it are now documented here:

https://github.com/refgenie/refgenie_looper_demo

The new way is now super simple. Just add the plugin, and then use {refgenie.asset.seek_key} in your command template!
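To make the {refgenie.asset.seek_key} lookup concrete, here is a rough sketch of how a nested namespace can be exposed to a template. It uses Python's built-in `str.format` as a stand-in; looper's actual template rendering is not shown in this thread. The `to_namespace` and `render` helpers and all paths are hypothetical.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively turn nested dicts into attribute-access objects."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    return obj


def render(template, namespaces):
    """Fill a template such as '{refgenie.fasta.fai}' from nested dicts."""
    return template.format(**{k: to_namespace(v) for k, v in namespaces.items()})


# Hypothetical namespaces: refgenie -> asset -> seek_key -> path.
namespaces = {
    "sample": {"sample_name": "s1", "genome": "hg38"},
    "refgenie": {
        "fasta": {
            "fasta": "/genomes/hg38/fasta/hg38.fa",
            "fai": "/genomes/hg38/fasta/hg38.fa.fai",
        },
        "bowtie2_index": {"dir": "/genomes/hg38/bowtie2_index"},
    },
}

template = (
    "python pipeline.py"
    " --fasta-file {refgenie.fasta.fasta}"
    " --index {refgenie.bowtie2_index.dir}"
    " --sample-name {sample.sample_name}"
)

print(render(template, namespaces))
```

Keeping every seek key under its asset means a template can ask for any of them, not only the default.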

stolarczyk commented 3 years ago

I like it! It's so easy to use.

nsheff commented 3 years ago

It doesn't seem to be populating variables in the sample namespace, though. Need to fix that.

Here is what's happening: if you write a sample YAML, the populate plugin must run before the sample-YAML-writing plugin; otherwise you'll write unpopulated sample YAMLs.

So, the plugin order matters, of course!
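The ordering effect can be reproduced with a toy model. This is only a sketch: `populate`, `make_writer`, and `run_hooks` are hypothetical stand-ins, not looper's or refgenconf's real functions, and the path rewriting is invented for the example.

```python
import copy


def populate(namespaces):
    """Hypothetical populate hook: rewrite refgenie:// sample values to paths."""
    ns = copy.deepcopy(namespaces)
    ns["sample"] = {
        key: "/genomes/" + value[len("refgenie://"):]
        if isinstance(value, str) and value.startswith("refgenie://")
        else value
        for key, value in ns["sample"].items()
    }
    return ns


def make_writer(sink):
    """Hypothetical writer hook: snapshot the sample as it is when called."""

    def write_sample(namespaces):
        sink.append(dict(namespaces["sample"]))
        return namespaces

    return write_sample


def run_hooks(hooks, namespaces):
    """Run hooks in list order, passing each one the previous hook's output."""
    for hook in hooks:
        namespaces = hook(namespaces)
    return namespaces


start = {"sample": {"sample_name": "s1", "index": "refgenie://hg38/bowtie2_index"}}

wrong_order, right_order = [], []
run_hooks([make_writer(wrong_order), populate], start)
run_hooks([populate, make_writer(right_order)], start)

print(wrong_order[0]["index"])  # still the unpopulated refgenie:// string
print(right_order[0]["index"])  # the resolved path
```

In this model the writer only sees whatever earlier hooks have already produced, which is why the populate step has to come first in the list.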

nsheff commented 3 years ago

OK, this is working great now. I've documented it and provided a working demo here:

https://github.com/refgenie/refgenie_looper_demo/blob/master/README.md

Where do you think the docs for this should go? In refgenie or in looper? Or separate?

nsheff commented 3 years ago

Docs are currently here: http://refgenie.databio.org/en/latest/populate/

stolarczyk commented 3 years ago

I don't have a strong preference, but I'd probably document the plugin in looper, where all the other plugins are documented (http://looper.databio.org/en/latest/pre-submission-hooks/), and then mention the integration in the refgenie docs and link to the looper plugin docs.