Closed nsheff closed 3 years ago
I'm just wondering if populate_refgenie_refs is not just a special case of replace_str_in_obj, which I recently implemented.
well, probably related -- but it's more complicated here, because I have to use rgc.seek
to look up what to replace with, since the str to replace is a registry path which is dynamic... so I doubt it would be straightforward to reuse in this case...
@stolarczyk would you mind taking a quick look at this merge conflict? I don't want to mess anything up. I'm planning to merge this PR now
For example, I'm not sure why __all__
was removed in dev
@stolarczyk can you explain why you changed the branch from dev to master?
Merging #113 (5cdfa13) into dev (7681851) will increase coverage by
0.36%
. The diff coverage is86.02%
.
@@ Coverage Diff @@
## dev #113 +/- ##
==========================================
+ Coverage 77.54% 77.91% +0.36%
==========================================
Files 31 33 +2
Lines 2601 2694 +93
==========================================
+ Hits 2017 2099 +82
- Misses 584 595 +11
Impacted Files | Coverage Δ | |
---|---|---|
refgenconf/populator.py | 66.66% <66.66%> (ø) |
|
refgenconf/refgenconf.py | 66.34% <67.74%> (+0.19%) |
:arrow_up: |
refgenconf/__init__.py | 100.00% <100.00%> (ø) |
|
tests/test_populate.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 7681851...5cdfa13. Read the comment docs.
I changed the branch so that we can release it sooner, you said it's ready. I didn't want it to get stuck for a couple of months in dev before we make another huge release with the new recipe system.
Just realized that there will be a sooner release, introducing the "cloud links" so maybe that's all for nothing..
how about making it a RefGenConf
method? RefGenConf.populate
or RefGenConf.populate_ref
I had some reason for not doing it that way at first... I think it had something to do with the way looper plugins work.
but I also wrote a method that calls this function as well, I guess I haven't pushed it yet
actually I think at this point it can just all become a method.
great!
You can also add a test that considers a structured input. you can give .populate
either a raw string, which will search for any refgenie paths, or you can give it a nested dict, and it will search all values of the dict, following nesting.
that's exactly what I'm doing now :)
is this ready to merge?
I'm not planning any updates here, so it's ready if you are done
Wait for:
This PR introduces a new concept called the refgenie populator. It is used to scan input for
refgenie://
registry paths and replace them with theirrefgenie seek
local paths.Use cases:
It can be used to pre-populate input for a workflow run. This way, even a refgenie-unaware workflow can benefit from refgenie because you just run this populator on your input data just before running the workflow on it. This is the way CWL works; CWL workflows in best practices require knowledge of all input files before the workflow run begins. So, it doesn't make sense to pass a registry path, which is then resolved by refgenie inside the workflow.
It's also nice that this makes the workflow itself independent of refgenie.
I've implemented a looper plugin that makes this automatic. Use like:
I would suggest we also implement a
refgenie populate
command-line command that would populate anyrefgenie://
paths and give the result to stdout, which could be easily done using functions included in this PR.See it in action here: https://github.com/pepkit/pep-cwl/tree/32944212e109fae09119c3196c90596e7101b61d/bioinformatics_demo
piface using the plugin: https://github.com/pepkit/pep-cwl/blob/dev/bioinformatics_demo/bt2_cwl_interface.yaml