jpsmith5 closed this issue 4 years ago.
As discussed, this was a deliberate design decision: refgenconf only takes care of config manipulation. All file-related changes (removing, adding, etc.) are performed by refgenie.
I like the division, but maybe we should change that?
Why would you want to add an asset from within a pipeline? What's the advantage of doing this over just building the asset (even from within the pipeline) and then having access to it? If you can build it programmatically, then it has a recipe and should go through build. To me, add is useful for manual stuff you can't build.
In this case, the asset is specific to a particular k-mer length, but reusable for that genome in future analyses with the same read lengths. So it only needs to be constructed once, but it depends on the source's read length: I would construct it the first time that read length is encountered, and it wouldn't need to be rebuilt in future runs. Because it lives in the genome's folder, it made sense for refgenie to know about it, particularly so future runs can check for its presence the same way they do for other assets. The same genome would therefore have multiple components of the parent asset, one per k-mer length. So it's not a static build recipe at that point. Does that make sense?
No... it still seems like a build command for that read length if the asset doesn't exist yet, in which case refgenie builds and manages it.
I would not create a scripted refgenie asset outside of the build system and then add it; that doesn't make sense to me.
Alternatively, if it's not needed in the future, it should not live in the genome folder managed by refgenie.
The only things put into the refgenie genomes folder should be the result of a build process (or a pull). This sounds like it's either a refgenie-managed asset, in which case it should be built, or it's not a refgenie-managed asset, in which case it should not live in the refgenie-managed folder hierarchy.
@jpsmith5 did you ever resolve this?
I just defaulted to requiring it to be pre-built. If you run the pipeline for the first time without knowing the read lengths, and therefore haven't built the required index, the pipeline stops and warns you that it needs that asset. The user is then prompted to build a new index for that length.
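A minimal sketch of the "require it pre-built" behavior described above, assuming a plain nested dict stands in for the refgenie config and that the read length is used as the asset tag. The names require_kmer_asset, MissingAssetError and kmer_index, the tag scheme, and the example paths are all hypothetical; this is not refgenie's or refgenconf's API. The suggested command in the error message assumes refgenie's genome/asset:tag registry-path syntax.

```python
class MissingAssetError(RuntimeError):
    """Hypothetical error raised when a pipeline needs an asset that was never built."""


def require_kmer_asset(config, genome, read_length, asset="kmer_index"):
    """Return the path of a read-length-specific asset, or stop with a build hint.

    Hypothetical sketch: `config` is a plain dict shaped like
    {genome: {asset: {tag: path}}}, and the read length is used as the tag.
    """
    tag = str(read_length)
    tags = config.get(genome, {}).get(asset, {})
    if tag not in tags:
        # Nothing is built or registered here; the pipeline just stops and
        # tells the user which build to run first.
        raise MissingAssetError(
            f"Asset {genome}/{asset}:{tag} not found. "
            f"Build it first, e.g.: refgenie build {genome}/{asset}:{tag}"
        )
    return tags[tag]


if __name__ == "__main__":
    cfg = {"hg38": {"kmer_index": {"50": "/genomes/hg38/kmer_index/50"}}}
    print(require_kmer_asset(cfg, "hg38", 50))
    try:
        require_kmer_asset(cfg, "hg38", 100)
    except MissingAssetError as err:
        print(err)
```

Keeping the check read-only means the pipeline never writes into the refgenie-managed folder itself; building stays a separate, recipe-driven step.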
But the building itself is scripted as a recipe now, right? So it's not an add thing?
Correct. It's just a refgenie build procedure using a recipe.
ok perfect.
Currently, you can call get_asset(), but there's no add_asset() function. Does it make sense for refgenconf to have this ability? For pipeline building, it could be useful to add a custom asset using refgenconf when the pipeline generates an item that is unique to a genome and reusable beyond the scope of a single pipeline run. It may also be useful for adding custom recipes in general in the future?