Closed nsheff closed 2 years ago
In addition to the file path output (the `paths` parameter), this filter should also accept a dict of environment variables to be used in the processing step. This way, users can provide environment variables that will be used to expand paths.
The value there is that we can use this to construct a proper processed PEP, with correct local paths, on whatever computing environment, even remotely.
This will be useful for implementing https://github.com/pepkit/pephub/issues/3
I think the input to a conversion filter function then needs to be:

```
{
  "paths": {
    "cfg": "/path/to/config",
    "samples": "/path/to/samples"
  },
  "envs": {
    "env1": ...
  }
}
```
And if `paths` is specified, then the files are written to those paths utilizing those env vars?
> I think the input to a conversion filter function then needs to be:
> `{ "paths": { "cfg": "/path/to/config", "samples": "/path/to/samples" }, "envs": { "env1": ... } }`
> And if `paths` is specified, then the files are written to those paths utilizing those env vars?
Yep! you got it.
Specifically, the paths may be something like `"cfg": "${env1}/subpath/to/config"`. Hence the need for `expandpath` for the env vars.

Oops, I wasn't thinking about this correctly. The env vars would be used in populating the paths in the sample table -- not necessarily in the paths passed in by the user (it shouldn't be necessary there).
Anyway, what you said is still right -- just keep in mind the env vars aren't for these paths, they're for use by peppy in populating the derived columns, which is even easier.
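To make that concrete, here's a minimal stand-in sketch of what "env vars for populating derived columns" could look like, using only the standard library: `os.path.expandvars` stands in for `expandpath`, and the variable name, pattern, and sample name are all invented for illustration.

```python
import os

# Hypothetical derived-column source pattern, as it might appear in a PEP config
source_pattern = "${DATA_ROOT}/{sample_name}.fastq.gz"

# Env vars supplied by the caller (the "envs" dict from the proposal)
os.environ.update({"DATA_ROOT": "/mnt/data"})

# Expand env vars first, then let the sample attribute fill in the template
# (peppy does this substitution itself; this just illustrates the idea)
path = os.path.expandvars(source_pattern).format(sample_name="sample1")
```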
> Anyway, what you said is still right -- just keep in mind the env vars aren't for these paths, they're for use by peppy in populating the derived columns, which is even easier.
Does a filter need then to set these env variables when it runs? E.g.

```python
for env in cfg["envs"]:
    os.environ[env] = cfg["envs"][env]
```
Also, for the proposed filter, does something like this make sense:

```python
def processed_pep_filter(cfg: dict):
    proj = peppy.Project(cfg=cfg['cfg'])
    return {
        'proj': str(proj),
        'samples': str(proj.samples)
    }
```
Maybe the calling function wrapper for these filters can set env variables?
> Does a filter need then to set these env variables when it runs?
Something like this... or else, some other way to just make those available to the `expandpath` functions that get used within peppy. I'd have to dive into peppy and do some trial and error to say exactly how to do it. Your solution may work.
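One way to make those available without permanently mutating the process environment is to set them only for the duration of the filter call. A sketch, assuming nothing about eido's internals (the env var name is made up, and the context manager is hypothetical, not an eido API):

```python
import os
from contextlib import contextmanager

@contextmanager
def temporary_env(envs: dict):
    """Set env vars for the duration of a filter call, then restore them."""
    saved = {k: os.environ.get(k) for k in envs}
    os.environ.update(envs)
    try:
        yield
    finally:
        for k, old in saved.items():
            if old is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = old

# Demo: ensure a clean slate, then set the var only inside the block
os.environ.pop("EIDO_DEMO_ENV", None)
with temporary_env({"EIDO_DEMO_ENV": "/tmp/project"}):
    inside = os.environ["EIDO_DEMO_ENV"]
```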
> Does something like this make sense?
Well, you'll need to convert these into files, ultimately. So they'll have to be string representations. Remember, the output of a filter is one or more files. Oops, I see now you are actually using string representations. But it might have to be more complicated than the generic representation -- if we want the samples table to come out as a processed CSV, for example, I don't think `str(proj.samples)` does it.
So you're on the right lines, but it also has to include the subsamples file, I guess (unless you're using the new one-file sample table approach, which may be better).
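For the CSV part, something like `csv.DictWriter` over each sample's attribute dict may be closer than `str(proj.samples)`. A stand-in sketch with plain dicts in place of peppy `Sample` objects (the attribute names are invented; how you'd pull the dict out of a real `Sample` would need checking against peppy):

```python
import csv
import io

# Stand-ins for processed peppy samples (each sample as an attribute dict)
samples = [
    {"sample_name": "s1", "read1": "/data/s1_R1.fastq.gz"},
    {"sample_name": "s2", "read1": "/data/s2_R1.fastq.gz"},
]

# Render the processed sample table as CSV text, ready to write to a file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(samples[0]))
writer.writeheader()
writer.writerows(samples)
csv_text = buf.getvalue()
```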
Presently, the wrapper around the conversion filters passes a `peppy.Project` instance and a `plugin_kwargs` dict. Does it make sense to just utilize that to pass the file paths?
```python
plugin_kwargs = {
    "paths": {
        "cfg": "/path/to/config",
        "samples": "/path/to/samples"
    },
    "envs": {
        "env1": ...
    }
}
```
It currently looks for `"paths"` inside that dictionary and, if it finds it, it spits the result out to that path. Maybe we need both a `"paths"` and a `"path"` key, to direct the output to a directory and to direct the names of the files... I'd be afraid of moving away from the `path` keyword for fear of breaking anything else inside eido, since it feels like that is used heavily.
> I'd be afraid of moving away from the `path` keyword for fear of breaking anything else inside eido since it feels like that is used heavily
I don't think it is... I wouldn't worry about this... I think this is all new
This is too complicated. Don't use the folder method at all. It should be only `paths`.
> Presently, the wrapper around the conversion filters passes a `peppy.Project` instance and a `plugin_kwargs` dict. Does it make sense to just utilize that to pass the file paths?
I'm not sure I really understand the question... For a filter (any filter), to call the filter, you'd give a `paths` parameter, where you'd specify where you want it to spit things out. This is the same for every filter. This is the only way to do it, isn't it?
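In that spirit, the wrapper's job could be as small as mapping the strings a filter returns onto the `paths` the caller supplied. A sketch, not eido's actual wrapper; the function name is made up, and the key names are just the ones from this discussion:

```python
import os
import tempfile

def write_filter_output(result: dict, paths: dict):
    """Write each string a filter returned to the caller-supplied path of the same key."""
    written = []
    for key, text in result.items():
        if key in paths:
            with open(paths[key], "w") as f:
                f.write(text)
            written.append(paths[key])
    return written

# Demo: a filter result with one output, directed to a temp directory
outdir = tempfile.mkdtemp()
paths = {"cfg": os.path.join(outdir, "config.yaml")}
written = write_filter_output({"cfg": "pep_version: 2.0.0\n"}, paths)
```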
See comment here, I think this explained it well: https://github.com/pepkit/eido/pull/38/files/2815ecb1b2abac6cbee0cdeea62b939e4c311209#r864395751
yeah I think that makes sense, and I'm trying to switch over to that implementation
We need to have a built-in filter that returns a processed PEP.
So, the input is, of course, a PEP. The output is a PEP that has run through sample modifiers and project modifiers -- that is, a processed PEP.
So, the user who calls the filter, if they want to spit the output to files, needs to provide several output file paths; in fact, all the possible files that could go into a PEP:
See PR #38.
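For instance, the caller might pass something like this (the key names here are hypothetical -- whatever the built-in filter actually expects would be settled in PR #38):

```python
# Hypothetical output paths for a processed PEP; keys are illustrative only
paths = {
    "cfg": "/out/project_config.yaml",
    "sample_table": "/out/samples.csv",
    "subsample_table": "/out/subsamples.csv",
}
```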
All the filter does is load the PEP with peppy, and then return the objects (as strings), which will have been processed.
@nleroy917 does this make sense to you?