Closed nleroy917 closed 2 years ago
`eido`'s conversion plugins should not directly print out the result; they should return the value as well. This is inside `conversion_plugins.py`. The plugins are either printing to a file or printing a `str`. It should be the case that you can get the value returned: the plugin itself shouldn't print out the result, the `cli` itself should print it out.

Instead of a single `path`, you could have `paths` (a `dict`), so you can name your paths. The filter would return a `dict` with name and content. Alternatively, with a `folder` parameter an API could return a zipped file, but for simple filters it's nice to just get the converted `pep` right away; a zipped file doesn't work as well.
If filters simply returned `dict`s, then you could convert to `json` within an API and return that.
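For instance, a minimal sketch of that idea (the filter result here is a hypothetical example, not eido's actual return type): an API handler could serialize the returned `dict` directly.

```python
import json

def api_convert(filter_result: dict) -> str:
    # Serialize a filter's {name: content} dict for an HTTP response.
    # `filter_result` is an illustrative dict of named output strings.
    return json.dumps(filter_result)
```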
Make it so that filters return outputs instead of printing them; keep in mind outputs may contain multiple things.

`pep` -> filter -> converted `pep`

The filter function is independent of filepaths.
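As a rough sketch of that shape (all names here are hypothetical, not eido's actual API): the filter returns its output, and the caller decides whether to write a file or print.

```python
# Hypothetical sketch: a filter returns its output rather than printing it.
# Names (basic_pep_filter, run_filter) are illustrative, not eido's API.

def basic_pep_filter(project_dict):
    """Convert a PEP-like dict to a string and RETURN it (no printing)."""
    lines = [f"{key}: {value}" for key, value in project_dict.items()]
    return "\n".join(lines)

def run_filter(project_dict, filter_func, path=None):
    """The caller decides what to do with the result: file or stdout."""
    result = filter_func(project_dict)
    if path is not None:
        with open(path, "w") as fh:
            fh.write(result)
    else:
        print(result)
    return result
```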
@nsheff if you want to give a look at the commits I made... I tried to capture whatever bytes-like object the filter plugin was generating and dumping to a file, and either coerce it to a string or just return the generated string to the calling function, `run_filter`. Inside the calling function, I check for the file path and dump to a file if required; otherwise I write the string to `stdout` and then also return that string (or bytes object).

I have yet to test the functions, but I thought I'd stop here before attempting multiple paths.
Test your functions. I think you're on the right track.

This will only work with 1-path plugins, though.
I've tested all my functions with the available PEP in `tests/data/peps`, and for one path it seems to function well. How do we envision passing multiple paths to the modules when used inside another python program (`pephub`, for example)?
This is what we were talking about on Monday; I thought an object like this might work:

```python
paths = {
    "cfg": "/path/to/config.yaml",
    "samples": "/path/to/samples.csv"
}
```
Or, we can just not accommodate multiple paths.
Right. I guess I am just thinking of how that'd work from either the command line or inside another package... it might help if I had an example use case. For what filter(s) would one need to supply multiple paths?
Presently, all filters generate only one output -- the bytes-like output that the plugin creates.

> For what filter(s) would one need to supply multiple paths?

For example, one that outputs a "processed PEP" -- that is, a project config file and a csv file, after amendments and modifiers have been applied.
Instead of a `str`, return a `dict` with specific keys that correspond to files or a `stdout` element: a `dict` of named strings. For an item of name `x`, do _, and for an item of name `y`, do ____. `stdout` will be what gets returned. `paths` is for specifying where things go:

```python
{
    "cfg": "/path/to/somewhere",
    "stdout": "/path/somewhere/else"
}
```
- if `paths` has one path, dump to that path
- if `paths == None`, print to standard out
- if `paths == dict()`, then link keys and dump to those files
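Those three rules could be sketched like this (a hypothetical helper under the assumptions above, not eido's actual implementation; `handle_outputs` and its signature are made up for illustration):

```python
import sys

def handle_outputs(results, paths=None):
    """Dispatch filter results according to the rules above.

    `results` maps output names to the strings a filter produced.
    Hypothetical sketch; not eido's actual API.
    """
    if paths is None:
        # no paths given: print everything to standard out
        for content in results.values():
            sys.stdout.write(content)
    elif isinstance(paths, str):
        # a single path: dump everything to that one file
        with open(paths, "w") as fh:
            for content in results.values():
                fh.write(content)
    else:
        # a dict: link keys and dump each named result to its named file
        for key, path in paths.items():
            if key in results:
                with open(path, "w") as fh:
                    fh.write(results[key])
    return results
```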
The filter assigns filenames -- it returns an object with full filenames and values. Or: a `path` parameter, and then the filter will populate the required paths. Print to `stdout` or write to files.

@nsheff Do these new filter functions and the controller look better? I forget if we discussed printing the name of the files to standard out if no path is specified; currently I am just printing out the contents of the file(s).
I added a `verbose` flag to the `run_filter` function that defaults to `True`, to support pipelines (like pephub) and such so we aren't polluting stdout.
Also: I'm not sure why the linter test is failing... I ran `black .` on the entire directory.
Seems like this is ready -- we need to wait for peppy `v0.32.0` and then change the `reqs`.
OK -- peppy is now released, so can we revisit this?
Does this need to be part of 0.1.6 release, which is imminent?
Yes, it permits the programmatic conversions for `pephub`.
> Also: I'm not sure why the linter test is failing... I ran `black .` on the entire directory.

Sometimes this is due to a version mismatch of black. Try updating black?
That was it, thanks. On my end, this is good to merge for `0.1.6`!
Sorry, just as I was looking at this in more depth I realize it's not doing what I was expecting. I tried to explain what I mean here, does it make sense?
Yes. We are saying that there should be separate file paths for the converted config file and the converted sample table. The only confusion I have is that all filters presently don't create separate output strings for the sample table and config. For example, `csv_pep_filter` would only use the `samples` key from the input dict when we require a `cfg` as well... if I am following correctly.

Is this a forward compatibility thing?
I think you're missing something here. The filter itself will determine what paths it outputs (and therefore, what input paths need to be provided). We want to allow for filters that can handle more than 1 file path, even if none of the current filters require more than 1. Scroll up for an example:
> For what filter(s) would one need to supply multiple paths?
>
> For example, one that outputs a "processed PEP" -- that is, a project config file and a csv file, after amendments and modifiers have been applied.
I'll create a new issue for that, because we should just go ahead and create that filter. Maybe it will help you see the vision here
@nsheff can you look at the most recent commits? I also added some tests with the assumed programmatic use case.
The `paths` keyword argument should match the keys of the `dict` returned from the specified plugin. The `run_filter` function should not expect any sort of return signature from a conversion plugin.

~~If there is a key in the return object from the conversion plugin that doesn't exist in the `paths` dictionary, then we can just dump to standard out. Or we only return it.~~
If `verbose=True`, then print out the result. Otherwise, if a path is missing from `paths`, then we can just skip it? If the conversion function returns more than one item and some aren't mentioned in the `paths` argument, then we can warn the user.
Also, we will need to update the `cli` parser to accept key-value pairs:

```
--paths cfg=/path/to/cfg samples=/another/to/samples
```
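One way that parsing could be sketched with `argparse` (illustrative only; `parse_paths` and the flag wiring are assumptions, not eido's actual cli code):

```python
import argparse

def parse_paths(pairs):
    """Turn ["cfg=/a", "samples=/b"] into {"cfg": "/a", "samples": "/b"}."""
    out = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        out[key] = value
    return out

parser = argparse.ArgumentParser()
parser.add_argument("--paths", nargs="*", default=[])

# Parse an explicit argv list so the sketch is self-contained.
args = parser.parse_args(
    ["--paths", "cfg=/path/to/cfg", "samples=/another/to/samples"]
)
paths = parse_paths(args.paths)
```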
For this implementation see:
Ok. @nsheff your review would be appreciated. Three things:

1. The new `run_filter` function in `conversion.py`
2. The new `cli` parser
3. The unit test I wrote

We also need to address the original issue which motivated this (#39)
I don't see the actual processed PEP filter, though (#39)
I can add that too. Should it be named `processed_pep_filter`? And return something like:

```python
return {
    'project': ...,
    'samples': ...,
    'subsamples': ...
}
```
`peppy` applies the sample modifiers on instantiation, right?
Then the `paths` variable needs to look like:

```python
paths = {
    'project': '/path/to/project.yaml',
    'samples': '/path/to/samples.csv'
}
```
Yep! So it's super simple, I guess... read it in, and then return the items... just make sure you get the tabular versions of them, rather than the object versions. Actually, that could be a parameter for this filter: do you want the samples as a table, or as a document?

There are two methods on a `peppy.Project` instance: `prj.samples`, which produces a list:

```
[Sample
sample_name: GSM1558746
protocol: GRO
genome: hg38, Sample
sample_name: GSM1480327
protocol: PRO
genome: hg38]
```

and `prj.sample_table`, which looks to create a tab-separated output:

```
sample_name
GSM1558746  GSM1558746  GRO  hg38
GSM1480327  GSM1480327  PRO  hg38
```

Looks like the second is more appropriate.
I've got this:

```python
def processed_pep_filter(p, **kwargs) -> Dict[str, str]:
    """
    Processed PEP filter, that returns the converted sample and subsample tables.
    This filter can return the tables as a table or a document.

    :param peppy.Project p: a Project to run filter on
    :param bool samples_as_table: Flag to write as a table
    """
    # get params
    samples_as_table = kwargs.get("samples_as_table")

    prj_repr = p.config.to_dict()

    if samples_as_table:
        sample_repr = p.sample_table.to_csv()
        subsample_repr = p.subsample_table.to_csv()
    else:
        sample_repr = p.sample_table
        subsample_repr = p.subsample_table

    return {
        "project": str(prj_repr),
        "samples": str(sample_repr),
        "subsamples": str(subsample_repr)
    }
```

But the difference is minuscule... one is a `csv` output and one is a `tsv` output. Those seem identical to me.
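For reference, since `sample_table` is a pandas `DataFrame`, the csv/tsv distinction is just the `sep` argument to `to_csv` (the data below is a toy example, not an eido fixture):

```python
import pandas as pd

df = pd.DataFrame({"sample_name": ["GSM1558746"], "protocol": ["GRO"]})
csv_out = df.to_csv(index=False)            # comma-separated
tsv_out = df.to_csv(index=False, sep="\t")  # tab-separated
```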
Use csv. I'd just do it like this:

```python
return {
    "project": str(prj_repr),
    "samples": str(p.sample_table.to_csv()) if samples_as_table else str(p.samples),
    "subsamples": str(p.subsample_table.to_csv()) if subsamples_as_table else str(p.subsamples)
}
```

Or, actually, I'd probably make table the default; so flip those and use `samples_as_objects`, etc.
ok looks good to me. maybe just test it for your purposes with pephub and then I think it's ready to merge.
@nsheff I've tested this with pephub and it functions well for what we discussed, I believe.
I just changed the code to return the result in addition to logging it out. I think, still inside the `conversion_plugins.py` file, one will need to return the data as well for this to work (otherwise it will return `None`). I am not sure if this "breaks" the idea of the `cli`. So, we can perhaps find another solution for pephub if necessary -- like capturing `stdout` instead of directly getting a returned value.
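If it came to that, a plugin's printed output could be captured without modifying it, using the standard library's `contextlib.redirect_stdout` (a sketch; `noisy_filter` is a hypothetical stand-in for a print-only plugin):

```python
import io
from contextlib import redirect_stdout

def noisy_filter():
    # stand-in for a conversion plugin that prints instead of returning
    print("converted pep")

buf = io.StringIO()
with redirect_stdout(buf):
    noisy_filter()
captured = buf.getvalue()
```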