pepkit / eido

Validator for PEP objects
http://eido.databio.org
BSD 2-Clause "Simplified" License

Output processed PEP #25

Closed (nsheff closed 2 years ago)

nsheff commented 3 years ago

While talking with some nf-core Nextflow developers today, it came up that it would be useful to output a processed PEP, either in CSV format or in YAML/JSON format.

So, think of it as a PEP (yaml+csv) -> YAML converter... it's kind of a "filter" that would read the PEP and output it in the other format. This is basically what looper does when it creates the sample yaml files, which can be modulated with looper plugins. The difference here I guess is that we don't need all the rest of the looper capability -- just the printing of sample yaml files, perhaps all in one file. We need just some command-line tool that would output the PEP in YAML format.
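A minimal sketch of the "filter" idea, using only the Python standard library. It reads a sample table as CSV text and emits one combined document (JSON here, to avoid a YAML dependency). The function names `csv_to_sample_records` and `convert` are hypothetical, and the sketch does no real PEP processing; an actual tool would presumably load the project through peppy so that sample modifiers are applied first.

```python
import csv
import io
import json


def csv_to_sample_records(csv_text):
    """Parse a CSV sample table into a list of dicts, one per sample."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def convert(csv_text, project_attrs=None):
    """Return one JSON document holding project attributes and all samples.

    Hypothetical helper: stands in for a real PEP -> YAML/JSON converter.
    """
    doc = {
        "project": project_attrs or {},
        "samples": csv_to_sample_records(csv_text),
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    table = (
        "sample_name,protocol,organism\n"
        "sample1,RNA-seq,human\n"
        "sample2,ATAC-seq,mouse\n"
    )
    # Printing to stdout lets the result be piped into a downstream tool.
    print(convert(table, {"name": "demo"}))
```

Writing to stdout keeps the converter usable from any language, since the consumer only needs to read a stream or a file.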

I think this might make sense to have as part of eido, since it already provides a command-line interface... In fact, we could go as far as extracting the looper sample-writing capabilities and putting them into eido. In that case, the plugin system may actually be useful here.

@stolarczyk thoughts?

nsheff commented 3 years ago

Actually we're already close with eido inspect -n : http://eido.databio.org/en/latest/cli/

eido inspect pep_bio.yaml -n sample1
Sample 'sample1' in Project (/home/nsheff/code/incubator/learn_cwl/cwl-pep/bioinformatics_demo/pep_bio.yaml)

sample_name:         sample1
protocol:            RNA-seq
organism:            human
read1:               data/sample1_1.fq.gz
read2:               data/sample1_2.fq.gz
Index:               refgenie://t7/bwa_index
pipeline_interfaces: bwa_cwl_interface.yaml
InputFile1:          data/sample1_1.fq.gz
InputFile2:          data/sample1_2.fq.gz
genome:              t7

stolarczyk commented 3 years ago

ok, I think we should add this capability to the PEP framework then.

I'm just not sure if eido is the right place -- it would enable people to get the processed PEP in Python or on the command line. What about R? Maybe it would make sense to implement this in peppy and pepr and use peppy's Python API in eido to provide this via CLI?

nsheff commented 3 years ago

> Maybe it would make sense to implement this in peppy and pepr

Yes, that's the alternative option. I think this is a standalone enough function... for now I'd rather only implement it once. It's like validation: we don't implement it in R -- you'd use eido to validate. And in this case, the point is to filter and then use the result for something downstream, regardless of language, so there's no need to implement it in 2 languages. You'd use the output as input via streams or files.

So, that argues for putting it into eido -- or in something else that's python-only outside of peppy. Or at least, just not in pepr.

Maybe this is a new pepfilters package.

nsheff commented 3 years ago

Some decisions:

Name: pepconvert

nsheff commented 3 years ago

This is now implemented as eido convert, with all the functionality envisioned in pepconvert. It is awesome. But there are 2 limitations:

  1. Right now filter functions must accept only a Project object, and therefore cannot be parameterized.
  2. Outputs are assumed to be to stdout. But what if I want to output a yaml + csv file? Or one yaml per sample? There's no way to accommodate multiple outputs. I suppose the function could just write them, but it wouldn't be parameterized due to point 1.

stolarczyk commented 3 years ago

We could change the required plugin function signature from plugin(peppy.Project) to plugin(peppy.Project, **kwargs) and add an optional -a/--args argument to the eido convert command. It could accept a string in this format: --args arg1=value1 arg2=value2. We could parse that, put it in a dict, and unpack it in the plugin() call.
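A rough sketch of that parsing step, assuming the arguments arrive as a list of key=value strings. The helper names (`build_parser`, `parse_plugin_args`, `run_plugin`, `example_filter`) are hypothetical and this is not eido's actual code; the project is represented by a plain placeholder value rather than a real peppy.Project.

```python
import argparse


def build_parser():
    """Hypothetical CLI: only the -a/--args option discussed above."""
    parser = argparse.ArgumentParser(prog="convert-sketch")
    parser.add_argument("-a", "--args", nargs="+", default=[], metavar="KEY=VALUE")
    return parser


def parse_plugin_args(pairs):
    """Turn ['arg1=value1', 'arg2=value2'] into a dict of strings."""
    kwargs = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        kwargs[key] = value
    return kwargs


def run_plugin(plugin, project, pairs):
    """Call plugin(project, **kwargs) with kwargs parsed from the CLI."""
    return plugin(project, **parse_plugin_args(pairs))


def example_filter(project, **kwargs):
    """Hypothetical filter that just reports what it received."""
    return {"project": project, "kwargs": kwargs}


if __name__ == "__main__":
    ns = build_parser().parse_args(["--args", "arg1=value1", "arg2=value2"])
    print(run_plugin(example_filter, "placeholder-project", ns.args))
```

Note that all values stay strings in this sketch; a filter that needs integers or booleans would have to convert them itself.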

nsheff commented 3 years ago

I'm happy with that approach.

stolarczyk commented 3 years ago

ok, the kwargs support is implemented. Now we just need to update the filter functions to make use of this feature.
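For illustration, a filter updated to use kwargs might look like the sketch below. It writes one file per sample when an `output_dir` keyword is given and otherwise returns a single string for stdout. The filter name, the `output_dir` parameter, and the dict standing in for a Project object are all assumptions, and JSON is used instead of YAML to stay within the standard library.

```python
import json
import os
import tempfile


def per_sample_filter(project, **kwargs):
    """Hypothetical filter: one JSON file per sample, or one string for stdout."""
    samples = project["samples"]  # stand-in for a real Project's samples
    output_dir = kwargs.get("output_dir")
    if output_dir is None:
        # No output location given: return everything as one string.
        return json.dumps(samples, indent=2)
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for sample in samples:
        path = os.path.join(output_dir, f"{sample['sample_name']}.json")
        with open(path, "w") as handle:
            json.dump(sample, handle, indent=2)
        written.append(path)
    return written


if __name__ == "__main__":
    demo = {"samples": [{"sample_name": "sample1"}, {"sample_name": "sample2"}]}
    with tempfile.TemporaryDirectory() as tmp:
        print(per_sample_filter(demo, output_dir=tmp))
    print(per_sample_filter(demo))
```

A filter written this way covers both cases raised earlier in the thread: it can be parameterized, and it can produce multiple outputs.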

nsheff commented 2 years ago

This feature is now relatively complete and functional in version 1.6.0 of eido, with release pending, so I'm closing this issue.