nsheff closed this issue 3 years ago
@stolarczyk do you have any ideas for how to integrate this into the new plugin system? Would this actually be a peppy issue?
For CWL, we will have to continue to write the sample attributes to the top level -- but maybe we just skip the schema attributes?
The thing is, this change involves the `to_yaml` command.
I guess do we need to have the schema attributes appended at the top level? why are the schema attributes appended at all?
here's an idea. what if the default to_yaml just output all the looper variable namespaces? so the sample yaml would be something like:
sample:
  sample_name: blahblah
  ...
project:
  ...
pipeline:
  ...
looper:
  ...
compute:
  ...
If the schema is important here, then it would be a separate namespace and maybe that would mean the schema should be a separate namespace available for the command templates as well.
In this approach, we'd adjust the `Sample.to_yaml` method and maybe change its name/location. It would provide a yaml-ish thing that has `sample` and `project` subcomponents; looper would add its own components to these and have a `to_yaml` function.
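To make the proposal concrete, here is a rough sketch of the namespaced layout. Everything in it is an assumption for illustration: `peppy_namespaces`, `looper_namespaces` and `to_yaml_text` are hypothetical names, not existing peppy or looper functions, and the tiny emitter only stands in for a real YAML library.

```python
def peppy_namespaces(sample_attrs, project_attrs):
    """Hypothetical peppy-side step: provide only sample and project."""
    return {"sample": dict(sample_attrs), "project": dict(project_attrs)}


def looper_namespaces(base, pipeline, looper, compute):
    """Hypothetical looper-side step: add looper's namespaces on top."""
    out = dict(base)
    out["pipeline"] = dict(pipeline)
    out["looper"] = dict(looper)
    out["compute"] = dict(compute)
    return out


def to_yaml_text(data, indent=0):
    """Stand-in emitter for nested dicts of plain scalars (not a YAML library)."""
    pad = " " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml_text(value, indent + 2).splitlines())
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# Made-up values, just to show the shape of the output.
base = peppy_namespaces({"sample_name": "blahblah"}, {"name": "demo"})
full = looper_namespaces(
    base,
    pipeline={"name": "count_lines"},
    looper={"output_dir": "results"},
    compute={"partition": "local"},
)
text = to_yaml_text(full)
print(text)
```

Because every attribute lives under its own namespace, a sample attribute can never shadow something looper or the pipeline interface adds.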
Ok, here's what we determined to do:
the sample yaml should not include input schema stuff. it should really be a sample yaml: https://github.com/pepkit/peppy/issues/356
peppy needs the ability to write yaml either with or without the project embedded: https://github.com/pepkit/peppy/issues/355
two looper functions wrap each of the peppy sample to_yaml function versions (one with prj embedded and one without); these functions can serve as plugin functions. https://github.com/pepkit/looper/issues/299
looper will no longer directly call the 'to yaml' function on peppy; it will call it via these plugin functions https://github.com/pepkit/looper/issues/299
looper plugin functions will have to define how they want their output file location specified, just as the submission object one does. https://github.com/pepkit/looper/issues/299
we should add a page documenting all these builtin looper plugin functions and how to parameterize them https://github.com/pepkit/looper/issues/299
a new plugin for the whole shebang. https://github.com/pepkit/looper/issues/298
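A minimal sketch of how the two wrapper plugins from the list above might look. All names here are hypothetical (the plugin names, the `PLUGINS` registry, the `sample_yaml_path` / `sample_yaml_prj_path` override keys, and the default path layout); the actual interface is what pepkit/looper#299 is meant to decide. `json` is used only to keep the sketch dependency-free; a real plugin would write YAML.

```python
import json
import os
import tempfile


def _write(path, data):
    """Write data to path, creating the parent directory if there is one."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2)  # stand-in for a YAML dump
    return path


def _target(namespaces, override_key, suffix):
    """Each plugin defines its own output location: override key or default."""
    override = namespaces.get("pipeline", {}).get(override_key)
    if override:
        return override
    out_dir = namespaces["looper"]["output_dir"]
    name = namespaces["sample"]["sample_name"]
    return os.path.join(out_dir, "submission", name + suffix)


def write_sample_only(namespaces):
    """Hypothetical plugin: sample attributes, no project embedded."""
    path = _target(namespaces, "sample_yaml_path", "_sample.yaml")
    return _write(path, {"sample": namespaces["sample"]})


def write_sample_with_project(namespaces):
    """Hypothetical plugin: sample attributes with the project embedded."""
    path = _target(namespaces, "sample_yaml_prj_path", "_sample_prj.yaml")
    payload = {"sample": namespaces["sample"], "project": namespaces["project"]}
    return _write(path, payload)


PLUGINS = {
    "write_sample_only": write_sample_only,
    "write_sample_with_project": write_sample_with_project,
}


def run_plugins(names, namespaces):
    """Looper would call the named plugins instead of calling peppy directly."""
    return {name: PLUGINS[name](namespaces) for name in names}


with tempfile.TemporaryDirectory() as tmp:
    ns = {
        "sample": {"sample_name": "s1"},
        "project": {"name": "demo"},
        "looper": {"output_dir": tmp},
        "pipeline": {},
    }
    written = run_plugins(list(PLUGINS), ns)
    with open(written["write_sample_only"]) as handle:
        plain = json.load(handle)
    with open(written["write_sample_with_project"]) as handle:
        embedded = json.load(handle)
```

The point of the registry is that a pipeline interface could pick which writer it wants by name, and each writer owns the decision about where its file goes.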
Originally noticed by @nsheff in https://github.com/pepkit/looper/issues/283#issuecomment-680057864
This is the sample yaml produced by looper for the pep in https://github.com/pepkit/pep-cwl:
Notice the schema and sample attributes are attached in parallel. This is a problem because they could overwrite each other. For example, if the sample had an attribute called `files` or `required_files` or `yaml_file` or `prj` or `all_inputs`, what would happen?
I would suggest this yaml writer should instead use a `sample` or `sample_attributes` subsection for the direct sample attributes. This would require changing any downstream pipelines that relied on the current format (which I think is mostly @afrendeiro's pipelines?). Unfortunately, the current approach is not really a good model.
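The collision is easy to reproduce with plain dictionaries. This is only an illustration with made-up values: `derived` stands in for the schema-related keys that get attached next to the sample attributes, and none of the names below are peppy or looper API.

```python
# User-defined sample attributes (made-up values); two of them happen to
# reuse names that are also attached alongside the sample.
sample_attrs = {"sample_name": "s1", "files": "reads.fastq", "prj": "my_label"}

# Hypothetical stand-ins for the schema-derived keys.
derived = {"files": [], "required_files": [], "prj": "<Project object>"}

# Flat layout: both sets share the top level, so later keys silently win.
flat = {**sample_attrs, **derived}
clobbered = sorted(set(sample_attrs) & set(derived))

# Suggested layout: direct sample attributes get their own subsection.
nested = {"sample": dict(sample_attrs), **derived}

print(clobbered)                 # names that collide in the flat layout
print(flat["files"])             # the user's value is gone
print(nested["sample"]["files"]) # the user's value survives
```

In the flat layout the user's `files` and `prj` values are lost without any warning; with a `sample` subsection both versions coexist.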