pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License
20 stars 7 forks source link

Moving _get_sample_yaml_path to a sample #288

Closed nsheff closed 3 years ago

nsheff commented 4 years ago

Right now the SubmissionConductor object has a _get_sample_yaml_path function.

https://github.com/pepkit/looper/blob/fd7a95fa31ed71b32752019475cae0e34b595982/looper/conductor.py#L348

I need this in the functions that write sample yaml files. So, right now these hook functions require 2 inputs: the sample, and the submission conductor:

https://github.com/pepkit/looper/blob/fd7a95fa31ed71b32752019475cae0e34b595982/looper/conductor.py#L33

I'd rather simplify this so the hook functions need only the sample object. that will make it easier for someone unfamiliar with looper to write it. I don't want these to have to rely on the submission conductor. Is there a way to move this function to a more universal spot, or does it really depend on the submission conductor?

stolarczyk commented 4 years ago

This dependency stems from the sample.yaml location flexibility we've recently introduced: https://github.com/pepkit/looper/issues/61#issuecomment-609101228

However, it does not depend on the submission conductor, but on the project and pipeline interface the sample is processed for*. Attributes of these objects are required to render the path to the sample.yaml:

https://github.com/pepkit/looper/blob/fd7a95fa31ed71b32752019475cae0e34b595982/looper/conductor.py#L364-L367

*The submission conductor carries the information on the "currently processed" pipeline interface so I think it is vital.

One solution would be to make this method a regular function, independent of submission conductor, but accept it as an optional argument. However, if submission conductor is not provided, you lose the ability to specify custom sample.yaml destination path. It's not ideal, but maybe sufficient?

nsheff commented 4 years ago

What if we generalize the concept of the sample_yaml_path section? Since it's completely tied to the original sample yaml generation, which is now accomplished via a plugin function... perhaps it needs to become more generalization as a way to pass generic arguments from the pipeline interface into plugin functions.

What if we change sample_yaml_path into a new section called sample_attribute_templates or something, which would work like:

pipelines:
pipeline_name: test
pipeline_type: sample
sample_attribute_templates:
  sample_yaml_path: {sample.sample_name}.yaml
  sample_cwl_yaml: {sample.sample_name}_cwl.yaml

These template attributes could be added to the sample object , or instead maybe be accessed via the variable namespaces in the plugin functions...

Or, is this essentially repeating the work of the output schema? In that case, perhaps we should simply eliminate the sample_yaml_path section and expect that you would provide that as an element in the output schema. We'd make the output schema attributes available to the plugins.

Some differences, though:

nsheff commented 4 years ago

A follow-up: is it possible that someone would want access to other things from within this pre_submit function? For instance, maybe I want to know something about the pipeline interface, or the compute settings that have been passed.

Also, maybe I'd want to modify those.

nsheff commented 3 years ago

@stolarczyk just thinking we can close this, as this is solved with the new way of specifying templates, and that all namespaces are accessible from the plugin functions, correct?

stolarczyk commented 3 years ago

Yessir