pepkit / peppy

Project metadata manager for PEPs in Python
https://pep.databio.org/peppy
BSD 2-Clause "Simplified" License

Separating looper from models #74

Closed: nsheff closed this issue 6 years ago

nsheff commented 7 years ago

We've discussed this briefly in the past and I wanted to formalize and collect discussion here.

looper.models is useful outside of looper, and other systems could use models while providing a different functional implementation of what looper does; or use them for analysis, or whatever. For example, snakemake would use models but not looper.

Right now you can already do this like so (see docs):

from looper import models

# Load the project from its config file, then access its samples.
my_project = models.Project("path/to/project_config.yaml")
my_samples = my_project.samples

But it would be more natural to separate models into a new project, say promod for Project Models, and then both looper and other tools would do import promod instead.

vreuter commented 7 years ago

:+1: how about projels :thinking:

afrendeiro commented 7 years ago

My initial view of looper was as much of an API as a CLI program, so I'd still prefer if they were together.

Partitioning the code too much makes it somewhat difficult to keep a global view of everything and increases complexity for a user who also wants to understand the system as a whole. This would mean that there are 4 programs needed to keep the usual workflow as is. One can create more submodules if needed, no? Another argument is that if one of the projects gains particular traction, there's no guarantee that the others, initially designed to work together, will maintain compatibility (more or less what is happening with looper and pipelines right now).

However if you guys really want to split it, would it be possible that looper imports models in a way that exposes its API in the same way?

PS: Besides, promod and projels are not really convincing names :stuck_out_tongue:
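On the question of whether looper could keep exposing the models API after a split, one possible approach is a thin re-export shim. The sketch below is only an illustration under stated assumptions: `promod` is the placeholder name floated in this thread (not an existing package), the `Project` class is a toy stand-in, and both modules are built in memory so the example runs without either package installed. In a real layout, the shim would be a `looper/models.py` file containing little more than `from promod import *`.

```python
import sys
import types

# Hypothetical stand-in for the split-out package. "promod" is only the
# placeholder name suggested in this thread.
promod_module = types.ModuleType("promod")


class Project:
    """Toy stand-in; the real Project parses a project config file."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.samples = []


promod_module.Project = Project
sys.modules["promod"] = promod_module

# In-memory equivalent of a looper/models.py shim that re-exports
# every public name from the separate package.
looper_module = types.ModuleType("looper")
looper_module.__path__ = []  # mark as a package
shim = types.ModuleType("looper.models")
shim.__dict__.update(
    {name: obj for name, obj in vars(promod_module).items()
     if not name.startswith("_")}
)
looper_module.models = shim
sys.modules["looper"] = looper_module
sys.modules["looper.models"] = shim

# Existing user code keeps working unchanged.
from looper import models
import promod

my_project = models.Project("path/to/project_config.yaml")
my_samples = my_project.samples

# Both import paths resolve to the very same class object.
same_class = models.Project is promod.Project
```

With a shim like this, code written against `looper.models` and code written against the new package would share the same objects, so the two import paths could coexist during a transition.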

nsheff commented 7 years ago

The API is fully maintained in models; the looper half of things has no API, it uses that API. Looper is actually very simple and much of the important functionality is in models -- I think you just have to think of the original project as "models" now. Looper has no objects; it's just a main that uses the models API to loop through and submit jobs. So the API would not change; it would actually gain prominence as the centerpiece of models, which is where it belongs.

PS: pipelines just needs a few very small updates to the interface files to maintain compatibility -- I don't find this particularly problematic. It's mostly because our early concepts of looper were very limited, but as I've started using it for a wider range of things I've been able to generalize a lot. It maintains backward compatibility, but does require a few more specifications to account for the increased generality. But that's also what will make it useful to others.
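To make the "looper is just a main that uses the models API" point concrete, here is a minimal sketch of a client that loops over a project's samples and submits one job per sample. Everything in it is hypothetical: the `Sample` and `Project` classes are toy stand-ins for the models objects, and `build_command`, `main`, and the `pipeline.py` command line are invented for illustration and do not reflect looper's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Toy stand-in for a models sample object."""
    name: str


@dataclass
class Project:
    """Toy stand-in for a models project object."""
    samples: list = field(default_factory=list)


def build_command(sample):
    # Hypothetical command template; a real client would derive this
    # from a pipeline interface.
    return f"pipeline.py --sample {sample.name}"


def main(project, submit=print):
    """Loop over samples and hand one command per sample to `submit`."""
    submitted = []
    for sample in project.samples:
        command = build_command(sample)
        submit(command)
        submitted.append(command)
    return submitted


jobs = main(Project(samples=[Sample("a"), Sample("b")]))
```

Because the client only reads `project.samples`, a different tool could reuse the same objects and supply its own submission logic through `submit`, or skip submission entirely.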

nsheff commented 7 years ago

Also @afrendeiro, one of the big motivating factors behind this is that in discussion with @johanneskoester we realized that snakemake could also make use of the models API side of things, but does not require the looper submission (because submission is what snakemake itself does). So there are likely other systems in the same vein. We could encourage development, improvement, and sharing of the models concept if it were separated from looper; looper then becomes one of many clients of models. This is good for everyone.

nsheff commented 7 years ago

and actually, one last point: your way of using the API for analysis is also only reliant on models, not on looper; so you don't need looper functionality for any of that stuff, you just need models. that's another example of the above. by bundling with looper, we obscure the much more general utility of models.

nsheff commented 7 years ago

How about probjects?

vreuter commented 7 years ago

Ha I'm good with that.

nsheff commented 7 years ago

We need a catchy name for the objects project -- @johanneskoester do you have a suggestion for a name?

johanneskoester commented 7 years ago

Not currently. My feeling is that project is a bit too generic though...

nsheff commented 7 years ago

See also epigen/pypiper#39 -- once models is split out, we should use the attribute dict in pypiper as well.

nsheff commented 7 years ago

ok, plan is: after the next release (0.7), we do this separation.