tritemio / pybroom

pybroom, the python's broom to tidy up messy fit results!
http://pybroom.readthedocs.io/
MIT License
14 stars 5 forks source link

Support for models from statsmodels #5

Open has2k1 opened 7 years ago

has2k1 commented 7 years ago

This is a great initiative. I think before adding more models, it would be better to reorganise the project and make it more modular. So that when tidying methods are added for models from other packages, the core methods glance, tidy and augment are not modified.

tritemio commented 7 years ago

@has2k1, thanks for the interest! How would you implement your proposal?

has2k1 commented 7 years ago

I see the objective as to be able to support Models from all over the the python ecosystem and to do this without introducing any new requirements. It should also be easy to add/remove support for any given model or package.

To meet these requirement, I propose the following structure,

  1. The implementations for any package should reside in a specific folder. e.g:

    • lmfit
    • scipy
    • statsmodels ...
  2. Support each model should preferably be in a separate file. e.g:

    • lmfit
      • linear_model.py
      • gaussian_model.py ...
    • scipy
      • optimize.py ...
    • statsmodels
      • ols.py
      • wls.py ... ...

    Preferably, because there may be cases where 2 or more models have similar output and can be served by the same functions.

  3. In each model file should then implement glance, tidy and augment functions. These functions would be decorated by one of three types of decorators. i.e:

    • @register_glance
    • @register_tidy
    • @register_augment

    The decorators would accept a single parameter, a class and add that class to a dictionary (the registry). The dictionary would be of the form {class: function}.

  4. The main glance, tidy and augment methods would use the respective registries to lookup and invoke the right function for any given input.

  5. Now, when pybroom is imported, the registries need to get filled with out raising any errors incase of missing packages. This is possible but so far I have only vague ideas and not certain which options are better.

Does that make sense, and any additions?

tritemio commented 7 years ago

@has2k1 I like the folder structure. The only thing is that for lmfit and scipy a single file would be enough.

For the decorator to register functions, I would like to see a more concrete example. Did you see this approach used in other packages?

Alternatively the list of available functions can be built at import time. For example if lmfit is installed then import a mapping (declared in the lmfit subpackage) that associates object types with corresponding tidying functions. Then the top-level tidying functions will do only a lookup of the object and call the relevant function.

has2k1 commented 7 years ago

Indeed, I expected that some packages could do with the single file.

About the decorators, the standard library has a tool for it, single dispatch. I think its more than enough for our needs.

Yes, the registries will be built at import time. So when pybroom is imported, all submodules and thier packagess will have to be imported. But there are two other requirements that would be worth aiming for:

  1. pybroom should have no dependencies other than pandas.
  2. When support for a package/model is added, there should be no need to do edits in other files (up the heirarchy).

I do not have a handle on how to best achieve this. I will work it out when I get to writing the code and weigh the trade-off between the fully automatic solution and one that violates the 2nd desired requirement.