lnccbrown / HSSM

Development of HSSM package
Other
73 stars 10 forks source link

Package structure #113

Closed digicosmos86 closed 1 year ago

digicosmos86 commented 1 year ago

As our understanding of HSSM improves, I have realized that the structure of our package is slightly sub-optimal and could use a bit of restructuring to make it better organized. The wfpt folder, in particular, is a misnomer because we work with ssms in general now. If this should happen, it should happen early because it will affect our documentation. Here's what I propose:

src/hssm
├── __init__.py
├── hssm.py
├── utils.py
└── wfpt
    ├── __init__.py
    ├── base.py  ## rename to likelihood.py
    ├── config.py  ## rename to defaults.py and move to a new folder called config (we might want some global configs like pandas so that could go into this folder), or move out of this folder
    ├── lan ## take out of the wfpt folder (could potentially be renamed to lan_utils?)
    │   ├── lan.py
    │   ├── onnx2pt.py
    │   ├── onnx2xla.py
    └── wfpt.py ## rename to dist.py and move to a new folder called dist_utils because this file contains utility functions for creating distributions

New folder structure:

src/hssm
├── __init__.py
├── hssm.py
├── config.py
├── utils.py
└── wfpt
    ├── __init__.py
    ├── likelihoods.py ## Renamed from base.py
    ├── distributions.py ## Hosts the WFPT and WFPT_SDV classes (or not lol)
└── dist_utils
    └── dist.py
└── lan ## take out of the wfpt folder (could potentially be renamed to lan_utils?)
   ├── lan.py
   ├── onnx2pt.py
   ├── onnx2xla.py
AisOmar commented 1 year ago

also rename folder wfpt to likelihoods

digicosmos86 commented 1 year ago

I like the idea. Currently only one file base.py contains likelihoods though, so my idea is to keep the wfpt folder, rename base.py to likelihoods.py for the analytical likelihoods, and have a new distributions.py that has the WFPT and WFPT_SDV distribution. What do you think?

AlexanderFengler commented 1 year ago

how about using something like this?

src/hssm
├── __init__.py
├── hssm.py
├── config.py
├── utils.py
└── likelihoods
   └── analytical
        ├── __init__.py
        ├── likelihoods.py
        ├── distributions.py 
   └── approx_differentiable
       └── lan 
          ├── __init__.py
          ├── lan.py
          ├── onnx2pt.py
          ├── onnx2xla.py
└── dist_utils
    ├── __init__.py
    └── dist.py

Or is nesting frowned upon? This would sort of be predictable from how you use the package, so potentially more easy to contribute?

digicosmos86 commented 1 year ago

how about using something like this?

src/hssm
├── __init__.py
├── hssm.py
├── config.py
├── utils.py
└── likelihoods
   └── analytical
        ├── __init__.py
        ├── likelihoods.py
        ├── distributions.py 
   └── approx_differentiable
       └── lan 
          ├── __init__.py
          ├── lan.py
          ├── onnx2pt.py
          ├── onnx2xla.py
└── dist_utils
    ├── __init__.py
    └── dist.py

Or is nesting frowned upon? This would sort of be predictable from how you use the package, so potentially more easy to contribute?

I really like this structure. Very organized. Yeah only concern is nesting. We are potentially dealing with from hssm.likelihoods.approx_differentiable.lan import .... Too much typing for me 🤣

Another idea is we can keep this folder structure but potentially export some functionalities at the top level. Kind of like pm.Distribution.

digicosmos86 commented 1 year ago

Another observation is that everything under lan actually does not provide any likelihood functions. They are more like helper functions to create distributions (kind of like dist.py), so we can probably move lan under dist_utils, while flatten the likelihoods folder. We can have one analytical.py under that folder for analytical likelihoods. Like this:

src/hssm
├── __init__.py
├── hssm.py
├── config.py
├── utils.py
└── likelihoods
    ├── __init__.py
    ├── analytical.py
└── dist_utils
    ├── __init__.py
    ├── dist.py
    └── lan ## take out of the wfpt folder (could potentially be renamed to lan_utils?)
        ├── lan.py
        ├── onnx2pt.py
        └── onnx2xla.py
digicosmos86 commented 1 year ago

Maybe another angle is to check the structure of the APIs that we expose. With the above package structure, we have

HSSM
- dist_utils
   - make_model_rv
   - make_distribution
   - make_lan_distribution
   - make_blackbox_op
   - lan
     - make_jax_logp_funcs_from_onnx
     - make_jax_logp_ops
     - make_pytensor_op
- likelihoods
   - logp_pdf
   - logp_pdf_sv
   - WFPT
   - WFPT_SDV

This is basically the structure of our API. I don't think we need to expose anything else.

digicosmos86 commented 1 year ago

@AlexanderFengler @AisOmar can we reach a solution today? It seems that my part of documentation relies on the make functions very heavily so it'd be great if I get started after the API is changed

AisOmar commented 1 year ago

dist_utils -> distribution_utils