wptmdoorn / methcomp

Python package providing functionality and plotting for chemistry method comparison
MIT License
14 stars 9 forks source link

API revision - separating calculation from plotting functionality #5

Closed wptmdoorn closed 3 years ago

wptmdoorn commented 3 years ago

This issue follows issue #1 to have a broader discussion about how to extend the API to not only support plotting but also support calculation for each of our functions (regression, Bland-Altmans and surveillance error grids). I just committed a first example in a separate branch in https://github.com/wptmdoorn/methcomp/commit/7ff14e0026d794696c2d22fb8f861df4240c593f. This code is very preliminary and is most likely suboptimal but is just here to provide with some food for thought. As exemplified in examples/blandaltman.py it works like this:

  1. User instantiates _BlandAltman object through blandaltman() function
  2. _BlandAltman init will check params and derive params. It saves this into an output dict
  3. User can access test-statistics through statistics()
  4. User can access plot (with additional arguments) through plot()

Major benefits is that we also separate the arguments needed for calculations (e.g. the data, confidence intervals, and so on) from the actual markup of the plot (e.g. color points, graph title). Very happy to hear feedback!

thomasburgess commented 3 years ago

For Regression, it would make sense if the calculation function return its output: i.e.

slope, intercept = Regression(method1, method2).calculate()

rather than

r = Regression(method1, method2)
slope = r.stastistcs["slope"]
intercept= r.stastistcs["slope"]
wptmdoorn commented 3 years ago

@thomasburgess

Could you look at my commit of today and tell me if you approve with this structure of the API? If we find a consensus here, I will finalize the BlandAltman class and merge it into the master. From then on, we can start modifying the other functionality of the API (e.g. Regression and Surveillance Error Grids) to match the new structure.

The general idea (as we discussed on mail before) is:

  1. Initialize the class (e.g. BlandAltman or Passing-Bablok) with the arguments that are needed for calculations (e.g. method1, method2, CI)
  2. Compute the statistics/parameters of the functionality using the .compute() function. I would prefer to store this in a dictionary (regardless of the class), as this provides the most consistent API. Also, the results of the function are stored internally in _output variable (see Line 52-58 in the revised BlandAltman.py).
  3. If the user also wants to plot the results, this can be done through the .plot() function. Importantly if the user computed the statistics before, it will use this, otherwise it will first call .compute() internally to generate the statistics (see L82-L83 in the revised BlandAltman.py).

Would you be able to look at this and share your thoughts? I really appreciate it, thank you!

thomasburgess commented 3 years ago

The general ideas are sound

  1. Yes for calculations only.
  2. As long as compute() also returns its result this will work fine. Returning the value allows you to init, calculate and use result in a list comprehension or DataFrame apply
  3. Agreed

I would propose to keep the main functions that exist in regression.py, but get rid of the classes. Then call Regressor classes from regression.py. This way Regressor is usable on its own, all while not breaking backwards compatibility.

thomasburgess commented 3 years ago

Make a pull request for BlandAltman, and I can give you line by line review of the code. I will update the regressor class to be closer to this idea. It could be that we can use the same base class for BlandAltman and Regressor, but that can be done once the updated BlandAltman has been merged.

wptmdoorn commented 3 years ago

I will create a pull-request now for the separation branch, thanks a lot @thomasburgess

thomasburgess commented 3 years ago

Merged, so issue closed