GSoC 2022 Interfaces for Consistent API Design

pysal / esda

statistics and classes for exploratory spatial data analysis

https://pysal.org/esda

BSD 3-Clause "New" or "Revised" License


GSoC 2022 Interfaces for Consistent API Design #212

Open tdhoffman opened 2 years ago

tdhoffman commented 2 years ago

For GSoC 2022, I'm working on designing more consistent interfaces to PySAL's exploratory and inferential statistics classes. My mentors and I are exploring what might need to be done to:

- render confederated packages compatible with the scikit-learn paradigm, and
- develop R-style Wilkinson formulas for modeling classes.

To these ends, we're interested in getting feedback on the desirability and feasibility of these changes from package leads and devs.

- Do you think the scikit-learn model would work well for this package? Why or why not?
- Do you see any sources of potential friction between the existing codebase and the scikit-learn model?

Excited to hear your input!

martinfleis commented 2 years ago

I do support switching to the sklearn style, but I am curious how you envisage this happening. Let's take esda.Moran as an example. Right now, we fit on initialisation of the class, which expects the data and the arguments at that time.

esda.Moran(y, w, transformation='r', permutations=999, two_tailed=True)

The first question is what the signatures of Moran and its fit method would be after the change, especially where w goes. Does it stay in init, as connectivity does in sklearn.cluster.AgglomerativeClustering, or does it go to fit with y?
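To make the two options concrete, here is a minimal sketch of both placements of w. Everything in it is hypothetical: the class names `MoranInitW` and `MoranFitW`, the helper `morans_i`, and the `I_` attribute are illustrations, not esda's API, and w is assumed to be a dense NumPy weights matrix rather than a libpysal weights object. Permutation inference is left out.

```python
import numpy as np


def morans_i(y, w):
    """Global Moran's I for values y and a dense (n, n) weights matrix w."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()          # deviations from the mean
    n = y.shape[0]
    s0 = w.sum()              # sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)


class MoranInitW:
    """Hypothetical option A: w lives in the constructor, the way
    `connectivity` does in sklearn.cluster.AgglomerativeClustering."""

    def __init__(self, w, permutations=999):
        self.w = w
        self.permutations = permutations  # stored only; unused in this sketch

    def fit(self, y):
        self.I_ = morans_i(y, self.w)
        return self


class MoranFitW:
    """Hypothetical option B: w is passed to fit together with y."""

    def __init__(self, permutations=999):
        self.permutations = permutations  # stored only; unused in this sketch

    def fit(self, y, w):
        self.I_ = morans_i(y, w)
        return self


# Four observations on a line, each neighbouring the next one.
y = [1.0, 2.0, 3.0, 4.0]
w = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

print(MoranInitW(w).fit(y).I_)   # approximately 0.333
print(MoranFitW().fit(y, w).I_)  # same value, different call shape
```

Option A treats w as configuration, so one estimator can be refit on several variables over the same geography. Option B treats w as data, which keeps the constructor free of anything tied to a particular dataset.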

And the main one is: how do we do the transition? We cannot just switch, as it would break existing code, and I am not certain what the ideal deprecation path is here. Do you have an idea about that?
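One possible transition path, sketched under assumptions: keep accepting the legacy call for a deprecation period, emit a warning, and fit eagerly so old code keeps working. The class name `MoranTransition`, the helper `_morans_i`, the `I_` attribute, and the choice to put w in fit are all hypothetical, and w is assumed to be a dense NumPy matrix. Accepting data in the constructor goes against scikit-learn's estimator conventions, so this would be a temporary shim rather than an end state.

```python
import warnings

import numpy as np


def _morans_i(y, w):
    """Global Moran's I for values y and a dense (n, n) weights matrix w."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()
    return (y.shape[0] / w.sum()) * (z @ w @ z) / (z @ z)


class MoranTransition:
    """Hypothetical shim that supports both call styles for a while.

    Legacy:  MoranTransition(y, w)        -> warns, fits on construction
    New:     MoranTransition().fit(y, w)  -> no warning
    """

    def __init__(self, y=None, w=None, permutations=999):
        self.permutations = permutations  # stored only; unused in this sketch
        if y is not None:
            warnings.warn(
                "Passing data to the constructor is deprecated; "
                "call .fit(y, w) instead.",
                FutureWarning,
                stacklevel=2,
            )
            self.fit(y, w)

    def fit(self, y, w):
        self.I_ = _morans_i(y, w)
        self.I = self.I_  # legacy attribute name kept as an alias
        return self


y = [1.0, 2.0, 3.0, 4.0]
w = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

old_style = MoranTransition(y, w)        # emits a FutureWarning
new_style = MoranTransition().fit(y, w)  # silent
```

Both objects end up with the same statistic, so downstream code reading `.I` is unaffected while callers migrate.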

knaaptime commented 2 years ago

personally, the only change i'd like to see over here is the adoption of pep8 (i.e. get rid of those damn underscores in the classes :P).

as i said over in spreg, i'm sure i'm missing something about the utility of that pattern, but i can't see why adopting a scikit-like signature in esda's classes would be preferable... what benefit would that provide over the current API? i don't have a strong opinion but i think i'm missing the value proposition