pysal / spreg

Spatial econometric regression in Python
https://pysal.org/spreg/
Other
66 stars 23 forks source link

Add Wilkinson formula interface and scikit-learn style estimators #103

Closed tdhoffman closed 11 months ago

tdhoffman commented 1 year ago

This pull request supercedes #102. It adds the following features:

Pursuant to this comment on #101, everything in this PR is fully backwards compatible. These are add-ons to the package which sit on top of existing code and provide alternative ways for users to interface with the core functionality of spreg. To ensure the changes are not breaking, the spreg.sklearn submodule must be directly loaded by users (e.g. via import spreg.sklearn) as it will not be automatically imported otherwise.

Currently, the formula interface function dispatches to all of the spatial regression models. While it does not support regime regression or seemingly unrelated regression, it provides a solid base of code to which these features can be added.

More information about the design process for these features can be found in my GSoC 2022 Progress Journal, and of course I'm more than happy to answer any questions and make any changes!

TaylorOshan commented 10 months ago

@jGaboardi just curious, was there any off-line discussion about this before closing? IIRC, this was a pretty good effort towards adding the formula capabilities that we have talked about for a while.

lanselin commented 10 months ago

I'm not a fan of the Wilkinson formula. For any but the simplest models it creates all kinds of issues, e.g., how to introduce WX variables without having to explicitly list them, etc. For example, compare to the in my opinion overly complicated interface in spatialreg in R to accommodate all the situations related to spatial Durbin, endogenous variables, instruments, etc. Btw, instruments are never part of an actual formula but added on by means of these (again, in my opinion) very awkward | | options. While not ideal, I very much like the most recent setup (1.4) which allows for specification of a range of models such as SLX, spatial Durbin, SLX error and even GNS, using GMM_Error and slx_lags and add_wy arguments. The WX and Wy variables never need to be listed explicitly but are computed (and variable names created) under the hood. The fewer variable names need to be passed, the fewer opportunities for typos. My golden rule in life :-)

I think our efforts are better spent at enhancing functionality in supporting different models and estimation methods. I was never a fan of this in the first place, not sure why it even became a GOSC project ...

jGaboardi commented 10 months ago

@TaylorOshan I did not close this... Perhaps I screwed something up with the pysal:master -> pysal:main switch. All open PRs were supposed to be automatically updated, but it looks like something funky happened...

jGaboardi commented 10 months ago

@TaylorOshan I very much apologize for the screw up. I have never seen this happen with master->main switch. Good news is that @tdhoffman's branch still has all the work. Shall we create a new PR from that?

lanselin commented 10 months ago

I say no. Neither Pedro nor I like this approach. As long as I am bdfl for spreg, I nix it. The sklearn interface is built for prediction, spreg is mostly about inference, so it actually turns out to be very awkward.

lanselin commented 10 months ago

Just to make clear, I would like it to be closed. We are done with that.

pedrovma commented 10 months ago

The many additional combinations of specifications introduced by v1.4 make this outdated. As @lanselin pointed out, these formulas can't address them all easily, especially when combined with regimes and/or seemingly unrelated regressions. I believe the suggestion made by @martinfleis in #127 is a more interesting approach to simplify the function calls given the changes introduced by v1.4. Regardless, adding other functionalities we have already mapped, such as additional tests and the computation of direct/indirect effects, seems to me to be more of a priority now than enhancing the arguments' structure.

TaylorOshan commented 10 months ago

Understood @lanselin. Thanks for the additional feedback @pedrovma. My intention was not to advocate for this in spreg. There were several conversations during the GSOC period about formulas more generally and in the context of other modules and I was curious if this would be worth preserving as a proof-of-concept in case it is useful elsewhere in the future. Thanks to @jGaboardi for pointing out that the original branch is in tact and takes care of that.