pysal / segregation

Segregation Measurement, Inferential Statistics, and Decomposition Analysis
https://pysal.org/segregation/
BSD 3-Clause "New" or "Revised" License

2.0 refactor #161

Closed · knaaptime closed this pull request 3 years ago

knaaptime commented 3 years ago

This starts the restructuring discussed in #4. I think it pretty substantially cuts down on repetition and makes the code much easier to navigate. We also get #104 for free.

renanxcortes commented 3 years ago

This looks good to me and I'm +1 for merging it; however, I'm assuming this is a WIP, since just the single-group measures were refactored, right?

knaaptime commented 3 years ago

Sweet. Yeah, still WIP; I just wanted to get some feedback/buy-in for this approach. I'll keep going with the multigroup indices.

knaaptime commented 3 years ago

This is ready to go. The main benefit is that we adopt the logic of Reardon & O'Sullivan: we can derive generalized spatial indices by transforming the input data of aspatial indices through a weights matrix (so, e.g., we can now compute multiscalar profiles for any SpatialImplicit index).
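For illustration, a minimal sketch of what that looks like from the user's side, assuming the 2.0 module layout referenced later in this thread (segregation.singlegroup.MinMax); the file name, column names, and the exact keyword for passing the weights object are hypothetical:

```python
import geopandas as gpd
from libpysal.weights import Kernel
from segregation.singlegroup import MinMax

gdf = gpd.read_file("tracts.shp")  # hypothetical input data

# aspatial: the index is computed directly from the unit-level counts
aspatial = MinMax(gdf, group_pop_var="black", total_pop_var="total")

# spatially implicit: the same index, but the counts are first smoothed through
# a kernel weights matrix, following Reardon & O'Sullivan
w = Kernel.from_dataframe(gdf.to_crs(gdf.estimate_utm_crs()))
spatial = MinMax(gdf, group_pop_var="black", total_pop_var="total", w=w)

print(aspatial.statistic, spatial.statistic)
```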

The original tests are unchanged to show that the (aspatial) calculations are still correct, with one exception: MinMaxS has a very different estimate. I think it's correct, but I want to flag that this one did change.

Aside from the reorganization, there's one important API change: fitted indices now have the attribute data instead of core_data.
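Concretely (continuing the hypothetical sketch above):

```python
idx = MinMax(gdf, group_pop_var="black", total_pop_var="total")
idx.statistic  # the index value, unchanged in meaning between versions
idx.data       # the fitted input data; in 1.x this attribute was idx.core_data
```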

codecov-commenter commented 3 years ago

Codecov Report

Merging #161 (fea1117) into master (736098b) will decrease coverage by 19.97%. The diff coverage is n/a.


@@             Coverage Diff             @@
##           master     #161       +/-   ##
===========================================
- Coverage   85.75%   65.77%   -19.98%     
===========================================
  Files          65      116       +51     
  Lines        2779     4322     +1543     
===========================================
+ Hits         2383     2843      +460     
- Misses        396     1479     +1083     
| Impacted Files | Coverage Δ |
|---|---|
| ...egregation/aspatial/multigroup_aspatial_indexes.py | 31.44% <0.00%> (-68.56%) ↓ |
| segregation/aspatial/aspatial_indexes.py | 17.03% <0.00%> (-66.53%) ↓ |
| segregation/network/network.py | 27.27% <0.00%> (-65.91%) ↓ |
| segregation/spatial/spatial_indexes.py | 15.07% <0.00%> (-57.32%) ↓ |
| segregation/util/util.py | 78.72% <0.00%> (-6.72%) ↓ |
| segregation/util/__init__.py | 100.00% <0.00%> (ø) |
| segregation/local/__init__.py | 100.00% <0.00%> (ø) |
| segregation/aspatial/__init__.py | 100.00% <0.00%> (ø) |
| segregation/decomposition/decompose_segregation.py | 97.53% <0.00%> (ø) |
| ...regation/tests/test_local_multi_local_diversity.py | 92.85% <0.00%> (ø) |

... and 71 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 736098b...fea1117.

knaaptime commented 3 years ago

(Tests are all passing; there's a CI issue with rtree on Windows.)

renanxcortes commented 3 years ago

Awesome, @knaaptime! By the way, when you say "MinMaxS has a very different estimate", you mean the MinMax from https://github.com/knaaptime/segregation/blob/2.0/segregation/singlegroup/minmax.py, right? Any clue why this might be happening? 'Cause it seems like nothing changed in the code/math behind it, right?

knaaptime commented 3 years ago

Yeah, the difference is between old and new. I thought it had to do with different values for the spatial params, but now I'm not too sure. The old number looks awfully low relative to the values from the paper.

knaaptime commented 3 years ago

Right, the math and the function that actually computes the statistic haven't changed. I think the root is this line in the new version, whereas the old code probably created a Kernel weights object from unprojected data.

renanxcortes commented 3 years ago

> Right, the math and the function that actually computes the statistic haven't changed. I think the root is this line in the new version, whereas the old code probably created a Kernel weights object from unprojected data.

Yep, this might be it; perhaps the new version of estimate_utm_crs projects that gdf differently than the original test did. I guess we could just update the test to the new value (even though it is largely different from the previous one) and go for the merge.
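To make the suspected cause concrete, here is a rough sketch (file path hypothetical, geopandas >= 0.9 assumed) of how the two code paths could yield very different kernel weights, and therefore a different statistic:

```python
import geopandas as gpd
from libpysal.weights import Kernel

gdf = gpd.read_file("tracts.shp")  # hypothetical data stored in lon/lat (EPSG:4326)

# suspected old behaviour: kernel built on unprojected coordinates,
# so the default bandwidth is measured in decimal degrees
w_unprojected = Kernel.from_dataframe(gdf)

# new behaviour: project to a local UTM zone first, so the bandwidth is in meters
w_projected = Kernel.from_dataframe(gdf.to_crs(gdf.estimate_utm_crs()))

# the bandwidths (and hence the smoothed inputs and the final statistic) differ
print(w_unprojected.bandwidth[:3])
print(w_projected.bandwidth[:3])
```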

knaaptime commented 3 years ago

Aside from a few minor aesthetics (some docstring idiosyncrasies, classes not alphabetized in the API docs), this is complete, so it would be good to get a final round of review :) @renanxcortes @sjsrey

jGaboardi commented 3 years ago

@knaaptime the Mac builds took forever to get started, which led to the ~30-minute run time in a960d2d. Without the delayed start we are looking at a roughly 18-minute run time (Windows/Py39) using micromamba.

knaaptime commented 3 years ago

Awesome, @jGaboardi. Thank you!

knaaptime commented 3 years ago

I'm open to suggestions. Since it's such a major restructuring, I'd opted to leave them (deprecation warnings for the old API) out for the moment: this is a MAJOR version increment implying incompatible API changes, so it's relatively straightforward to say "stay on the 1.x series if you need the old API".

On the other hand, we haven't really given much warning that 2.x is coming, and people may be confused. The API for any given index is consistent across versions, but the structure and modules are different between the two. Also, instead of e.g. SpatialMinMax and MinMax, we only have the latter, which accepts spatial params. Do you think we should alias all of that into the old namespace with deprecation warnings for a few releases?

sjsrey commented 3 years ago

I think we could have a lightweight deprecation layer that mimics the old API and keep it around for a few releases. Once we're done, we just delete that layer and be done with it. I'll draft a PR later today with some ideas along this line.
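One possible shape for that layer (a sketch only; the module placement, class choice, and warning text are assumptions, not what this PR ships):

```python
# hypothetical 1.x-compatibility shim kept in the old namespace for a few releases
import warnings

from segregation.singlegroup import MinMax as _MinMax


class SpatialMinMax(_MinMax):
    """Deprecated 1.x alias for the 2.0 MinMax index, which now takes spatial params."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "SpatialMinMax is deprecated and will be removed in a future release; "
            "use segregation.singlegroup.MinMax with spatial parameters instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```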

sjsrey commented 3 years ago

In 2.0, the first notebook is raising an error for me:

[screenshot: traceback from the notebook's import error]

This is after a pip install -e .

Do we need to pin to a particular version of geopandas or pygeos?

knaaptime commented 3 years ago

Not that I know of?

knaaptime commented 3 years ago

currently we're not actually even using it

sjsrey commented 3 years ago

> Not that I know of?

import geopandas throws that error?

sjsrey commented 3 years ago

```
%load_ext watermark
%watermark -a 'eli knaap' -v -d -u -p segregation,geopandas,libpysal,pandana
```

knaaptime commented 3 years ago

Maybe try removing pygeos completely.

knaaptime commented 3 years ago

@martinfleis any idea why we're hitting the error above?

sjsrey commented 3 years ago

> Maybe try removing pygeos completely.

Looks like geopandas is 0.8.1 in that env.

martinfleis commented 3 years ago

I'm just on the phone atm, but geopandas 0.8.1 doesn't play with pygeos 0.9. Try to get geopandas 0.9, or at least 0.8.2, into the env.
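For reference, a quick way to check whether an environment has the incompatible combination described above (assuming both packages are installed):

```python
import geopandas
import pygeos

# geopandas 0.8.1 together with pygeos 0.9 triggers the error shown earlier;
# geopandas 0.8.2+ (or 0.9) resolves it
print(geopandas.__version__, pygeos.__version__)
```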

sjsrey commented 3 years ago

```
mamba update -c conda-forge geopandas
```

puts geopandas at 0.9.0, and the problem with MISSING is solved.

knaaptime commented 3 years ago

Thanks, Martin!