pysal / libpysal

Core components of Python Spatial Analysis Library
http://pysal.org/libpysal

ENH: include Graph.describe() to describe neighbourhood values #717

Closed u3ks closed 1 month ago

u3ks commented 1 month ago

This PR adds a method to the Graph API that takes an array of values and calculates descriptive statistics within each neighborhood. Optionally, some neighbors can be filtered out based on percentiles of the passed values. The supported statistics are "count", "mean", "median", "std", "min", "max", "sum", "nunique", and "mode".

The method is similar to .apply, but all values are calculated in one grouping operation and all functions are jitted.
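To make the description above concrete, here is a minimal pure-Python sketch of per-neighborhood descriptive statistics. It is not the code from this PR: the function name `describe_neighbors`, the edge-list input, and the use of the sample standard deviation are all assumptions made for illustration, and nothing here is jitted or vectorised.

```python
from collections import defaultdict
import statistics


def describe_neighbors(adjacency, values):
    """Hypothetical helper: summarise neighbor values for each focal unit.

    adjacency: iterable of (focal, neighbor) pairs
    values: dict mapping each unit id to its value
    """
    # One grouping pass: collect the neighbor values per focal unit.
    groups = defaultdict(list)
    for focal, neighbor in adjacency:
        groups[focal].append(values[neighbor])

    out = {}
    for focal, vals in groups.items():
        out[focal] = {
            "count": len(vals),
            "mean": statistics.fmean(vals),
            "median": statistics.median(vals),
            # Assumption: sample standard deviation, undefined for one value.
            "std": statistics.stdev(vals) if len(vals) > 1 else float("nan"),
            "min": min(vals),
            "max": max(vals),
            "sum": sum(vals),
            "nunique": len(set(vals)),
            "mode": statistics.mode(vals),
        }
    return out


adjacency = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a"), ("c", "b")]
values = {"a": 1.0, "b": 3.0, "c": 5.0}
stats = describe_neighbors(adjacency, values)
print(stats["a"]["mean"])  # neighbors of "a" are "b" and "c"
```

Each focal unit gets one row of statistics computed only from its neighbors' values, which is the behaviour the PR description outlines.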

martinfleis commented 1 month ago

Just to add some context: as we are refactoring momepy, we realised that we rely very often on this internal function, which is fairly generic and should be tied directly to Graph.

The idea behind q limiting the range comes from morphology. We often want some sort of spatial average, but given the high likelihood of outliers (think of a church in the middle of a neighborhood), we can't include all the values within each neighborhood.
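The effect of limiting the range can be sketched with a small trimmed mean. This is an illustration only, not the PR's implementation: `percentile` and `trimmed_mean` are hypothetical helpers, and the sketch assumes that `q` is a `(low, high)` percentile pair computed within the neighborhood, that the bounds are inclusive, and that `q=None` means no filtering.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in 0..100) of a sorted list."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def trimmed_mean(vals, q=None):
    """Mean of the values lying between the two percentiles in q.

    Assumption: q=None keeps every value, i.e. a plain mean.
    """
    if q is None:
        return sum(vals) / len(vals)
    s = sorted(vals)
    low, high = percentile(s, q[0]), percentile(s, q[1])
    kept = [v for v in s if low <= v <= high]
    # A very narrow range can exclude everything; report that as NaN.
    return sum(kept) / len(kept) if kept else float("nan")


# Building heights in one neighborhood; 60.0 stands in for the church tower.
heights = [8.0, 9.0, 10.0, 11.0, 60.0]
plain = trimmed_mean(heights)
trimmed = trimmed_mean(heights, q=(25, 75))
print(plain, trimmed)
```

With the outlier included, the plain mean is pulled well above the typical building height, while the interquartile version stays close to it.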

ljwolf commented 1 month ago

I think, for generality, this should be called a truncated or trimmed reduction/lag?

This is very useful generally... @weikang9009 and I have been working on related concepts recently, so it'd be very nice to have something core here!

martinfleis commented 1 month ago

> I think, for generality, this should be called a truncated or trimmed reduction/lag?

Only if q is not None. Otherwise it is just a generic lag. I am also not sure what can be called a lag (nunique?). The describe terminology comes from pandas. It felt close enough to what we're doing here.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.

Project coverage is 85.1%. Comparing base (bcabdbc) to head (879f3f5). Report is 18 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##            main    #717     +/-   ##
=======================================
  Coverage   85.0%   85.1%
=======================================
  Files        141     145      +4
  Lines      15203   15483    +280
=======================================
+ Hits       12924   13169    +245
- Misses      2279    2314     +35
```

| [Files](https://app.codecov.io/gh/pysal/libpysal/pull/717?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal) | Coverage Δ | |
|---|---|---|
| [libpysal/graph/tests/test\_base.py](https://app.codecov.io/gh/pysal/libpysal/pull/717?src=pr&el=tree&filepath=libpysal%2Fgraph%2Ftests%2Ftest_base.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal#diff-bGlicHlzYWwvZ3JhcGgvdGVzdHMvdGVzdF9iYXNlLnB5) | `100.0% <100.0%> (ø)` | |
| [libpysal/graph/\_utils.py](https://app.codecov.io/gh/pysal/libpysal/pull/717?src=pr&el=tree&filepath=libpysal%2Fgraph%2F_utils.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal#diff-bGlicHlzYWwvZ3JhcGgvX3V0aWxzLnB5) | `97.1% <97.6%> (+2.2%)` | :arrow_up: |
| [libpysal/graph/base.py](https://app.codecov.io/gh/pysal/libpysal/pull/717?src=pr&el=tree&filepath=libpysal%2Fgraph%2Fbase.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal#diff-bGlicHlzYWwvZ3JhcGgvYmFzZS5weQ==) | `96.8% <92.9%> (-1.1%)` | :arrow_down: |

... and [6 files with indirect coverage changes](https://app.codecov.io/gh/pysal/libpysal/pull/717/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal)

knaaptime commented 1 month ago

i think i would call this describe_cardinalities or something because "Graph.describe() to describe neighbourhood values" implies we're looking at the neighbor values

martinfleis commented 1 month ago

But this is not describing cardinalities, no? Cardinality is the number of elements in a set. This describes the distribution of values within a neighbourhood.

knaaptime commented 1 month ago

oh i see. It was this note on line 2014 that tripped me up:

'Weight values do not affect the calculations, only adjacency does.'
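A small sketch of what that note means in practice, under stated assumptions: the two helpers below are hypothetical and written in plain Python, not taken from libpysal. The first averages neighbor values using adjacency only, the second is a conventional weighted lag shown for contrast.

```python
def neighbor_mean(edges, values):
    """Mean of neighbor values per focal unit; edge weights are ignored."""
    totals, counts = {}, {}
    for focal, neighbor, _weight in edges:
        totals[focal] = totals.get(focal, 0.0) + values[neighbor]
        counts[focal] = counts.get(focal, 0) + 1
    return {focal: totals[focal] / counts[focal] for focal in totals}


def weighted_lag(edges, values):
    """Weighted sum of neighbor values per focal unit, for contrast."""
    out = {}
    for focal, neighbor, weight in edges:
        out[focal] = out.get(focal, 0.0) + weight * values[neighbor]
    return out


values = {"a": 2.0, "b": 4.0, "c": 6.0}
# Same adjacency for "a", two different weighting schemes.
binary = [("a", "b", 1.0), ("a", "c", 1.0)]
uneven = [("a", "b", 0.25), ("a", "c", 0.75)]

print(neighbor_mean(binary, values), neighbor_mean(uneven, values))
print(weighted_lag(binary, values), weighted_lag(uneven, values))
```

The adjacency-only mean is identical for both edge lists, while the weighted lag changes with the weights. The statistics describe the neighbors' values, not the number of neighbors or their weights.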