pysal / segregation

Segregation Measurement, Inferential Statistics, and Decomposition Analysis
https://pysal.org/segregation/
BSD 3-Clause "New" or "Revised" License

Segregation measure (aspatial/spatial) with groupby? #168

Closed. federicoricca closed this issue 3 years ago

federicoricca commented 3 years ago

More of a question, really. What is the best way to compute several segregation measures at the city level for multiple cities? Consider a dataset of tract-level observations for, say, 10 cities. I tried a groupby('city') aggregation and apply, but the segregation functions do not seem to work on a groupby object. Of course, it is always possible to loop over the cities, and that would work.

I was wondering whether it is possible to make this work with groupby, and whether that even makes sense or is more efficient than a loop.

Thank you for any help!

knaaptime commented 3 years ago

Hi @federicoricca

Thanks for the question. Two things. First, a heads up: the API will be changing shortly with #161, which will do away with the spatial/aspatial distinction. Second, you can accomplish what you're after by calling a segregation class inside a function passed to the apply method of a groupby, like

tracts.groupby('state').apply(lambda x: Dissim(x.dropna(), group_pop_var='n_nonhisp_black_persons', total_pop_var='n_total_pop').statistic)

which for this dataset gives me

state
10    0.342291
11    0.537680
dtype: float64

That said, a few of our vectorized implementations can be memory intensive for large datasets (and can crash your kernel if memory runs out), so in some cases you're probably better off with a for loop anyway.
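
For readers comparing the two approaches, here is a minimal sketch of the same per-city computation done once with groupby/apply and once with an explicit loop. It is an illustration under stated assumptions, not the library's implementation: the `dissimilarity` helper is a hand-rolled stand-in for `Dissim(...).statistic`, using the standard formula D = 0.5 * sum(|g_i/G - r_i/R|), and the toy data and column names (`city`, `group_pop`, `total_pop`) are hypothetical.

```python
import pandas as pd


def dissimilarity(df, group_pop_var, total_pop_var):
    """Hand-rolled dissimilarity index (stand-in for the library's Dissim).

    D = 0.5 * sum(|g_i / G - r_i / R|), where g_i is the group count in
    unit i and r_i is the count of everyone else in unit i.
    """
    group = df[group_pop_var]
    rest = df[total_pop_var] - group
    return 0.5 * (group / group.sum() - rest / rest.sum()).abs().sum()


# Hypothetical toy data: four tracts in each of two cities.
tracts = pd.DataFrame(
    {
        "city": ["A"] * 4 + ["B"] * 4,
        "group_pop": [10, 20, 30, 40, 25, 25, 25, 25],
        "total_pop": [100, 100, 100, 100, 50, 100, 150, 200],
    }
)

# Option 1: groupby + apply. The lambda receives one sub-DataFrame per city
# and returns a scalar, so the result is a Series indexed by city.
by_apply = tracts.groupby("city")[["group_pop", "total_pop"]].apply(
    lambda x: dissimilarity(x.dropna(), "group_pop", "total_pop")
)

# Option 2: explicit loop over the groups. Only one city's sub-DataFrame is
# processed at a time, and the scalar results are collected into a Series.
by_loop = pd.Series(
    {
        city: dissimilarity(sub.dropna(), "group_pop", "total_pop")
        for city, sub in tracts.groupby("city")
    }
)

print(by_apply)
print(by_loop)
```

Both options call the function once per city, so neither is expected to be inherently faster; the loop mainly makes it easier to handle one city at a time, for example to write out or discard intermediate results between iterations.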

knaaptime commented 3 years ago

I'll mark this as resolved, since your email indicated this approach worked for you.

federicoricca commented 3 years ago

Thank you! That was very helpful