pysal / segregation

Segregation Measurement, Inferential Statistics, and Decomposition Analysis
https://pysal.org/segregation/
BSD 3-Clause "New" or "Revised" License

spelling #114

Closed AnttiHaerkoenen closed 5 years ago

renanxcortes commented 5 years ago

Hi @AnttiHaerkoenen, thank you for the PR, this sounds promising! I'd like to highlight two major points:

Cheers! Renan

knaaptime commented 5 years ago

thanks for this @AnttiHaerkoenen! It has forced us to reconsider some of our architecture decisions and make a few things more obvious.

So a few things:

1) the current SpatialDissim class is actually based on Morrill and will be renamed to reflect that.

2) shortly, I'll add the generalized version of the spatial dissimilarity index given by Reardon & O'Sullivan that follows the same pattern as SpatialInformationTheory and SpatialDivergence. That pattern is intentionally generic to allow any parameterization of dp (i.e. any formalization of spatial relationships). So if you want a "surface-based" measure, you pass either a Kernel-based pysal weights object for a Euclidean surface, or a Pandana network object for a network-constrained surface. If you wanted a topology-based measure, you would pass a contiguity-based weights object--but the idea is to allow the same segregation statistic to be calculated using any concept of spatiality.

This follows the exact logic described by O'Sullivan & Wong on page 157, in that the data are first "spatialized" in step 3 before the S measure is calculated. We want to make sure that you have full control over how that spatialization takes place, without forcing a specific kernel density function, so this is the same logic we'd follow for adding the S measure. The index itself should be abstracted from the concept of spatial relationships so that any spatial relationship can be modeled with the same class, rather than hardcoding the spatial relationship and repeating a lot of the same index-calculation code in different classes.
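The "spatialize first, then compute" logic can be sketched with plain numpy. This is an illustrative toy, not the library's implementation: the weights matrices and the dissimilarity function here are stand-ins for a pysal W and for whichever index is being generalized.

```python
import numpy as np

def spatialize(data, W):
    """Smooth unit-level counts with a row-standardized spatial weights
    matrix W, so each unit reflects its local environment."""
    W = W / W.sum(axis=1, keepdims=True)  # row-standardize
    return W @ data

def dissimilarity(group, total):
    """Classic aspatial dissimilarity index D, used here as a stand-in
    for any aspatial segregation statistic."""
    other = total - group
    return 0.5 * np.abs(group / group.sum() - other / other.sum()).sum()

# toy data: 4 spatial units, fully segregated
group = np.array([100.0, 100.0, 0.0, 0.0])
total = np.array([100.0, 100.0, 100.0, 100.0])

# identity "weights" (each unit is its own environment) -> aspatial D = 1
W_identity = np.eye(4)
d_aspatial = dissimilarity(spatialize(group, W_identity),
                           spatialize(total, W_identity))

# uniform weights (every unit neighbors every unit) -> fully smoothed, D = 0
W_uniform = np.ones((4, 4))
d_smoothed = dissimilarity(spatialize(group, W_uniform),
                           spatialize(total, W_uniform))
```

Swapping in a kernel, contiguity, or network-derived weights matrix changes only the `spatialize` step; the index computation itself is untouched, which is the abstraction being argued for above.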

knaaptime commented 5 years ago

Put slightly differently, following the Reardon & O'Sullivan logic, we have a set of generalized spatial indices that we view as special cases of the aspatial indices. After passing the data through a spatial filter (i.e. a W or a Network), the indices take their spatial analogue (we're still moving to this framework).

So in an ideal case, we'd have this new S index implemented as an aspatial index that we can subclass as a spatial index where we include the W * data logic. We would also propose renaming the index to something like MinMaxS to avoid confusion since, as described above, any of our generalized spatial indices can be surface-based (and passing a kernel weights object to SpatialDissim will give you a surface-based spatial dissim).
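A minimal sketch of that subclassing pattern, with hypothetical class names (MinMaxAspatial/MinMaxSpatial are illustrative, not the library's actual API, and the statistic computed is a stand-in):

```python
import numpy as np

class MinMaxAspatial:
    """Hypothetical aspatial index computed from two population vectors."""
    def __init__(self, group, total):
        group = np.asarray(group, dtype=float)
        total = np.asarray(total, dtype=float)
        other = total - group
        p, q = group / group.sum(), other / other.sum()
        # stand-in statistic; the real S measure would go here
        self.statistic = 0.5 * np.abs(p - q).sum()

class MinMaxSpatial(MinMaxAspatial):
    """Spatial analogue: apply the W @ data step, then reuse the
    aspatial computation unchanged."""
    def __init__(self, group, total, W):
        W = np.asarray(W, dtype=float)
        W = W / W.sum(axis=1, keepdims=True)  # row-standardize
        super().__init__(W @ np.asarray(group, dtype=float),
                         W @ np.asarray(total, dtype=float))

group = np.array([100.0, 0.0])
total = np.array([100.0, 100.0])
aspatial = MinMaxAspatial(group, total).statistic            # fully segregated
spatial = MinMaxSpatial(group, total, np.ones((2, 2))).statistic  # smoothed
```

The spatial subclass adds only the filtering step, so any W (kernel, contiguity, network-derived) yields a spatial variant of the same index without duplicated code.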

again, thank you for submitting this PR because this has been a really useful exercise for the library's design

renanxcortes commented 5 years ago

Ok, so, I've been running the functions here locally, line by line, following the paper, and I have some comments that need to be addressed:

The subclass logic for the "single group" segregation measure will be implemented in the future. For now, this separate class could be merged as it is, once the points raised above are addressed.

p.s.: density_1 and density_2 are created by multiplying the kernel weights and the normalized values. I agree that this makes sense in this case; however, can you point to where in O'Sullivan's paper this is explained? In other words, could you explain this step a little more?
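For concreteness, the step being asked about can be sketched as follows. This is a guess at the construction (coordinates, bandwidth, and the Gaussian kernel are illustrative choices; the names density_1/density_2 follow the discussion, not a documented API):

```python
import numpy as np

# toy unit centroids and two group populations
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pop_1 = np.array([80.0, 20.0, 10.0, 90.0])
pop_2 = np.array([20.0, 80.0, 90.0, 10.0])

# Gaussian kernel weights from pairwise distances (bandwidth is a choice)
bandwidth = 1.0
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
K = np.exp(-(dist / bandwidth) ** 2)

# "density surfaces": kernel weights times normalized population shares
density_1 = K @ (pop_1 / pop_1.sum())
density_2 = K @ (pop_2 / pop_2.sum())
```

Each entry of density_1 is then a kernel-weighted sum of the group's population shares over nearby units, i.e. a smoothed surface evaluated at the unit locations.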

And, once again, thank you very much @AnttiHaerkoenen!

AnttiHaerkoenen commented 5 years ago

Hi,

thanks for the feedback. I'll make a new PR next week.

I had a problem with the group vs. total keywords because this measure is clearly meant for group-vs-group comparisons, but I'll change it back.

In the paper it is never stated explicitly how they calculated the density surfaces, but I thought my approach would be the most logical.

renanxcortes commented 5 years ago

Nice, thank you, @AnttiHaerkoenen! So, regarding the two-group API, I think what makes sense is to keep group_pop_var and total_pop_var and then internally create a group_pop_var2 generated by the difference, so your initial logic is kept.
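The suggested API shape can be sketched in a few lines (the helper name is hypothetical; only the variable names group_pop_var / total_pop_var / group_pop_var2 come from the discussion):

```python
import numpy as np

def two_group_inputs(group_pop_var, total_pop_var):
    """Accept one group and the total; derive the complementary group
    internally so the caller never passes a second group explicitly."""
    group_pop_var = np.asarray(group_pop_var, dtype=float)
    total_pop_var = np.asarray(total_pop_var, dtype=float)
    group_pop_var2 = total_pop_var - group_pop_var  # derived internally
    return group_pop_var, group_pop_var2

g1, g2 = two_group_inputs([30.0, 70.0], [100.0, 100.0])
```

This keeps the public signature consistent with the library's other indices while preserving the group-vs-group computation inside the class.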

Best, Renan

knaaptime commented 5 years ago

we also have _build_local_environment for performing the spatial smoothing