Closed: knaaptime closed this PR 2 years ago
Merging #192 (4f03ea5) into master (deae25c) will decrease coverage by 0.07%. The diff coverage is 71.69%.
@@ Coverage Diff @@
## master #192 +/- ##
==========================================
- Coverage 62.43% 62.35% -0.08%
==========================================
Files 119 123 +4
Lines 4784 4928 +144
==========================================
+ Hits 2987 3073 +86
- Misses 1797 1855 +58
Impacted Files | Coverage Δ |
---|---|
segregation/batch/batch_compute.py | 90.36% <ø> (ø) |
segregation/dynamics/__init__.py | 100.00% <ø> (ø) |
segregation/local/local_multi_diversity.py | 95.23% <0.00%> (ø) |
segregation/singlegroup/dissim.py | 96.29% <ø> (ø) |
segregation/singlegroup/modified_dissim.py | 67.27% <26.31%> (-23.64%) :arrow_down: |
segregation/singlegroup/modified_gini.py | 69.23% <26.31%> (-25.90%) :arrow_down: |
segregation/network/network.py | 64.93% <60.71%> (+39.35%) :arrow_up: |
segregation/dynamics/divergence_profile.py | 82.35% <82.35%> (ø) |
segregation/_base.py | 82.47% <85.71%> (ø) |
segregation/local/local_distortion.py | 94.73% <94.73%> (ø) |
... and 14 more | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
we will probably need to think through the terminology a bit:

- `LocalDistortionCoefficient`, following the nomenclature they use in this paper, in the dynamics module
- `GlobalDistortionIndex` from this paper, which should go in the multigroup module based on this formula

I think the `_kl_divergence_profile` @noahbouchier wrote gets us 90% of the way to each of those measures; it's just a matter of how we summarize and reshape the output for these different measures.
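For reference, the Kullback-Leibler divergence underlying a KL profile has a standard definition (this sketch is the textbook formula, not the package's exact implementation):

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete
    probability distributions given as array-likes of probabilities."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # 0 * log(0) is treated as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

A KL profile evaluates this divergence between each expanding neighborhood's group composition (`p`) and the overall study-area composition (`q`).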
Would it be cool to keep a more scikit-oriented API with the `metric` argument? That was the intent behind specifying the `metric` argument as we did in GSoC. We can extend the scikit-style `metric` argument to include a special "network" option, too, and treat it as a special case, just as they already do with `metric='precomputed'`
...
Let me know what you think! Happy to implement.
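One minimal sketch of that scikit-style dispatch (the function name, argument names, and the `'network'` behavior are all hypothetical here, not merged API):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def resolve_distances(coords=None, metric="euclidean", distance_matrix=None):
    """Hypothetical dispatch on a scikit-style `metric` argument.

    'precomputed' expects a pairwise matrix up front; 'network' would
    defer to a pandana.Network (stubbed here); any other value is
    treated as a scipy distance metric computed from coordinates.
    """
    if metric == "precomputed":
        return np.asarray(distance_matrix, dtype=float)
    if metric == "network":
        raise NotImplementedError("would build the matrix from a pandana.Network")
    return squareform(pdist(np.asarray(coords, dtype=float), metric=metric))
```

The nice property is that downstream code only ever sees a pairwise distance matrix, regardless of how it was produced.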
Aside from the change to allow precomputed distance matrices, I've finished all the pieces I described above. We've now got:

- `dynamics.compute_divergence_profiles`
- `multigroup.GlobalDistortion`
- `local.LocalDistortion`

The important remaining work is to write a function that simulates the "maximum possible segregation" given the study data. Until we have that, the GlobalDistortion index isn't scaled, so it doesn't mean much.
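The scaling step itself is trivial once a maximum-segregation value exists; simulating that arrangement is the open task. A toy counterfactual (purely illustrative, not the planned implementation) would give each group its own unit:

```python
import numpy as np

def max_segregation_arrangement(group_totals):
    """Toy 'maximum possible segregation' counterfactual: one unit per
    group, each holding that group's entire population. Illustrative
    only; simulating this properly is the remaining work."""
    return np.diag(np.asarray(group_totals, dtype=float))

def scaled_distortion(observed, maximum):
    """Scale a raw distortion value by its maximum-segregation
    counterpart so the index lands in [0, 1]."""
    return observed / maximum
```

A real simulation would need to respect the study area's geography and unit populations rather than inventing one unit per group.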
@jGaboardi if you've got a spare minute, any idea what's going on with these Windows tests? They seem to be stalling about a quarter of the way through for no apparent reason.
It may be a similar issue to what's going on in spaghetti --> pysal/spaghetti#666. (But also, maybe not...)
OK, so after some more thought, I think the only remaining question here is how we standardize the API. A "standard" segregation profile works by increasing the threshold in a distance-based weights object (regardless of how many neighbors), then applying distance decay and recomputing the segregation statistic. The KL profile works by increasing the k in a KNN weights object (regardless of distance), then applying distance decay to the neighbors and recalculating the statistic. In both cases we're just expanding the spatial scope, so is it worth abstracting out the shared logic that both functions can consume?
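If we did abstract the shared logic, one minimal sketch (all names hypothetical) would parameterize both profiles over a sequence of expanding scopes, a neighbor builder, and a statistic callback:

```python
def compute_profile(scopes, build_neighbors, statistic):
    """Generic profile: for each scope (a distance threshold or a k),
    build the neighbor structure and recompute the statistic.
    Hypothetical abstraction, not the package's current API."""
    return {scope: statistic(build_neighbors(scope)) for scope in scopes}

# toy usage: 'neighbors' is just a list of values, the statistic their mean
profile = compute_profile(
    scopes=[1, 2, 3],
    build_neighbors=lambda k: list(range(k)),
    statistic=lambda vals: sum(vals) / len(vals),
)
print(profile)  # {1: 0.0, 2: 0.5, 3: 1.0}
```

The distance-threshold profile and the KNN profile would then differ only in their `build_neighbors` callable (a `libpysal` DistanceBand vs. KNN weights builder).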
Even if the two functions don't share code, it would be good to standardize the arguments. The current segregation profile already requires a `distances` argument, which is a list of threshold distances passed to a `libpysal.W` (or a `pandana.Network` aggregation query). In the approach described above, `distances` would instead be either the matrix of pairwise distances between observations if `metric='precomputed'`, or a `pandana.Network` if `metric='network'` (which would just create the distance matrix internally, so you'd also need to provide the `pandana.Network` object).

I think it's confusing for one `*_profile` function to take `distance` as a list of thresholds while another takes `distance` as a pairwise distance matrix, so what about:

`compute_segregation_profile`:

- rename `distances` to `distance_thresholds`
- add a `metric` argument (it can just pass it through to the W)

`compute_divergence_profiles`:

- add a `distance_matrix` argument, which can be either a numpy array (implying `metric='precomputed'`) or a `pandana.Network` (implying `metric='network_lcp'` or something)
- in `compute_divergence_profiles`, you always step through all neighbors, so there's no translation of `*thresholds`

Does this close #142 or is there more to do there?
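Spelled out as signatures, the proposal might look like the following (a sketch of the proposed arguments only, with bodies omitted; `gdf` and `groups` stand in for whatever data arguments the real functions take):

```python
def compute_segregation_profile(gdf, groups, distance_thresholds, metric="euclidean"):
    """Proposed: `distances` renamed to `distance_thresholds`;
    `metric` is passed straight through to the W construction."""
    raise NotImplementedError

def compute_divergence_profiles(gdf, groups, distance_matrix=None):
    """Proposed: `distance_matrix` is either a numpy array (implying
    metric='precomputed') or a pandana.Network (implying a network
    metric); all neighbors are always stepped through, so no
    threshold list is needed."""
    raise NotImplementedError
```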
I'd say this resolves #142, though we do still need to implement the scaling-factor computation.
This PR adds the enhancements I requested over in #185. Thanks a ton for putting this together @noahbouchier!!
Aside from the formatting, there is one substantive change: instead of returning the matrix of Origin: [statistic, distance], it returns a dataframe of the KL divergence coefficients (the sum of KL profile values for each unit).

The data aren't exactly the same, so no thorough testing just yet, but a quick look suggests we've got the same visual results (left) as the paper (right). One thing to note: we will want to add the option to normalize score values as they do in the original paper. It shouldn't be too hard, but it will require us to write another function to determine the scaling factor.
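The reshape described above (summing each unit's KL profile values into a single coefficient) can be sketched with pandas; the column names here are assumptions for illustration, not the merged schema:

```python
import pandas as pd

# toy KL profile: one row per (unit, neighborhood-size) pair
profile = pd.DataFrame({
    "observation": [0, 0, 1, 1],
    "divergence": [0.4, 0.1, 0.3, 0.2],
})

# local distortion coefficient: sum of profile values per unit
coefs = profile.groupby("observation")["divergence"].sum()
```

Normalizing would then just divide `coefs` by whatever scaling factor the maximum-segregation function eventually produces.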