pysal / esda

statistics and classes for exploratory spatial data analysis
https://pysal.org/esda
BSD 3-Clause "New" or "Revised" License
215 stars 55 forks source link

Bug fixes across Lee and swap to survival function for #243 #245

Closed ljwolf closed 1 year ago

ljwolf commented 1 year ago

This addresses a few bugs filed on the Lee statistic, as well as addressing the precision concerns raised in #243 with using 1-stats.norm.cdf().

ljwolf commented 1 year ago

Once merged, it would be good to make a bug fix release addressing this, given the lee standardisation issue can have an impact for odd weights specifications.

codecov[bot] commented 1 year ago

Codecov Report

Merging #245 (fe407b2) into main (165e139) will increase coverage by 1.5%. The diff coverage is 72.7%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #245     +/-   ##
=======================================
+ Coverage   71.5%   73.0%   +1.5%     
=======================================
  Files         24      24             
  Lines       3246    3246             
  Branches     519     519             
=======================================
+ Hits        2320    2369     +49     
+ Misses       763     709     -54     
- Partials     163     168      +5     
Impacted Files Coverage Δ
esda/lee.py 72.8% <ø> (+53.3%) :arrow_up:
esda/geary.py 92.5% <33.3%> (ø)
esda/moran.py 74.5% <83.3%> (ø)
esda/getisord.py 66.4% <100.0%> (ø)
ljwolf commented 1 year ago

test failures arise in #244, so they are unrelated to these changes.

ljwolf commented 1 year ago

So the testing failures in the join counts is, I think, due to the changes upstream in libpysal having to do with adjacency lists.

ljwolf commented 1 year ago

OK, my understanding is the following.

With pysal/libpysal@main, the following is broken:

from libpysal.weights.util import lat2W

w = lat2W(3,3)
w.neighbors # this is correct
{0: [3, 1],
 3: [0, 6, 4],
 1: [0, 4, 2],
 4: [1, 3, 7, 5],
 2: [1, 5],
 5: [2, 4, 8],
 6: [3, 7],
 7: [4, 6, 8],
 8: [5, 7]}
w.to_adjlist().head() # this is not
   focal  neighbor  weight
0      0         3     1.0
1      0         4     1.0
2      3         0     1.0
3      3         1     1.0
4      3         2     1.0

The issue arises because, at line 440 of weights.py, we use self.neigbors.keys(). Since ids are sorted by default and dicts now retain their insertion order, the two are not the same:

w.id_order
[0, 1, 2, 3, 4, 5, 6, 7, 8]
w.neighbors.keys()
dict_keys([0, 3, 1, 4, 2, 5, 6, 7, 8])

The ordering that is needed is w.id_order, which matches the order in the sparse array we use for the edge tuples. With this libpysal fix, the test failures disappear.

sjsrey commented 1 year ago

Looks good!