pysal / segregation

Segregation Measurement, Inferential Statistics, and Decomposition Analysis
https://pysal.org/segregation/
BSD 3-Clause "New" or "Revised" License
111 stars 26 forks source link

Memory efficient Gini #170

Closed sjsrey closed 2 years ago

sjsrey commented 3 years ago

Key changes are in line 370.

This gets around the memory bound and is competitive time-wise to the original vectorized implementation.

codecov-commenter commented 3 years ago

Codecov Report

Merging #170 (5756fd2) into master (bdf53f5) will decrease coverage by 11.60%. The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #170       +/-   ##
===========================================
- Coverage   85.75%   74.14%   -11.61%     
===========================================
  Files          65       67        +2     
  Lines        2779     3226      +447     
===========================================
+ Hits         2383     2392        +9     
- Misses        396      834      +438     
Impacted Files Coverage Δ
segregation/aspatial/flycheck_aspatial_indexes.py 0.00% <0.00%> (ø)
segregation/__init__.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update bdf53f5...5756fd2. Read the comment docs.

knaaptime commented 3 years ago

would it be best to do something similar here e.g. by iterating over pygeos objects and calculating ij distance iteratively? I've been digging into https://github.com/pysal/segregation/issues/169, and while projections are definitely one source of error, im also able to reproduce the error raised in the bottom of that discussion and im still not sure whats going on but think the distance calculation may be involved

knaaptime commented 3 years ago

do we have a sense for the performance measures for this change? We make sure not to run out of memory this way, but does it slow down computation or no? I imagine iterating over pygeos objects should be pretty fast, but if there's a difference in timing between the two approaches, maybe we want to include an argument allowing user to decide which to use?

sjsrey commented 2 years ago

This has gone a bit stale so we will be closing this. In its place, I can open up another PR to address the memory-bound issue. I have a gist that explores some of the alternative implementations that address the memory/speed tradeoffs @knaaptime raises previously in this discussions.

sjsrey commented 2 years ago

Superseded by #195.