pysal / libpysal

Core components of Python Spatial Analysis Library
http://pysal.org/libpysal
Other
249 stars 79 forks source link

REF: refactor Graph.to_W to avoid perf bottleneck #691

Closed martinfleis closed 4 months ago

martinfleis commented 4 months ago

Closes #672

For adjacency of 1,404,080 edges, the conversion time is 42.7s on main and 4.5s in this PR.

@jGaboardi not sure how to properly label this...

jGaboardi commented 4 months ago

I'd say this is an "enhancement" due to the performance improvement.

martinfleis commented 4 months ago

Thinking about this more, I think that we shall use the same logic in weights and neighbors properties that are unbearably slow for large graphs. Hard to catch these things when developing on small graphs...

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 85.0%. Comparing base (018f1e2) to head (bcabdbc). Report is 4 commits behind head on main.

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/pysal/libpysal/pull/691/graphs/tree.svg?width=650&height=150&src=pr&token=wgnkG5Rj0J&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal)](https://app.codecov.io/gh/pysal/libpysal/pull/691?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal) ```diff @@ Coverage Diff @@ ## main #691 +/- ## ======================================= + Coverage 84.7% 85.0% +0.3% ======================================= Files 141 141 Lines 15200 15203 +3 ======================================= + Hits 12880 12924 +44 + Misses 2320 2279 -41 ``` | [Files](https://app.codecov.io/gh/pysal/libpysal/pull/691?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal) | Coverage Δ | | |---|---|---| | [libpysal/graph/base.py](https://app.codecov.io/gh/pysal/libpysal/pull/691?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal#diff-bGlicHlzYWwvZ3JhcGgvYmFzZS5weQ==) | `97.9% <100.0%> (+<0.1%)` | :arrow_up: | ... and [11 files with indirect coverage changes](https://app.codecov.io/gh/pysal/libpysal/pull/691/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pysal)
knaaptime commented 4 months ago

i got a question about the performance of the weights builders in the workshop yesterday, and i had to proudly scoff that they're the fastest you can find... 'Martin uses these routinely on datasets in the millions' :D

jGaboardi commented 4 months ago

Do we know what's causing the failures on macOS & ubuntu-dev?

knaaptime commented 4 months ago

the mac test looks like its probably a fluke. the ubuntu stuff is all over the place

martinfleis commented 4 months ago

Builders are fast but some compatibility bits seem to be worse :). I'm noticing it only now when I regularly use Graph on large data. There's a reason it is still experimental :)

ljwolf commented 4 months ago

This looks fine to me! I think we may want to consider some benchmarks with asv for these kinds of things... I think that construction, serialization, conversion, lag, and standardization are the big targets?

jGaboardi commented 4 months ago

Looks like all failures will be fixed by #692. Going ahead with that merge now and see if we can get this green.

jGaboardi commented 4 months ago

All green (for now --> the macOS failure just before had to do with a connection issue for geodatasets).

Not sure if we wanted to see about implementing @ljwolf's idea for asv benchmarking here, or that's something from the future. Thinking probably it's future, but I'll wait on merging until confirmed.

martinfleis commented 4 months ago

Not going to do asv here. I'm generally a bit skeptic about it given we have it in geopandas and no one is running it or anything.