barguzin opened this issue 3 years ago (status: Open)
Thanks for the report!
Yeah, I can confirm the behaviour. It is not an issue with `block_weights` itself but with `remap_ids`, which is called under the hood to map `ids=sam1.index` onto the weights. That loop becomes very slow in a case like this, where each observation has many neighbours. The script does eventually finish, so it is not an infinite loop, but it is extremely slow.
In this specific case, I'd try to avoid passing `ids=sam1.index` and rely on positional indexing instead.
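A minimal sketch of that workaround (the DataFrame shape, the `regime` column name, and the index labels below are my assumptions, not from the original post):

```python
import numpy as np
import pandas as pd
from libpysal.weights.util import block_weights

# Stand-in for the reporter's data: block_weights only needs the regime
# labels, so a plain DataFrame is enough for illustration.
n = 2000
rng = np.random.default_rng(0)
sam1 = pd.DataFrame(
    {"regime": rng.integers(0, 5, size=n)},        # assumed column name
    index=[f"tract_{i}" for i in range(n)],        # assumed label index
)

# Fast path: build the weights with positional ids 0..n-1,
# so remap_ids is never invoked.
w = block_weights(sam1["regime"].values)

# If the original labels are needed, translate positions afterwards,
# only where you actually use them:
labels_of_first = [sam1.index[j] for j in w.neighbors[0]]
```

This keeps the expensive id-remapping out of the weights construction entirely and defers the label lookup to the few places where it is needed.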
Platform information: posix linux posix.uname_result(sysname='Linux', nodename='barguzin-tp', release='5.11.0-34-generic', version='#36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021', machine='x86_64')
Python version: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
SciPy version: 1.7.0
NumPy version: 1.20.2
I am working with CA census tracts, but any shapefile with >2000 rows should suffice. The problem is with the `libpysal.weights.util.block_weights` function, which appears to run indefinitely once the dataframe exceeds a certain size. I first ran the function on my laptop (see specs above). My initial guess was that it was related to RAM or CPU usage, so I also tried it on a server (256 GB RAM, 32 CPU cores, CentOS). In both cases the computation seems to hang once the dataframe reaches roughly 1,500-1,600 rows.
Here is the reproducible code demo:
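(The original demo is not reproduced here; the following is a synthetic stand-in that exercises the same call. All variable and column names are assumptions, and the actual runtime will depend on the machine.)

```python
import time
import numpy as np
import pandas as pd
from libpysal.weights.util import block_weights

n = 2000                                   # above the ~1,500-1,600 row threshold
rng = np.random.default_rng(0)
sam1 = pd.DataFrame(
    {"regime": rng.integers(0, 5, size=n)},    # few regimes -> large blocks
    index=[f"tract_{i}" for i in range(n)],
)

t0 = time.time()
# Passing ids triggers the slow remap_ids path described in the reply above;
# this call may take a very long time.
w = block_weights(sam1["regime"].values, ids=list(sam1.index))
print(f"built W for {n} rows in {time.time() - t0:.1f}s")
```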