pysal / segregation

Segregation Measurement, Inferential Statistics, and Decomposition Analysis
https://pysal.org/segregation/
BSD 3-Clause "New" or "Revised" License

Segregation Inference in the Granular Setting #206

Closed fengxiaoruo closed 1 year ago

fengxiaoruo commented 1 year ago

We are currently working on segregation inference in the granular setting, i.e., a city can contain thousands of units/grids in our data, by using PySAL segregation module.

Some complications arise when we apply the inference function in this granular setting: the total number of people within a grid can be relatively small, so some grids may fail to include any members of certain groups when we simulate data. As a result, we may obtain an invalid simulated sample and fail to perform the inference.
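To make the failure mode concrete, here is a hedged sketch (illustrative only, not the package's actual simulation code) of how easily a small grid loses a group entirely under a binomial resampling scheme:

```python
import numpy as np

rng = np.random.default_rng(42)

# A small grid: 5 residents total, with a 10% citywide minority share.
n_people = 5
minority_share = 0.10

# Probability that a simulated draw contains zero minority residents:
# (1 - p)**n = 0.9**5 ≈ 0.59, so most draws miss the group entirely.
draws = rng.binomial(n=n_people, p=minority_share, size=10_000)
share_empty = np.mean(draws == 0)
print(f"share of simulated grids with no minority members: {share_empty:.2f}")
```

With thousands of such grids per city, at least one empty grid per simulated sample becomes near-certain.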

The error is `unsupported operand type(s) for +: 'float' and 'NoneType'`. Checking the simulated data, I found it was due to the 0s generated during iterations.

I currently work around this by wrapping the simulation in a `try`/`except` block and redrawing until the sample is valid. However, this method is too time-consuming (it is often hard to get a result at all) when doing inference.
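The redraw-until-valid strategy looks roughly like the sketch below (function and variable names are hypothetical, not the package's API). The comment math shows why it stalls: the acceptance probability shrinks multiplicatively with the number of small grids.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_draw(populations, p, rng):
    """Hypothetical stand-in for one simulated draw of minority counts."""
    return rng.binomial(populations, p)

def simulate_until_valid(populations, p, rng, max_tries=10_000):
    """Redraw until every grid contains at least one minority member.

    With many small grids, the probability that a single joint draw is
    valid shrinks multiplicatively, so this rejection loop can take an
    enormous number of tries (or never finish in practice).
    """
    for tries in range(1, max_tries + 1):
        counts = one_draw(populations, p, rng)
        if (counts > 0).all():
            return counts, tries
    raise RuntimeError("no valid sample within max_tries")

# 50 grids of 5 people each, 10% minority share:
# P(one grid valid) = 1 - 0.9**5 ≈ 0.41, so
# P(all 50 valid in a single joint draw) ≈ 0.41**50 ≈ 4e-20.
p_valid = (1 - 0.9**5) ** 50
print(f"chance a single joint draw is valid: {p_valid:.1e}")
```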

So may I ask what solutions could be used in our granular setting, and what the trade-offs would be? The data sample and the code are here.

knaaptime commented 1 year ago

thanks for raising this @fengxiaoruo!

The granular setting you describe is the typical use-case for the segregation package. Most often we have many observations in small spatial units, and those are used to summarize a larger region of interest. In the package examples, we often use a couple thousand observations (census tracts) to examine a metropolitan region in the U.S.--so what you're describing should just work without any modifications :)

tl;dr, the issue you're running into is caused by the modified dissimilarity index, which works a bit differently than others in the package, because it draws from a binomial distribution internally.

The issue here is that the values in `group_population` are being updated in each iteration instead of overwritten. A fix is here and will be included in the next release.
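A minimal illustration of that class of bug (illustrative names only, not the package's actual internals): when the working array carries over between iterations, simulation *i* resamples simulation *i-1*'s output instead of the original data, and the state drifts.

```python
import numpy as np

rng = np.random.default_rng(1)
group_population = np.array([10, 5, 2])  # group counts per unit

# Buggy pattern: the working array is updated across iterations, so
# each simulation starts from the previous simulation's output.
work = group_population
for _ in range(100):
    work = rng.binomial(work, 0.5)
# Repeatedly resampling its own output drives every count to 0,
# which is exactly the kind of degenerate sample that breaks inference.
print(work)

# Fixed pattern: every simulation is drawn from the original data.
sims = [rng.binomial(group_population, 0.5) for _ in range(100)]
```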

some more detailed notes in the notebook here https://gist.github.com/knaaptime/325115c493557725ef241b44d5c5c0a4

fengxiaoruo commented 1 year ago

Thanks for your replies and modification of the code.

I applied the updated code to a large sample of data, using `from segregation.batch import batch_compute_singlegroup` for the calculation, and got this warning:

```
Terminating: Nested parallel kernel launch detected, the workqueue threading layer does not supported nested parallelism. Try the TBB threading layer.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d ')
```

Are the multithreading settings in the segregation functions inconsistent with my computer's configuration? And do you know how to solve this problem?

knaaptime commented 1 year ago

2 quick things:

i think this is caused by nesting threaded loops. If we change one of the backends to loky instead of threading, i think it should skirt the issue
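For reference, another way to sidestep the nested-parallelism complaint (an assumption on my part, not an official recommendation from the package) is the one the warning itself hints at: select a Numba threading layer that tolerates nesting via the `NUMBA_THREADING_LAYER` environment variable, set before Numba's threading layer is initialized (easiest: before importing numba or anything that imports it):

```python
import os

# Must run before numba (or anything that imports it, such as the
# segregation package) initializes its threading layer.
# "tbb" supports nested parallelism, unlike the "workqueue" fallback,
# but requires the tbb package to be installed in the environment.
os.environ["NUMBA_THREADING_LAYER"] = "tbb"

# from segregation.batch import batch_compute_singlegroup  # import afterwards
```

Whether this or switching the joblib backend to loky is preferable depends on which layer of the nesting you control.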