Closed fengxiaoruo closed 1 year ago
thanks for raising this @fengxiaoruo!
The granular setting you describe is the typical use case for the segregation package. Most often we have many observations in small spatial units, and those are used to summarize a larger region of interest. In the package examples, we often use a couple thousand observations (census tracts) to examine a metropolitan region in the U.S., so what you're describing should just work without any modifications :)
tl;dr, the issue you're running into is caused by the modified dissimilarity index, which works a bit differently than the others in the package because it draws from a binomial distribution internally.
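Conceptually, the binomial-based simulation works something like the stdlib sketch below (a toy illustration, not the package's actual implementation): each unit's minority count is redrawn from a binomial with the unit's total population and the region-wide minority share, and the index is recomputed on each draw.

```python
import random

def dissimilarity(minority, total):
    """Classic dissimilarity index D for one group vs. the rest."""
    M = sum(minority)
    T = sum(total)
    return 0.5 * sum(abs(m / M - (t - m) / (T - M))
                     for m, t in zip(minority, total))

def simulated_null(minority, total, n_sims=100, seed=0):
    """Toy sketch of a binomial-based null distribution: redraw each
    unit's minority count from Binomial(total_i, p), p = M / T, then
    recompute D on the simulated counts."""
    rng = random.Random(seed)
    p = sum(minority) / sum(total)
    sims = []
    for _ in range(n_sims):
        # Binomial draw as a sum of Bernoulli trials (stdlib only)
        draw = [sum(rng.random() < p for _ in range(t)) for t in total]
        sims.append(dissimilarity(draw, total))
    return sims

observed = dissimilarity([10, 40], [50, 50])   # 0.6
null = simulated_null([10, 40], [50, 50], n_sims=20)
```

Note that when unit populations are tiny, a draw can put zero minority members in every unit, which is exactly the degenerate-sample problem described in this issue.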
The issue here is that the values in `group_population` are being accumulated across iterations instead of overwritten. A fix is here and will be included in the next release.
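The bug pattern can be shown with a toy sketch (not the package's actual code): when the per-unit count is updated with `+=` instead of a plain assignment, the simulated counts grow across iterations and can exceed the unit's total population.

```python
import random

random.seed(0)

def simulate(total_pop, p, n_iter, buggy=False):
    """Toy illustration of the accumulate-vs-overwrite bug: draw a
    simulated minority count for one unit in each iteration."""
    group_population = 0
    draws = []
    for _ in range(n_iter):
        # binomial draw as a sum of Bernoulli trials
        draw = sum(random.random() < p for _ in range(total_pop))
        if buggy:
            group_population += draw   # counts keep growing
        else:
            group_population = draw    # fresh draw each iteration
        draws.append(group_population)
    return draws

# The buggy version quickly exceeds the unit's total population of 100,
# while the corrected version never can.
print(max(simulate(100, 0.3, 10, buggy=True)) > 100)
print(max(simulate(100, 0.3, 10, buggy=False)) <= 100)
```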
Some more detailed notes are in the notebook here: https://gist.github.com/knaaptime/325115c493557725ef241b44d5c5c0a4
Thanks for your replies and for modifying the code.
I applied the updated code to a large sample of data, using `from segregation.batch import batch_compute_singlegroup` for the calculation, and the warning is:
```
Terminating: Nested parallel kernel launch detected, the workqueue threading layer does not supported nested parallelism. Try the TBB threading layer.
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d ')
```
Are the multithreading settings in the segregation functions inconsistent with my computer's configuration? And do you know how to solve this problem?
Two quick things:

1. I think this is caused by nesting threaded loops.
2. If we change one of the backends to `backend="loky"` (instead of the default `"threading"`), I think it should skirt the issue.
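Two hedged workarounds along these lines (a sketch under the assumption that the conflict comes from numba's parallel kernels running inside threaded workers):

```python
import os

# 1. Select numba's TBB threading layer, as the warning itself suggests
#    (requires the `tbb` package); this must be set before numba is
#    first imported:
os.environ["NUMBA_THREADING_LAYER"] = "tbb"

# 2. If you drive the loop with joblib yourself, prefer the
#    process-based "loky" backend over "threading", e.g.:
#      Parallel(n_jobs=4, backend="loky")(delayed(fn)(x) for x in data)
#    so the numba kernels don't launch inside competing threads.
```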
We are currently working on segregation inference in a granular setting, i.e., a city can contain thousands of units/grids in our data, using the PySAL segregation module.
Some complications arise when we apply the inference function in this granular setting: the total number of people within a grid can be relatively small, so some grids may fail to include certain groups when we simulate data. As a result, we may obtain an invalid simulated sample and fail to perform the inference.
The error is `unsupported operand type(s) for +: 'float' and 'NoneType'`. Checking the simulated data, I found it was due to the 0s generated during the iterations.
I currently work around this by wrapping the iteration in a `try`/`except` block and re-running it until the simulated sample is valid. However, this method is too time-consuming (it is even hard to get a result at all) when doing inference.
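For concreteness, the retry workaround described above can be sketched like this (a minimal illustration with a hypothetical `simulate_once` callable, not the actual code in use):

```python
def retry_until_valid(simulate_once, max_tries=100):
    """Re-draw until the simulation succeeds.

    `simulate_once` is a hypothetical callable that returns an index
    value, raising TypeError ('float' + 'NoneType') when a simulated
    grid produces an invalid sample.
    """
    for _ in range(max_tries):
        try:
            return simulate_once()
        except TypeError:   # invalid draw -> try again
            continue
    raise RuntimeError(f"no valid simulated sample in {max_tries} draws")

# Tiny demo: fail twice, then succeed on the third draw.
attempts = iter([None, None, 0.42])
def flaky():
    value = next(attempts)
    if value is None:
        raise TypeError("unsupported operand type(s) for +: 'float' and 'NoneType'")
    return value

print(retry_until_valid(flaky))  # prints 0.42
```

As the question notes, this gets expensive when invalid draws are common, which is exactly why a structural fix (avoiding the `None` values in the first place) is preferable.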
So may I ask what kinds of solutions can be used in our granular setting, and what would be the trade-offs? The data sample and the code are here.