pysal / spopt

Spatial Optimization
https://pysal.org/spopt/
BSD 3-Clause "New" or "Revised" License

question on SKATER from Discord [2024-10-15] #463

Open jGaboardi opened 1 week ago

jGaboardi commented 1 week ago

Hi everyone, I discovered the documentation of spopt yesterday. An extremely interesting project!

It just so happens that one of the tutorials fits one of my business use cases perfectly, in particular this one: Spatial ‘K’luster Analysis by Tree Edge Removal: Clustering Airbnb Spots in Chicago.

In this demonstration, we attempt to cluster polygons, each characterised by a number of Airbnb spots, into N clusters. Unless I am mistaken, I expected the objective variable (number of Airbnb spots) to be balanced across the clusters.

But that's not what we see when we add up the number of spots per cluster created. Could you clarify my understanding? And congratulations again to the whole team for the work they've done.

Many thanks in advance. Best regards

(if my question doesn't belong here, don't hesitate to delete it)

valentincorad commented 1 week ago

Hello, to clarify my issue, here is my project. I have a geographical area divided into multiple cities (for each city I have a polygon geometry). Each city is characterised by a metric A (for instance, number of clients) and a metric B (number of prospects). I have N salesmen. I am trying to create N contiguous regions/clusters such that each cluster/salesman has a similar number of clients and a similar number of prospects. I might have other objectives in the future.
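The contiguity requirement described above can be checked independently of whichever regionalization method produces the labels. The following is a minimal sketch using only the standard library; the `adjacency` and `labels` dictionaries and the function name are hypothetical illustrations, not part of spopt.

```python
from collections import deque


def regions_are_contiguous(adjacency, labels):
    """Return True if every region forms a single connected component.

    adjacency: dict mapping city id -> set of neighbouring city ids (hypothetical input)
    labels:    dict mapping city id -> region (salesman) id
    """
    # Group the cities by the region they were assigned to.
    members = {}
    for city, region in labels.items():
        members.setdefault(region, set()).add(city)

    # Breadth-first search inside each region: every member must be reachable
    # from an arbitrary start city without leaving the region.
    for cities in members.values():
        start = next(iter(cities))
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency.get(current, ()):
                if neighbour in cities and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if seen != cities:
            return False
    return True


# Toy example: four cities in a row, a - b - c - d.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
good = {"a": 0, "b": 0, "c": 1, "d": 1}  # each region is one connected block
bad = {"a": 0, "b": 1, "c": 0, "d": 1}   # region 0 is split by city b

print(regions_are_contiguous(adjacency, good))
print(regions_are_contiguous(adjacency, bad))
```

In practice the adjacency would come from the polygon geometries (for example, a spatial weights object), which is assumed here rather than shown.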

After some research, I found that the "Max-p-regions" or "Skater" methods could solve my problem. I tried to understand the demonstration Spatial ‘K’luster Analysis by Tree Edge Removal: Clustering Airbnb Spots in Chicago in the spopt documentation.

At the end of this demonstration, different numbers of clusters are tested. For each test, the number of Airbnb spots per cluster is computed. However, I expected the number of Airbnb spots (column "num_spots") to be balanced across clusters. This is where my misunderstanding lies.

image

I used Skater on my project and tried, without success, to minimize these two metrics while keeping the regions contiguous:

client_std = gdf.groupby(region_id)['number_clients'].sum().std()
prospect_std = gdf.groupby(region_id)['number_prospects'].sum().std()

with gdf, my geodataframe.
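To make the two balance metrics concrete, here is a dependency-free sketch of the same computation: sum each metric per region, then take the sample standard deviation of those totals (pandas' `.std()` also uses the sample formula by default). The rows and the helper name `region_total_std` are made up for illustration and do not come from the actual data.

```python
from collections import defaultdict
from statistics import stdev


def region_total_std(rows, value_key, region_key="region_id"):
    """Sample standard deviation of per-region totals of `value_key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[region_key]] += row[value_key]
    # stdev() needs at least two regions; it uses the n - 1 denominator.
    return stdev(list(totals.values()))


# Hypothetical cities, already labelled with a region (salesman) id.
rows = [
    {"region_id": 0, "number_clients": 10, "number_prospects": 40},
    {"region_id": 0, "number_clients": 20, "number_prospects": 10},
    {"region_id": 1, "number_clients": 30, "number_prospects": 50},
    {"region_id": 2, "number_clients": 30, "number_prospects": 20},
    {"region_id": 2, "number_clients": 0, "number_prospects": 60},
]

client_std = region_total_std(rows, "number_clients")
prospect_std = region_total_std(rows, "number_prospects")
print(client_std, prospect_std)
```

In this toy labelling the client totals are identical across regions while the prospect totals are not, which shows that the two metrics can disagree for the same assignment.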

Have I perhaps misunderstood how the model is meant to be used? Or do I need to find the right combination of parameters (floor, trace, center, ...)?

Thank you very much for your help.