Closed knaaptime closed 4 years ago
A dict as an attribute of the `Community` class is a great strategy to store the metadata.
what should we call this attribute? and what do we want to store?
what about something like `models`, which is a dict that stores a model instance keyed on the model name? `models` is maybe too generic if we're only storing clusters. We could use `clusters` (though we already have `cluster` and `cluster_spatial` methods)
so like,
columbus = columbus.cluster(columns=['median_household_income', 'p_poverty_rate', 'p_edu_college_greater', 'p_unemployment_rate'], method='ward')
would store a new entry in clusters, so if you did
columbus.clusters['ward']
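you'd get back the fitted scikit-learn Ward instance (`AgglomerativeClustering` with `linkage='ward'`, since the old `sklearn.cluster.Ward` class has been removed). The name of the label column on the gdf always matches the key in `Community.clusters`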
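To make the proposal concrete, here is a minimal sketch of the bookkeeping. `ToyCommunity` and `ToyModel` are hypothetical stand-ins written for this illustration, not geosnap classes, and the labelling rule is fake; only the `clusters` dict keyed on the method name follows the idea above.

```python
class ToyModel:
    """Stand-in for a fitted estimator (e.g. a scikit-learn clusterer)."""

    def __init__(self, method, columns):
        self.method = method
        self.columns = list(columns)


class ToyCommunity:
    """Hypothetical, stripped-down stand-in for Community."""

    def __init__(self, rows, clusters=None):
        self.rows = rows                # list of dicts, playing the role of the gdf
        self.clusters = clusters or {}  # models keyed on the method name

    def cluster(self, columns, method):
        model = ToyModel(method, columns)
        # Fake labels: split on the mean of the first column.
        # A real implementation would fit an estimator here instead.
        first = columns[0]
        mean = sum(r[first] for r in self.rows) / len(self.rows)
        # The label column is named after the method, so it matches the dict key.
        new_rows = [dict(r, **{method: int(r[first] > mean)}) for r in self.rows]
        # Return a new community that carries earlier models forward.
        new_clusters = dict(self.clusters)
        new_clusters[method] = model
        return ToyCommunity(new_rows, new_clusters)


columbus = ToyCommunity(
    [{"median_household_income": 30}, {"median_household_income": 90}]
)
columbus = columbus.cluster(columns=["median_household_income"], method="ward")
columbus = columbus.cluster(columns=["median_household_income"], method="kmeans")
ward_model = columbus.clusters["ward"]
```

Because each call copies the dict forward, repeated explorations from the same community accumulate under their own keys, and the user never has to track the model objects separately.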
took a shot at this in #158, which adds a `models` attribute to the `Community`. Right now it only stores clusters, but it would make sense to do the same thing with sequence and transition models, i think? The sequence models in particular could probably adopt the same convention
currently, this is storing a namedtuple with X, labels, column names, and W (if there is one). In short, everything you need for a silhouette or geosilhouette score, and the colnames so you can keep track of which model is which
oh and the model instance itself
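As a rough sketch of what such a record could look like (the `ModelResults` name, its field names, and the data are assumptions made for this example and may differ from what #158 actually uses), the stored pieces are enough to compute a silhouette score later without refitting:

```python
from collections import namedtuple

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Hypothetical record type; the fields mirror the list in the comment above.
ModelResults = namedtuple(
    "ModelResults", ["X", "columns", "labels", "instance", "W"]
)

columns = ["median_household_income", "p_poverty_rate"]
# Made-up, already standardized values for six tracts.
X = np.array(
    [[-1.2, 1.1], [-1.0, 0.9], [-1.1, 1.0],
     [1.0, -1.0], [1.2, -0.9], [0.9, -1.1]]
)

# Ward clustering in scikit-learn is AgglomerativeClustering with linkage="ward".
model = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)

models = {}
models["ward"] = ModelResults(
    X=X,
    columns=columns,
    labels=model.labels_,
    instance=model,
    W=None,  # no spatial weights for an aspatial model
)

# Later: score the stored model from the record alone.
result = models["ward"]
score = silhouette_score(result.X, result.labels)
```

A geosilhouette score would additionally need the stored `W`, which is why the record keeps a slot for it even when it is empty.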
resolved by #158
currently when a user calls `Community.cluster`, the method returns a new community with cluster labels appended as a new attribute on the gdf. But it's common for users to try out several clustering methods, algorithms, k parameters, etc., and if each of those explorations is sourced from the same `Community`, it might be nice to attach the metadata along with each new cluster instance.

Currently, this is handled by allowing the user to return both the underlying cluster instance and the new community by passing `return_cluster=True`. That's a flexible way for the user to keep both the community and the metadata behind the cluster labels, but said user needs to handle that data management herself. It might be nice to include a dict or something as an attribute of the community that handles some of this