Spatial Representations for Artificial Intelligence - a Python library toolkit for geospatial machine learning focused on creating embeddings for downstream tasks
After the changes in #436, H3Neighbourhood became non-deterministic. Since 4.0.0.b3 (https://github.com/uber/h3-py/pull/339), the underlying library (h3-py) returns neighbours in random order. As a result, downstream models cannot be forced to return the same results across sessions.
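A minimal sketch of the effect, assuming the neighbours end up in a Python set of string indexes (the cell ids below are made-up placeholders, not real H3 indexes). Iteration order of a set of strings depends on the per-process hash seed, so it can differ between sessions, while the sorted order does not:

```python
import os
import subprocess
import sys

# Child snippets run in fresh interpreters, each with its own hash seed.
# The ids are placeholders standing in for H3 cell indexes.
RAW = "print(list({f'cell_{i}' for i in range(20)}))"
SORTED = "print(sorted({f'cell_{i}' for i in range(20)}))"


def run(code: str, seed: str) -> str:
    """Run `code` in a new Python process with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


seeds = ("1", "2", "3")
raw_orders = {run(RAW, seed) for seed in seeds}
sorted_orders = {run(SORTED, seed) for seed in seeds}

# Raw set order usually differs between seeds; sorted order is always identical.
print("distinct raw orders:", len(raw_orders))
print("distinct sorted orders:", len(sorted_orders))
```

This only illustrates the general mechanism; it does not call h3-py or srai.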
Potential solutions:
1. Sort the values returned by neighbourhoods inside the models. This is the easiest fix for now, but it has to be remembered in every model.
2. Change the interface and logic of neighbourhoods to return sorted results (probably a list instead of a set). This is the preferred option: one fix and done.
For example, option 1 applied to Hex2VecEmbedder could look like this:
def _build_lookup_tables(self, data: pd.DataFrame, neighbourhood: Neighbourhood[T]) -> None:
    anchor_df_locs_lookup: list[int] = []
    positive_df_locs_lookup: list[int] = []

    for region_df_loc, region_index in tqdm(enumerate(data.index), total=len(data)):
        region_direct_neighbours = sorted(neighbourhood.get_neighbours(region_index))
        neighbours_df_locs = {
            self._region_index_to_df_loc[neighbour_index]
            for neighbour_index in region_direct_neighbours
        }
        anchor_df_locs_lookup.extend([region_df_loc] * len(neighbours_df_locs))
        positive_df_locs_lookup.extend(neighbours_df_locs)

        indices_excluded_from_negatives = sorted(neighbourhood.get_neighbours_up_to_distance(
            region_index, self._negative_sample_k_distance
        ))
        self._excluded_from_negatives[region_df_loc] = {
            self._region_index_to_df_loc[excluded_index]
            for excluded_index in indices_excluded_from_negatives
        }

    self._anchor_df_locs_lookup = np.array(anchor_df_locs_lookup)
    self._positive_df_locs_lookup = np.array(positive_df_locs_lookup)
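Option 2 could be sketched roughly as below. This is not the actual srai interface: `SetNeighbourhood`, `SortedNeighbourhood` and `ToyNeighbourhood` are hypothetical names used only for illustration, and the sketch assumes region indexes are strings. The idea is that sorting happens once, at the neighbourhood boundary, so models receive lists in a stable order and do not have to sort themselves.

```python
from typing import Protocol


class SetNeighbourhood(Protocol):
    """Hypothetical stand-in for a neighbourhood that returns unordered sets."""

    def get_neighbours(self, index: str) -> set[str]: ...

    def get_neighbours_up_to_distance(self, index: str, distance: int) -> set[str]: ...


class SortedNeighbourhood:
    """Hypothetical wrapper: same queries, results returned as sorted lists."""

    def __init__(self, inner: SetNeighbourhood) -> None:
        self._inner = inner

    def get_neighbours(self, index: str) -> list[str]:
        return sorted(self._inner.get_neighbours(index))

    def get_neighbours_up_to_distance(self, index: str, distance: int) -> list[str]:
        return sorted(self._inner.get_neighbours_up_to_distance(index, distance))


class ToyNeighbourhood:
    """Toy 1-D neighbourhood over the labels "0".."9", used only for this demo."""

    def get_neighbours(self, index: str) -> set[str]:
        return self.get_neighbours_up_to_distance(index, 1)

    def get_neighbours_up_to_distance(self, index: str, distance: int) -> set[str]:
        centre = int(index)
        return {
            str(i)
            for i in range(centre - distance, centre + distance + 1)
            if i != centre and 0 <= i <= 9
        }


neighbourhood = SortedNeighbourhood(ToyNeighbourhood())
print(neighbourhood.get_neighbours("5"))
print(neighbourhood.get_neighbours_up_to_distance("5", 2))
```

A wrapper keeps the sketch short; changing the return type of the existing neighbourhood classes directly would achieve the same result but is an interface change for every caller.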