worldbank / DECAT_Space2Stats

https://worldbank.github.io/DECAT_Space2Stats/
Other
1 stars 2 forks source link

Raise exception when no data is returned for small features #58

Closed andresfchamorro closed 1 month ago

andresfchamorro commented 2 months ago

As a workaround to https://github.com/worldbank/DECAT_Space2Stats/issues/56, I tried iterating over every polygon component. The API fails either because there is no source data or the intersect does not provide any matches (I tried both centroid and touches). Can we investigate which is the case? Either way, it should raise an exception explaining the issue. Right now it still returns an empty array.

To Reproduce

from typing import Dict
import requests
import geopandas as gpd
import pandas as pd
from geojson_pydantic import Feature, Polygon
import json
from shapely.geometry import shape

BASE_URL = "https://space2stats.ds.io"
FIELDS_ENDPOINT = f"{BASE_URL}/fields"
SUMMARY_ENDPOINT = f"{BASE_URL}/summary"

iso3 = 'KEN'
adm = 'ADM1'
url = f'https://www.geoboundaries.org/api/current/gbOpen/{iso3}/{adm}/'
res = requests.get(url).json()
adm = gpd.read_file(res['gjDownloadURL'])
adm = adm.loc[[42]].copy()
adm = adm.explode(index_parts=False)
adm.reset_index(inplace=True, drop=True)

AOIModel = Feature[Polygon, Dict]

geojson_str = adm.to_json()
adm_geojson = json.loads(geojson_str)
adm_features = adm_geojson['features']

gdfs = []
for feature in adm_features:

    # I think this validates the geojson
    feat = AOIModel(**feature)

    # Define the Request Payload
    request_payload = {
        "aoi": feature,
        "spatial_join_method": "touches",
        "fields": ["sum_pop_2020"], 
        "geometry": "polygon"
    }

    # Get Summary Data
    response = requests.post(SUMMARY_ENDPOINT, json=request_payload)
    if response.status_code != 200:
        raise Exception(f"Failed to get summary: {response.text}")

    summary_data = response.json()
    if summary_data == []: # Error needs to be added to API
        print(f"Failed to get summary for {feature['id']}")
        continue

    df = pd.DataFrame(summary_data)
    df['adm_id'] = int(feature['id'])
    df['adm_name'] = feature['properties']['shapeName']
    df['geometry'] = df['geometry'].apply(lambda geom: shape(geom))
    gdf = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')
    gdfs.append(gdf)
zacharyDez commented 1 month ago

@alukach and I dove more into this.

Overview

As you described, there's an issue with how we are using the H3 polyfill method, which determines the containment of hexagons within a polygon. The problem arises because this method verifies containment by checking the centroid of the H3 hexagons, not the polygon’s geometry itself. This is problematic as we assumed the API used a “touches” (intersects) logic to determine which hexagons intersect with the input polygon.

For more information on the polyfill method, please look at the H3 polyfill documentation.

We could store the geometries as a lookup table and rely on spatial indexing. Given we are working at a single h3 level 6, this could be considered as an approach. However, we would significantly inflate the size and reliance on the Postgis database.

Proposed Solution

To address this issue, I plan to implement a new approach that maintains efficiency while performing spatial operations, even though it requires processing a limited number of features.

  1. Identify Parent Cells: We will find the H3 level 5 cells associated with the input polygon’s centroid and vertices. This will serve as our base for determining which hexagons to evaluate further.
  2. Generate Child Hexagons: From these level 5 hexagons, we will generate the corresponding H3 level 6 cells (children hexagons) that fall underneath them.
  3. Union with Polyfill: We will also apply the polyfill method to obtain the level 6 hexagons covering the polygon area which would be excluded using only the hexagons from prior steps for larger polygons would be excluded for large. Finally, we will compute the union of these hexagons with the ones derived from the parent cells.
  4. Filter Based on Spatial Aggregation Methods: This combined set of hexagons will then be filtered based on our desired spatial aggregation methods: “touches,” “within,” and “centroid.”
zacharyDez commented 1 month ago

I worked on a more straightforward approach that leverages the h3 functionality as much as possible instead of relying on spatial functions.

Current Approach in #69

  1. Recursive Polyfill: We first recursively use the polyfill method at higher resolutions until we obtain valid results. This establishes a base set of hexagon IDs corresponding to the centroid approach.
  2. Generate Parent Hexagons: We generate parent hexagon cells at the desired resolution from the valid hexagon IDs retrieved. This corresponds with the centroid filtering approach, ensuring we only consider relevant hexagons.
zacharyDez commented 1 month ago

@andresfchamorro, confirming that with the new changes in #69. I added your initial work as another example.