worldbank / DECAT_Space2Stats

https://worldbank.github.io/DECAT_Space2Stats/

Summary endpoint issue with more than 5 fields #34

Closed: zacharyDez closed this issue 3 weeks ago

zacharyDez commented 3 weeks ago

Describe the bug

During our last call, @andresfchamorro reported that the summary endpoint becomes slow when more than 5 fields are requested.

To Reproduce

Expected behavior

Response time scales linearly with the number of requested fields.

zacharyDez commented 3 weeks ago

@andresfchamorro, I could not reproduce the issue with 5-7 fields. Here's my quick-and-dirty benchmarking code:

```python
import time
import timeit

import requests

# Placeholders: set these for your deployment and area of interest.
SUMMARY_ENDPOINT = "<summary endpoint URL>"
aoi = {}  # GeoJSON Feature for the area of interest
available_fields = ["sum_pop_2020"]  # fields to benchmark; the loop uses up to the first 7


def fetch_summary(fields, warm_up=False):
    if warm_up:
        # Perform a warm-up request with a minimal payload
        warm_up_payload = {
            "aoi": aoi,
            "spatial_join_method": "centroid",
            "fields": ["sum_pop_2020"],
            "geometry": "point",
        }
        requests.post(SUMMARY_ENDPOINT, json=warm_up_payload)

    # Request payload with the specified fields
    request_payload = {
        "aoi": aoi,
        "spatial_join_method": "centroid",
        "fields": fields,
        "geometry": "point",
    }

    response = requests.post(SUMMARY_ENDPOINT, json=request_payload)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to get summary: {response.text}")

    return response.json()


# Time a single request, with an optional delay beforehand to simulate a cold start
def benchmark(fields, delay_before_request=0):
    if delay_before_request > 0:
        time.sleep(delay_before_request)

    execution_time = timeit.timeit(lambda: fetch_summary(fields), number=1)
    print(f"Time for {len(fields)} fields: {execution_time:.4f} seconds")


# Benchmark requests with 0 to 7 fields (no delay and no warm-up request)
for i in range(8):
    try:
        benchmark(available_fields[:i], delay_before_request=0)
    except Exception as e:
        print(f"Error: {e}")
        print(f"Missed on: {i} with fields {available_fields[:i]}")
        break
```

And the results:

Time for 0 fields: 4.2473 seconds
Time for 1 fields: 4.3757 seconds
Time for 2 fields: 4.4787 seconds
Time for 3 fields: 4.4958 seconds
Time for 4 fields: 4.6102 seconds
Time for 5 fields: 4.7435 seconds
Time for 6 fields: 4.7888 seconds
Time for 7 fields: 5.1760 seconds

Timings vary from run to run, but the response time stays at roughly 4-5 seconds per request and grows only slightly with the number of fields.
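To put a number on "grows only slightly", an ordinary least-squares line through the eight timings above estimates the marginal cost of each extra field. This is an illustrative check on the posted numbers only (each is a single, noisy run), not part of the original benchmark; `least_squares_fit` is a hypothetical helper.

```python
# Timings posted above: number of fields -> seconds for one request.
timings = {
    0: 4.2473, 1: 4.3757, 2: 4.4787, 3: 4.4958,
    4: 4.6102, 5: 4.7435, 6: 4.7888, 7: 5.1760,
}


def least_squares_fit(points: dict) -> tuple:
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    xs = list(points)
    ys = [points[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums (unnormalized; the 1/n factors cancel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_fit(timings)
print(f"~{slope:.3f} s per extra field on top of a ~{intercept:.2f} s base cost")
# prints: ~0.113 s per extra field on top of a ~4.22 s base cost
```

Roughly 0.11 s per field against a base cost of about 4.2 s means that, in the 0-7 field range, the field count accounts for only a small share of each request's time.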

It's possible that you requested data for a larger area (or something similar), which caused you to hit the Lambda size limits described in #35.

@andresfchamorro could you share the exact steps you used to reproduce the issue? I want to confirm whether this is a duplicate of #35 or its own separate bug.

andresfchamorro commented 3 weeks ago

@zacharyDez Interesting, I was still working with the Kenya AOI. For population, it makes sense to me that someone would want the full list of demographic variables.

As long as we clearly document the upper limit (< 10?), I don't see this as an issue. We can always point users to ways of looping requests, right?
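One way to loop requests is to split the field list into batches below the problematic size and merge the per-cell results on the client. The sketch below is hypothetical: `chunked` and `fetch_summary_in_batches` are not part of the Space2Stats API, and it assumes each summary response is a JSON list of per-cell records that share an ID column (called `hex_id` here, which is an assumption about the response schema).

```python
from typing import Callable, Dict, Iterator, List, Sequence

Record = Dict[str, object]


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fetch_summary_in_batches(
    fields: Sequence[str],
    fetch: Callable[[List[str]], List[Record]],
    batch_size: int = 5,
    key: str = "hex_id",  # ASSUMPTION: name of the per-cell ID column in the response
) -> List[Record]:
    """Request `fields` in batches and merge the per-cell records by `key`.

    `fetch` sends one summary request for a list of fields and returns the
    JSON body as a list of dicts (assumed shape), e.g. `fetch_summary` from
    the benchmark script above.
    """
    merged: Dict[object, Record] = {}
    for batch in chunked(fields, batch_size):
        for record in fetch(batch):
            # Combine records for the same cell across batches; columns that
            # appear in every batch (e.g. the ID) are simply overwritten.
            merged.setdefault(record[key], {}).update(record)
    # Dicts keep insertion order, so cells come back in first-seen order
    return list(merged.values())


# Hypothetical usage with the benchmark's fetch_summary:
# rows = fetch_summary_in_batches(available_fields, fetch_summary, batch_size=5)
```

Keeping `fetch` as a parameter leaves the HTTP details (endpoint, AOI, join method) in one place and makes the batching logic easy to test without calling the API.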

zacharyDez commented 3 weeks ago

@andresfchamorro, OK, great. I'll close this issue since #35 and #37 cover it.