Speed up `export_esdl` POST endpoint by collecting and caching gquery requests

thomas-qah commented 1 year ago

Why?

The export_esdl endpoint of the ETM ESDL API is currently so slow that requests to it time out. Internally it performs 25 or more requests to etengine, each taking about 5 seconds, so the endpoint needs several minutes to respond. The proxy that sits between the ESDL API and the end user cuts off requests that take longer than 30 seconds, which causes the timeout. As a result, the endpoint is currently unusable.

What?

The changes in this PR cut the number of requests that the ESDL API performs to etengine down to about 1 to 3, which should bring the response time below 15 seconds and likely much lower. In local testing, requests returned in about 3-4 seconds.

How?

The ESDL API now first collects the elements (gqueries) for which results should be obtained from etengine and then bundles them into one request (luckily, etengine supports processing multiple gqueries at once). The gqueries are collected through the GqueryCache Singleton class, which is introduced in this PR. Besides collecting the gqueries, it also caches the etengine response (result) for each gquery, so that the codebase can get these results from the GqueryCache at any given time. GqueryCache determines whether a request to etengine is necessary by checking for missing results.
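The collect-then-bundle idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual GqueryCache code from this PR: the class name `GqueryCacheSketch`, the `fetcher` attribute and the `fake_fetcher` stand-in for the etengine call are all assumptions made for the example.

```python
from typing import Dict, List, Optional


class GqueryCacheSketch:
    """Illustrative singleton; not the actual GqueryCache implementation."""

    _instance: Optional["GqueryCacheSketch"] = None

    def __new__(cls) -> "GqueryCacheSketch":
        # Every call to GqueryCacheSketch() returns the same shared object.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance.scenario_id = None
            instance.pending = []   # gqueries collected but not yet requested
            instance.results = {}   # gquery key -> cached result
            instance.fetcher = None  # hypothetical hook standing in for the etengine call
            cls._instance = instance
        return cls._instance

    def for_scenario_id(self, scenario_id: int) -> "GqueryCacheSketch":
        self.scenario_id = scenario_id
        return self

    def for_gqueries(self, gqueries: List[str]) -> "GqueryCacheSketch":
        # Only queue keys that are neither cached nor already queued.
        for key in gqueries:
            if key not in self.results and key not in self.pending:
                self.pending.append(key)
        return self

    def perform_request(self) -> "GqueryCacheSketch":
        # One bundled call for everything collected so far.
        if self.pending:
            self.results.update(self.fetcher(self.scenario_id, list(self.pending)))
            self.pending.clear()
        return self

    def get(self, gqueries: List[str]) -> Dict[str, object]:
        return {key: self.results[key] for key in gqueries}


calls = []


def fake_fetcher(scenario_id, keys):
    # Stand-in for a single etengine request; records how often it is called.
    calls.append((scenario_id, list(keys)))
    return {key: f"result for {key}" for key in keys}


cache = GqueryCacheSketch()
cache.fetcher = fake_fetcher
cache.for_scenario_id(10).for_gqueries(["gquery_a", "gquery_b"])
GqueryCacheSketch().for_gqueries(["gquery_b", "gquery_c"])  # same shared instance
GqueryCacheSketch().perform_request()
results = GqueryCacheSketch().get(["gquery_a", "gquery_c"])
print(len(calls), results)
```

In this sketch, gqueries collected from two different places end up in a single call to the fetcher, which is the effect the PR relies on to reduce the number of etengine requests.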

Usage

Obtaining results (values returned by etengine) for gqueries can be done as follows:

# First set the proper scenario-id and gqueries we want to get results for
GqueryCache().for_scenario_id(10)
GqueryCache().for_gqueries(['gquery_a', 'gquery_b'])

# Then perform the request and store the results
GqueryCache().perform_request()

# Lastly, obtain the results for the gqueries we requested.
# This can be done anywhere in the codebase at this point.
gquery_results = GqueryCache().get(['gquery_a', 'gquery_b'])
# returns: {'gquery_a': 'result a', 'gquery_b': 'result b'}

GqueryCache supports method chaining, so the above can also be written as a one-liner:

# Set scenario-id, request results for given gqueries and obtain them, all in one go.
# GqueryCache will automatically perform a request for missing gquery results when calling get().
gquery_results = GqueryCache().for_scenario_id(10).get(['gquery_a', 'gquery_b'])
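For readers unfamiliar with the pattern, the chaining and the "request only what is missing" behaviour of `get()` could look roughly like the sketch below. It is an assumption-based illustration: `ChainableCacheSketch` and `fake_fetcher` are hypothetical names, and the real GqueryCache may be structured differently.

```python
class ChainableCacheSketch:
    """Illustrative only; shows chaining and lazy fetching, not the real class."""

    def __init__(self, fetcher):
        self._fetcher = fetcher  # hypothetical callable: (scenario_id, keys) -> dict
        self._scenario_id = None
        self._results = {}

    def for_scenario_id(self, scenario_id):
        self._scenario_id = scenario_id
        return self  # returning self is what makes chaining possible

    def get(self, gqueries):
        # Fetch only the keys that are not cached yet, in one bundled call.
        missing = [key for key in gqueries if key not in self._results]
        if missing:
            self._results.update(self._fetcher(self._scenario_id, missing))
        return {key: self._results[key] for key in gqueries}


fetch_log = []


def fake_fetcher(scenario_id, keys):
    # Stand-in for the etengine request; logs which keys were asked for.
    fetch_log.append(list(keys))
    return {key: key.upper() for key in keys}


cache = ChainableCacheSketch(fake_fetcher)
first = cache.for_scenario_id(10).get(["gquery_a", "gquery_b"])
second = cache.get(["gquery_a", "gquery_c"])  # only gquery_c is fetched
print(fetch_log)
```

The second `get()` call triggers a fetch for `gquery_c` only, because `gquery_a` is already cached.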

Closes #110

Charlottevm commented 1 year ago

The content of the ESDL files also looks good to me!

thomas-qah commented 1 year ago

> My only concern is that because it's a Singleton, the old data never gets removed.

@noracato very good point! I wasn't sure whether the Singleton would 'survive' between separate requests, as this is not the case in most web frameworks. Here, however, it does seem to be the case, so I've introduced a cache invalidation system in commit 2adb3e5.

Please let me know what you think! :)
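The thread does not describe how the invalidation in commit 2adb3e5 works. As one possible approach, and purely as an assumption, a singleton cache could drop its stored results whenever the scenario id changes and also expose an explicit `clear()` that the endpoint calls at the start of each incoming request. All names in the sketch below are hypothetical.

```python
class InvalidatingCacheSketch:
    """Illustrative singleton with invalidation; not the code from the commit."""

    _instance = None

    def __new__(cls):
        # The shared instance outlives individual web requests in the same process.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._scenario_id = None
            cls._instance._results = {}
        return cls._instance

    def for_scenario_id(self, scenario_id):
        # Results belong to one scenario, so switching scenarios empties the cache.
        if scenario_id != self._scenario_id:
            self._results.clear()
        self._scenario_id = scenario_id
        return self

    def store(self, results):
        self._results.update(results)
        return self

    def cached_keys(self):
        return sorted(self._results)

    def clear(self):
        # Explicit reset, e.g. at the start of every incoming request.
        self._results.clear()
        return self


InvalidatingCacheSketch().for_scenario_id(10).store({"gquery_a": 1.0})
same = InvalidatingCacheSketch().for_scenario_id(10).cached_keys()
after_switch = InvalidatingCacheSketch().for_scenario_id(11).cached_keys()
print(same, after_switch)
```

Clearing on a scenario switch alone would still serve stale values if the same scenario is modified between requests, which is why the sketch also includes the explicit `clear()`.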
