hackoregon / civic-devops

Master collection point for issues, procedures, and code to manage the HackOregon Civic platform
MIT License
11 stars 4 forks source link

Please speed up Team disaster AEBM REST API call #279

Closed karenng-civicsoftware closed 4 years ago

karenng-civicsoftware commented 4 years ago

TEAM NAME: Team disaster **PRIORITY (1-5): 3

Our current REST API call

http://service.civicpdx.org/disaster-resilience/api/AebmResults/

that retrieves the data is taking 40+ seconds. (see screenshot below for evidence) Is it possible for someone to help profile the part that is taking so long?

Screen Shot 2019-09-08 at 4 39 37 PM

I used wget to download a JSON file containing the same dataset using my slow home wifi at 400 kB/s, it took 10 seconds only.

$ time wget https://gist.github.com/karenng-civicsoftware/e4a93a8d13c74f2ef4e033909170c652/raw/032492d02df23d4262f35d9cd5c4e1cdd98abae3/aebm_commercial_building_loss_ratio.json
--2019-09-22 17:55:27--  https://gist.github.com/karenng-civicsoftware/e4a93a8d13c74f2ef4e033909170c652/raw/032492d02df23d4262f35d9cd5c4e1cdd98abae3/aebm_commercial_building_loss_ratio.json
Resolving gist.github.com (gist.github.com)... 192.30.255.118
Connecting to gist.github.com (gist.github.com)|192.30.255.118|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://gist.githubusercontent.com/karenng-civicsoftware/e4a93a8d13c74f2ef4e033909170c652/raw/032492d02df23d4262f35d9cd5c4e1cdd98abae3/aebm_commercial_building_loss_ratio.json [following]
--2019-09-22 17:55:27--  https://gist.githubusercontent.com/karenng-civicsoftware/e4a93a8d13c74f2ef4e033909170c652/raw/032492d02df23d4262f35d9cd5c4e1cdd98abae3/aebm_commercial_building_loss_ratio.json
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3985109 (3.8M) [text/plain]
Saving to: ‘aebm_commercial_building_loss_ratio.json’

aebm_commercial_building_loss_ratio.json     100%[============================================================================================>]   3.80M   484KB/s    in 9.2s

2019-09-22 17:55:37 (422 KB/s) - ‘aebm_commercial_building_loss_ratio.json’ saved [3985109/3985109]

real    0m10.157s
user    0m0.061s
sys 0m0.132s

Test file for checking download speed can be found at https://gist.github.com/karenng-civicsoftware/e4a93a8d13c74f2ef4e033909170c652

acceptance criteria

We will accept solutions that resemble the following, and they are listed in the order of our preference

  1. speed up API call by finding the bottleneck (from db / browser / card layer) and removing it if possible - we would also like to know how many data points would be optimally supported for our REST api calls for acceptable API performance

  2. work with front-end to see if there are cache-ing solution to first load part of the data while the rest of the dataset waits to be loaded and plotted

  3. summarize the datasets and make them smaller on the side of the backend before providing them via REST

Please let us know what the backend (& front end) team recommends

We hope to make the two cards linked

load quickly (~10 seconds for all 27k data points) after this issue is resolved.

karenng-civicsoftware commented 4 years ago

https://scaleyourcode.com/blog/article/27 <- resource from @znmeb for profiling what is taking so long from the API

znmeb commented 4 years ago

Question for the front-end - will the request always want the whole table? If it does, would a precomputed GeoJSON dataset work?

BrianHGrant commented 4 years ago

Attached profile document output created using this package:

prof.txt

Many of the top time events are related to the geoserializer/serialization of the data set:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
410567/355825    7.894    0.000   11.254    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/geos/prototypes/threadsafe.py:42(__call__)
  1669887    5.923    0.000   10.116    0.000 {built-in method builtins.isinstance}
   246339    3.094    0.000    8.227    0.000 /usr/local/lib/python3.6/site-packages/rest_framework/fields.py:54(is_simple_callable)
   985358    3.085    0.000    3.085    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/ptr.py:14(ptr)
   246339    3.057    0.000   18.276    0.000 /usr/local/lib/python3.6/site-packages/rest_framework/fields.py:88(get_attribute)
410567/355825    2.778    0.000   13.654    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/geos/libgeos.py:154(__call__)
   246339    2.529    0.000    4.194    0.000 /usr/local/lib/python3.6/abc.py:180(__instancecheck__)
    27371    2.433    0.000   58.939    0.002 /usr/local/lib/python3.6/site-packages/rest_framework_gis/serializers.py:91(to_representation)
    27371    2.125    0.000    2.514    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/gdal/srs.py:275(import_epsg)
   273710    1.764    0.000    2.832    0.000 /usr/local/lib/python3.6/_collections_abc.py:760(__iter__)
   492722    1.663    0.000    1.663    0.000 /usr/local/lib/python3.6/_weakrefset.py:70(__contains__)
    27371    1.549    0.000    7.311    0.000 /usr/local/lib/python3.6/site-packages/django/db/models/base.py:395(__init__)
    27371    1.518    0.000   17.260    0.001 /usr/local/lib/python3.6/site-packages/rest_framework_gis/serializers.py:136(get_properties)
   246339    1.517    0.000   19.793    0.000
...

So yeah, it does not appear to be pulling the data itself, but the formation of the serialized json response, basically the nested object

I am going to attempt to serve static file via api

BrianHGrant commented 4 years ago

OK, doing some local testing I was able to serve static file at a greatly increased speed:

         9954 function calls (9900 primitive calls) in 0.333 seconds

   Ordered by: internal time
   List reduced from 359 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       14    0.167    0.012    0.253    0.018 /usr/local/lib/python3.6/site-packages/django/core/files/base.py:88(__iter__)
      435    0.044    0.000    0.044    0.000 {method 'read' of '_io.BufferedReader' objects}
      427    0.021    0.000    0.021    0.000 {method 'splitlines' of 'bytes' objects}
        8    0.021    0.003    0.021    0.003 {built-in method io.open}
     1379    0.006    0.000    0.006    0.000 {built-in method builtins.isinstance}
      434    0.005    0.000    0.048    0.000 /usr/local/lib/python3.6/site-packages/django/core/files/base.py:57(chunks)
      427    0.005    0.000    0.009    0.000 /usr/local/lib/python3.6/site-packages/django/core/files/base.py:162(endswith_lf)
      420    0.005    0.000    0.008    0.000 /usr/local/lib/python3.6/site-packages/django/core/files/base.py:157(endswith_cr)
      847    0.004    0.000    0.004    0.000 {method 'endswith' of 'bytes' objects}
        8    0.002    0.000    0.002    0.000 {method 'close' of '_io.BufferedReader' objects}
      242    0.002    0.000    0.003    0.000 /usr/local/lib/python3.6/enum.py:265(__call__)
      435    0.002    0.000    0.002    0.000 /usr/local/lib/python3.6/site-packages/django/core/files/utils.py:16(<lambda>)
      111    0.002    0.000    0.005    0.000 /usr/local/lib/python3.6/enum.py:801(__and__)
      9/4    0.002    0.000    0.006    0.002 /usr/local/lib/python3.6/sre_parse.py:470(_parse)
     17/4    0.002    0.000    0.008    0.002 /usr/local/lib/python3.6/sre_compile.py:64(_compile)
      444    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
       42    0.001    0.000    0.001    0.000 {method 'pop' of 'list' objects}
      242    0.001    0.000    0.001    0.000 /usr/local/lib/python3.6/enum.py:515(__new__)
       42    0.001    0.000    0.004    0.000 /usr/local/lib/python3.6/site-packages/django/http/multipartparser.py:647(parse_header)
       42    0.001    0.000    0.002    0.000 /usr/local/lib/python3.6/site-packages/django/http/response.py:106(_convert_to_charset)
       17    0.001    0.000    0.001    0.000 /usr/local/lib/python3.6/posixpath.py:331(normpath)
  214/206    0.001    0.000    0.001    0.000 {built-in method builtins.len}
       42    0.001    0.000    0.001    0.000 /usr/local/lib/python3.6/site-packages/django/http/multipartparser.py:679(_parse_header_params)
      159    0.001    0.000    0.001    0.000 {built-in method builtins.getattr}
       94    0.001    0.000    0.001    0.000 /usr/local/lib/python3.6/sre_parse.py:253(get)
       10    0.001    0.000    0.001    0.000 {built-in method posix.stat}
       42    0.001    0.000    0.005    0.000 /usr/local/lib/python3.6/site-packages/rest_framework/utils/mediatypes.py:51(__init__)
      140    0.001    0.000    0.001    0.000 {built-in method builtins.hasattr}
       77    0.001    0.000    0.001    0.000 /usr/local/lib/python3.6/sre_parse.py:163(__getitem__)
       49    0.001    0.000    0.001    0.000 /usr/local/lib/python3.6/site-packages/rest_framework/request.py:404(__getattr__)
        7    0.001    0.000    0.333    0.048 /usr/local/lib/python3.6/site-packages/rest_framework/views.py:470(dispatch)
      110    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/sre_parse.py:232(__next)
    30/17    0.000    0.000    0.001    0.000 /usr/local/lib/python3.6/sre_parse.py:173(getwidth)
       91    0.000    0.000    0.000    0.000 {method 'encode' of 'str' objects}
        7    0.000    0.000    0.007    0.001 /usr/local/lib/python3.6/site-packages/rest_framework/negotiation.py:37(select_renderer)
       67    0.000    0.000    0.001    0.000 /usr/local/lib/python3.6/sre_parse.py:171(append)
       15    0.000    0.000    0.001    0.000 /usr/local/lib/python3.6/posixpath.py:75(join)
        7    0.000    0.000    0.001    0.000 /usr/local/lib/python3.6/site-packages/django/views/generic/base.py:106(<listcomp>)
        7    0.000    0.000    0.312    0.045 /code/api/views.py:21(list)
        2    0.000    0.000    0.000    0.000 {method 'read' of '_io.FileIO' objects}
       81    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        8    0.000    0.000    0.004    0.000 /usr/local/lib/python3.6/site-packages/django/utils/_os.py:25(safe_join)
       21    0.000    0.000    0.002    0.000 /usr/local/lib/python3.6/site-packages/django/http/response.py:133(__setitem__)
        6    0.000    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:1233(find_spec)
        7    0.000    0.000    0.015    0.002 /usr/local/lib/python3.6/site-packages/rest_framework/views.py:384(initial)
       32    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:57(_path_join)
        7    0.000    0.000    0.333    0.048 /usr/local/lib/python3.6/site-packages/rest_framework/viewsets.py:74(view)
       32    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
        7    0.000    0.000    0.000    0.000 /usr/local/lib/python3.6/site-packages/rest_framework/views.py:285(<listcomp>)
       17    0.000    0.000    0.002    0.000 /usr/local/lib/python3.6/posixpath.py:369(abspath)

So, we can tell that this does "work", though trade-offs:

1) limit/offset and detail (ie single id searches) will not work OOTB, an option to solve maybe to provide 2 separate endpoints? one pulling the static data, one pulling via db queries for filtering purposes?

2) if this dataset changes/updates we would need a mechanism to update the static json file. perhaps this could be done via a background worker queue (ie celery). If we don't expect updates/changes then this would not matter as much

Alternative approach maybe looking at setting up a backend cache (perhaps in-memory, if not too intense) The first request may take some time, but should significantly speed up responses. Returning to the background queue, we could potentially schedule a job to hit this endpoint/refresh cache, so would be pre-cached for when users do attempt to access

This should solve the question of changes/updates to database, also allow using the existing list + get endpoints.

Downside would be time/resources dedicated to infrastructure updates, implementation of caching system

znmeb commented 4 years ago

This particular table isn't all that big - I think it was 3 megabytes in the database. But I didn't measure the size of the GeoJSON serialized version.

We might be able to store the table in the database as JSONB and use a Django JSONField to retrieve it. That would at least pre-compute all the GIS stuff that is going on.

https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#jsonfield

znmeb commented 4 years ago

It turns out I have the GeoJSON for that dataset on my hard drive - I was the one that loaded it into their database. It's only 17 megabytes. We could store it in the container and just push it out via the API. ;-)

jaronheard commented 4 years ago

Question for the front-end - will the request always want the whole table? If it does, would a precomputed GeoJSON dataset work?

Yep, for our current purposes, we always want the whole thing, at least spatially. If we want to perfectly optimize, then we'd only be getting the fields that we need for each visualization.

znmeb commented 4 years ago

@jaronheard @BrianHGrant @karenng-civicsoftware In this particular case the whole object is smallish - about 17 megabytes. If we just precomputed it and sent it down the pipe (gzipped) you'd have about a 3 megabyte file transfer hit.

I'm hoping there's a more generalizable solution if the issue is indeed inefficient conversion of a spatial data frame with a geometry column (which PostGIS stores as "Well-known binary") to GeoJSON. For openers, on the transportation spatial data frames we added longitude and latitude columns so the API could do bounding box calculations with two simple compares.

znmeb commented 4 years ago

Attached profile document output created using this package:

prof.txt

Many of the top time events are related to the geoserializer/serialization of the data set:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
410567/355825    7.894    0.000   11.254    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/geos/prototypes/threadsafe.py:42(__call__)
  1669887    5.923    0.000   10.116    0.000 {built-in method builtins.isinstance}
   246339    3.094    0.000    8.227    0.000 /usr/local/lib/python3.6/site-packages/rest_framework/fields.py:54(is_simple_callable)
   985358    3.085    0.000    3.085    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/ptr.py:14(ptr)
   246339    3.057    0.000   18.276    0.000 /usr/local/lib/python3.6/site-packages/rest_framework/fields.py:88(get_attribute)
410567/355825    2.778    0.000   13.654    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/geos/libgeos.py:154(__call__)
   246339    2.529    0.000    4.194    0.000 /usr/local/lib/python3.6/abc.py:180(__instancecheck__)
    27371    2.433    0.000   58.939    0.002 /usr/local/lib/python3.6/site-packages/rest_framework_gis/serializers.py:91(to_representation)
    27371    2.125    0.000    2.514    0.000 /usr/local/lib/python3.6/site-packages/django/contrib/gis/gdal/srs.py:275(import_epsg)
   273710    1.764    0.000    2.832    0.000 /usr/local/lib/python3.6/_collections_abc.py:760(__iter__)
   492722    1.663    0.000    1.663    0.000 /usr/local/lib/python3.6/_weakrefset.py:70(__contains__)
    27371    1.549    0.000    7.311    0.000 /usr/local/lib/python3.6/site-packages/django/db/models/base.py:395(__init__)
    27371    1.518    0.000   17.260    0.001 /usr/local/lib/python3.6/site-packages/rest_framework_gis/serializers.py:136(get_properties)
   246339    1.517    0.000   19.793    0.000
...

So yeah, it does not appear to be pulling the data itself, but the formation of the serialized json response, basically the nested object

I am going to attempt to serve static file via api

Side not - everything in this profile with 27371 calls was most likely called exactly once for every row in the table.

bhgrant8 commented 4 years ago

@znmeb ,Agree the single call per point. We'll have limited control over this, aside from looking at the open source package directly, or building a custom serializer.

As it appears that the following conditions are true, seems to make sense to pre-load as static json file, and serve through API:

As an mvp/first iteration expect following work:

  1. Store current full return as a formatted geojson object matching API spec (to keep API contract with end users)
  2. Add to API container image (will need to be within staticfiles, so either need to gitkeep this specific file, or provide mechanism to copy into directory at runtime)
  3. Allow endpoint to read the json and serve contents directly via HttpResponse (example: https://stackoverflow.com/questions/39001230/how-to-put-a-json-files-content-in-a-response)
  4. Perform/build adequate acceptance + integration testing locally, prior to deploying
  5. Deploy and profit

Acceptance:

A possible 2nd iteration:

Background process calls non-public endpoint to dynamically generate json, store to OS or possibly s3 at a certain interval. Idea would be to allow for updates within a certain timeframe (ie 24hrs). Whether useful for this particular project, this would be a more sustainable/reusable component.

@znmeb , @karenng-civicsoftware , @jaronheard , @nam20485

bhgrant8 commented 4 years ago

ok pr submitted: https://github.com/hackoregon/disaster-resilience-backend/pull/53

nam20485 commented 4 years ago

The pre-loaded static json call has been deployed to production.

@karenng-civicsoftware Have you had a chance to test it to see if it meets the acceptance criteria (specifically the load/response time)?

  • Does not break current frontend/end user behavior
  • Reduces load time for this endpoint to an acceptable rate (~10 seconds for all 27k data points), but I believe we maybe more in the <5 second realm
jaronheard commented 4 years ago

I've reviewed this for the acceptance criteria and am satisfied!

Jaron Heard Creative Director Civic Software Foundation

+1-971-998-7180

On Sun, Sep 29, 2019 at 3:58 PM, Nathan Miller < notifications@github.com > wrote:

The pre-loaded stati json call has been deployed to production.

@ karenng-civicsoftware ( https://github.com/karenng-civicsoftware ) Have you had a chance to test it to see if it meets the acceptance criteria (specifically the load/response time)?

  • Does not break current frontend/end user behavior
  • Reduces load time for this endpoint to an acceptable rate (~10 seconds for all 27k data points), but I believe we maybe more in the <5 second realm

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub ( https://github.com/hackoregon/civic-devops/issues/279?email_source=notifications&email_token=ABV5AX6C62BPQHIQOGQXWLLQMEXKDA5CNFSM4IZFZ3E2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD74A2HI#issuecomment-536349981 ) , or mute the thread ( https://github.com/notifications/unsubscribe-auth/ABV5AX2RVFH53IDDAFQT2MDQMEXKDANCNFSM4IZFZ3EQ ).

nam20485 commented 4 years ago

@jaronheard How long did it take for the full response to come back?

jaronheard commented 4 years ago

2700 -3200 ms in my testing, which is great!

You can check it out at: http://civicplatform.org/cards/tillamook-county-earthquake-casualty-estimates

http://civicplatform.org/cards/earthquake-damage-estimates-for-buildings-in-tillamook-county

Jaron Heard Creative Director Civic Software Foundation

+1-971-998-7180

On Mon, Sep 30, 2019 at 11:24 AM, Nathan Miller < notifications@github.com > wrote:

@ jaronheard ( https://github.com/jaronheard ) How long did it take for the full response to come back?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub ( https://github.com/hackoregon/civic-devops/issues/279?email_source=notifications&email_token=ABV5AX3E7JYJS2IBJLS23CTQMI76NA5CNFSM4IZFZ3E2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD76TQ7A#issuecomment-536688764 ) , or mute the thread ( https://github.com/notifications/unsubscribe-auth/ABV5AX2ZBZL4MYATIVSJXS3QMI76NANCNFSM4IZFZ3EQ ).

znmeb commented 4 years ago

Interesting - looks like the points come back instantly and the background tiles show up noticeably later.

nam20485 commented 4 years ago

@Jaron can measure the response time one more time for me please? (I had the wrong endpoint answering due to an incorrect merge)

jaronheard commented 4 years ago

@nam20485 I measured in the 3000-3500ms range

bhgrant8 commented 4 years ago

Closing issue at this time.

Current resolution is a "just enough" usage of static json response served via api. if data becomes dynamic/needs further updates we can return.

possible use case for cached data