ngageoint / scale

Processing framework for containerized algorithms
http://ngageoint.github.io/scale/
Apache License 2.0
105 stars 45 forks source link

/ingests/status Slow With a Large Database #1876

Closed emimaesmith closed 4 years ago

emimaesmith commented 4 years ago

Description The REST API for /ingests/status takes too long to return when the database is very large.

Reproduction Steps Steps to reproduce the problem:

  1. Have a large database
  2. Navigate to /v6/ingets/status
  3. Watch it return a 504

Expected behavior The /ingest/status should return results - not a 504.

Related to #1831

emimaesmith commented 4 years ago

Adding timings to the different parts of the get_status method reveals the majority of the time is spent grouping the results by time.

mheppner commented 4 years ago

A few optimizations can be made on the db level for ingests.

For /ingests, some massive improvements:

For /ingests/status:

/ingests/status isn't too much better but there isn't much else to improve. There is a bit more computation going on at the Python level than with other requests, but you can see the overhead time (difference in db queries vs full HTTP request) isn't that much more. The single query that is holding most of this up:

SELECT ••• FROM "ingest" LEFT OUTER JOIN "strike" ON ("ingest"."strike_id" = "strike"."id") WHERE "ingest"."status" = 'INGESTED' ORDER BY "ingest"."data_started" ASC

I'm guessing it's just due to the size of the database and that inner join.