orcasound / aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal)
http://orcahello.ai4orcas.net/
MIT License

Improve metrics loading time #100

Closed · micya closed this issue 2 months ago

micya commented 2 years ago

For a given time period, we calculate metrics such as the number of reviewed, unreviewed, confirmed, and false detections, and generate a list of comments & tags (see /api/metrics/system on the Swagger page). When the timeframe is set to "all", all data is pulled from the system. This takes ~30 seconds to load.
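For reference, the response is roughly this shape; a minimal sketch only, since the property names below are assumptions rather than the actual DTO in AIForOrcas.Server:

```csharp
// Minimal sketch of the metrics shape described above; property names are
// assumptions, not the actual DTO used by AIForOrcas.Server.
using System.Collections.Generic;

public record SystemMetrics(
    int Reviewed,            // candidates a moderator has acted on
    int Unreviewed,          // flagged by the model, not yet reviewed
    int ConfirmedDetections, // reviewed and confirmed
    int FalseDetections,     // reviewed and rejected
    IReadOnlyList<string> Comments,
    IReadOnlyList<string> Tags);
```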

Several possible options are outlined below.

Option 1: evaluate different data stores

Cosmos DB is not built for cross-partition queries or for pulling large amounts of data. Alternative services such as Azure SQL could be evaluated.

Option 2: tweak the IQueryable usage

IQueryable uses lazy evaluation, which can include pushing execution down to the database engine. The full list of data is fetched at the point below and processed in memory:

https://github.com/orcasound/aifororcas-livesystem/blob/e82a7ba5dcdfb59e9062897d6a00e23c93c4ae72/ModeratorFrontEnd/AIForOrcas/AIForOrcas.Server/Controllers/MetricsController.cs#L76-L77

Note that the ToList call forces immediate evaluation. We may need some C# and Cosmos DB experts to evaluate whether deferring evaluation further, so more of the work runs in the database engine, would improve performance. Be wary of potentially higher RU usage.
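For illustration, a rough sketch of pushing the aggregation down with the Cosmos v3 SDK's LINQ provider. Container and property names are placeholders, and the actual repository layer may not expose a Cosmos-backed IQueryable, so treat this as illustrative only:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

public static class MetricsQueries
{
    // Rough sketch: let Cosmos DB compute the counts server-side instead of
    // materializing every document with ToList() and counting in memory.
    // Container and property names are placeholders, not the project's schema.
    public static async Task<(int reviewed, int unreviewed)> GetCountsAsync(Container container)
    {
        var detections = container.GetItemLinqQueryable<Detection>();

        // Each CountAsync is translated to a SELECT VALUE COUNT(1) query and
        // executed by the database engine; only a scalar comes back over the wire.
        // Cross-partition aggregates still consume RUs, so this needs measuring.
        int reviewed = await detections.Where(d => d.Reviewed).CountAsync();
        int unreviewed = await detections.Where(d => !d.Reviewed).CountAsync();

        return (reviewed, unreviewed);
    }

    // Placeholder document type for the sketch.
    public class Detection
    {
        public string Id { get; set; }
        public bool Reviewed { get; set; }
    }
}
```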

Option 3: track running count

A new service could track running counts for the metrics as detection candidates are flagged or moderators act on candidates. The moderator portal would then only need to fetch the counts instead of all the data.

A new container would be added to Cosmos DB, holding either a single document with the running counts or multiple documents partitioned by time range. A serverless solution (such as Azure Functions) could trigger on the original "metadata" collection, fetch the prior counts from the document in the new collection, update the running counts as appropriate, and persist them by overwriting the original count document in the new collection.
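As a rough sketch, assuming the in-process Cosmos DB trigger binding (all database, collection, and field names below are placeholders, and the bookkeeping for updates vs. inserts is left out), the function could look something like this:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class MetricsCounter
{
    [FunctionName("UpdateRunningCounts")]
    public static async Task Run(
        // Fires for every insert/update in the original "metadata" collection.
        [CosmosDBTrigger(
            databaseName: "predictions",
            collectionName: "metadata",
            ConnectionStringSetting = "CosmosConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        // Prior running-count document (null if it does not exist yet).
        [CosmosDB(
            databaseName: "predictions",
            collectionName: "metrics",
            Id = "running-counts",
            PartitionKey = "running-counts",
            ConnectionStringSetting = "CosmosConnection")] RunningCounts counts,
        // Upserts the updated counts document back into the new collection.
        [CosmosDB(
            databaseName: "predictions",
            collectionName: "metrics",
            ConnectionStringSetting = "CosmosConnection")] IAsyncCollector<RunningCounts> output,
        ILogger log)
    {
        counts ??= new RunningCounts { Id = "running-counts" };

        foreach (var doc in changes)
        {
            // "reviewed" and "found" are hypothetical field names. The change feed
            // also delivers updates, so real code would need to move a candidate
            // out of the unreviewed bucket once a moderator reviews it.
            var reviewed = doc.GetPropertyValue<bool?>("reviewed") ?? false;
            if (!reviewed) counts.Unreviewed++;
            else if (doc.GetPropertyValue<string>("found") == "yes") counts.Confirmed++;
            else counts.FalseDetections++;
        }

        // Overwrite the single counts document; the output binding upserts by id.
        await output.AddAsync(counts);
        log.LogInformation("Updated running counts after {Count} changes", changes.Count);
    }
}

public class RunningCounts
{
    [Newtonsoft.Json.JsonProperty("id")]
    public string Id { get; set; }
    public int Unreviewed { get; set; }
    public int Confirmed { get; set; }
    public int FalseDetections { get; set; }
}
```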

micya commented 2 years ago

@micowan - thoughts on above?

micowan commented 2 years ago

You and I are thinking along the same lines. I actually started prototyping something similar to option 3, but ran out of time to finish before the beginning of the Hackathon. I had built the metrics collection into the API (vs. Azure Functions) as I did not know if Azure Functions would be confusing for the next person to come along and maintain the code. But overall, I like the 3 possible options.

micya commented 2 years ago

> You and I are thinking along the same lines. I actually started prototyping something similar to option 3, but ran out of time to finish before the beginning of the Hackathon. I had built the metrics collection into the API (vs. Azure Functions) as I did not know if Azure Functions would be confusing for the next person to come along and maintain the code. But overall, I like the 3 possible options.

That would work fine too. Will that also count new candidates (i.e., newly flagged but not yet reviewed)?

micowan commented 2 years ago

That was the one I could not crack with the API off the top of my head. Now, since we would not have to include the metrics for tags, comments, etc., we might not need the aggregate records for unreviewed and could pull those with just a simple change to the index. But I am certainly no Cosmos DB expert, and am open to other solutions. Azure Functions may be the only really good way to solve this one.
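For example, an aggregate query along these lines could return just the scalar (the `reviewed` field name is only a stand-in for the real metadata schema, so this is a sketch rather than working code against the live containers):

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class UnreviewedCount
{
    // Sketch: fetch only the unreviewed count with one aggregate query instead
    // of folding it into the running-count document. "reviewed" is a stand-in
    // field name, not necessarily what the metadata documents actually use.
    public static async Task<int> GetUnreviewedCountAsync(Container metadata)
    {
        var query = new QueryDefinition(
            "SELECT VALUE COUNT(1) FROM c WHERE c.reviewed = false");

        var iterator = metadata.GetItemQueryIterator<int>(query);
        int total = 0;
        while (iterator.HasMoreResults)
        {
            // Sum partial counts in case the aggregate spans multiple pages.
            foreach (var partial in await iterator.ReadNextAsync())
                total += partial;
        }
        return total;
    }
}
```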