Closed micya closed 2 months ago
@micowan - thoughts on above?
You and I are thinking along the same lines. I actually started prototyping something similar to option 3; but ran out time to finish before the beginning of the Hackathon. I had built the metrics collection into the API (vs Azure Functions) as I did not know if Azure Functions would be confusing for the next person to come along and maintain the code. But overall, I like the 3 possible options.
You and I are thinking along the same lines. I actually started prototyping something similar to option 3; but ran out time to finish before the beginning of the Hackathon. I had built the metrics collection into the API (vs Azure Functions) as I did not know if Azure Functions would be confusing for the next person to come along and maintain the code. But overall, I like the 3 possible options.
That would work fine too. Will that also count new candidates (i.e., newly flagged but not yet reviewed)?
That was the one I could not crack with the API off the top of my head. Now since we would not have to include the metrics for Tags, Comments, etc. we might not need the aggregate records for unreviewed and could pull those with just a simple change to the index. But I am certainly no Cosmos DB expert, and am open to other solutions. Azure functions may be the only really good way to solve this one.
For a given time period, we calculate metrics, such as number reviewed, unreviewed, confirmed, false detection, and generate a list of comments & tags (see /api/metrics/system on swagger page). When timeframe is set to "all", all data is pulled from the system. This takes ~30 sec to load.
Several possible options outlined below.
Option 1: evaluate different data stores
Cosmos DB is not built for pulling cross-partition or for pulling huge amounts of data. Alternative services such as Azure SQL can be evaluated.
Option 2: tweaking with IQueryable
IQueryable utilizes lazy evaluation, which can include pushing execution to database engine. The full list of data is fetched at below point and processed in memory:
https://github.com/orcasound/aifororcas-livesystem/blob/e82a7ba5dcdfb59e9062897d6a00e23c93c4ae72/ModeratorFrontEnd/AIForOrcas/AIForOrcas.Server/Controllers/MetricsController.cs#L76-L77
Note the
ToList
call forces immediate evaluation. We may need some C# and Cosmos DB experts to evaluate whether forcing further lazy evaluation would improve performance. Be wary of potentially higher RU usage.Option 3: track running count
A new service can track running count for metrics as detection candidates are flagged or moderators act on candidates. The moderator portal would then only need to fetch the count instead of all data.
A new container should be added to Cosmos DB and will hold either a single document indicating running count or multiple documents partitioned on time range. A serverless solution (such as Azure Functions) can trigger on the original "metadata" collection, fetch prior count from document in new collection, update the running counts as appropriate, and persist by overwriting the original count document in the new collection.