Closed nathanielrindlaub closed 2 months ago
Good rules of thumb to follow when planning index strategies here: https://www.mongodb.com/docs/manual/applications/indexes/
After reading up on this and the looking at the indexes that MongoDB was recommending through it's "Performance Advisor" tab in its UI, I came to the conclusion that I would not have anticipated the indexes it recommended (not even close), so perhaps it's best to create indexes via the UI rather than in code and just follow what the Performance Advisor recommends.
For now let's go with that and see if we run into trouble.
Reopening this because we're still getting those query warnings even with new Project Label querying improvements, and the index size has ballooned. We need a real strategy for making sure queries are scalable.
While investigating indexes and possible optimizations a few potential bugs were found. Adding them here for visibility:
reviewed
and not reviewed
flags are unselected:"$match": { "$or": [ { "objects.locked": false }, { "objects": { "$size": 0 } }, { "objects": { "$not": { "$elemMatch": { "labels": { "$elemMatch": { "$or": [ { "validation": null }, { "validation.validated": true } ] } } } } } } ] } }, { "$match": { "objects.0": { "$exists": true }, "objects.locked": { "$ne": false }, "objects.labels": { "$elemMatch": { "$or": [ { "validation": null }, { "validation.validated": true } ] } } } }, { "$match": {} },
Not exactly a use case that should be supported, since no images should be returned. Should either be handled on frontend to prevent this option from being selected or input check should be done on the backend to prevent this potential timeout. Addressing this on the frontend as a part of this PR.
projectId, dateTimeOriginal, _id
, this potentially can be simplified to remove _id.{ "type": "command", "isFromUserConnection": true, "ns": "animl-prod.images", "collectionType": "normal", "command": { "aggregate": "images", "pipeline": [ { "$match": { "projectId": "island_spotted_skunks" } }, { "$match": {} }, { "$sort": { "dateTimeOriginal": -1, "_id": -1 } }, { "$limit": 51 } ],
Steps to reproduce: have a successful image count request, and then cause image count to timeout. ex: unchecking both reviewed
and not reviewed
@nathanielrindlaub
Did some cleanup of indexes based on common query requests as of 03/20/24 - 22:15:43.807UTC.
Will continue to monitor the performance advisor and for alerts. Opened #161 in order to reduce index size and to more easily query for reviewed/not-reviewed images. Will also update indexes to use this field once the back-end has been updated with this field. In the meantime, using objects.locked
in order to represent reviewed/not-reviewed images. Need better understanding of multi-key indexes: https://www.mongodb.com/docs/manual/core/indexes/index-types/index-multikey/. Using objects
and objects.locked
together should not be allowed, but is automatically done via mongodb Atlas.
@jue-henry, do we want to close this out in favor of some of the other related DB-optimization issues we have? Or is it worth keeping it open? Up to you.
@nathanielrindlaub I think we can close this for now as we have other issues open and indexes are not a huge problem from what I can tell
We're getting email alerts from the prod DB saying:
When I take a look at the MongoDB Atlas Query Profiler and db operation in question, it looks like it's the
getLabels
aggregation that occurs every time aprojects
query is requested:That approach to aggregating labels is pretty expensive to begin with (and might be worth re-thinking entirely), but when I look at the existing indexes in prod via MongoDB compass, it looks like we're not doing ourselves any favors by only indexing on the
_id
field. We should definitely be indexing byprojectId
as well - which we are in the dev DB and is defined in the MongooseImage
schema. What's odd is that it looks like I added those indexes to the schema over a year ago, so I'm not sure why they didn't get re-indexed on prod?Additionally, mongo-cursor-pagination recommends that we index every field that might be passed to its
paginatedField
prop. ForImage.queryByFilter()
, it looks like the paginated field options forImage
querying are basically any column in the image table in the front end that can be sorted, so currentlydateTimeOriginal
,dateAdded
, andcameraId
. We've also been using mongo-cursor-pagination elsewhere (for querying Batches and BatchErrors, for example), so it would be worth auditing those and seeing what fields if any should get indexed on their respective collections.