Open eecavanna opened 2 days ago
It was me. I wanted to create a `unique=True` compound index to check if that compound key could serve as a surrogate-yet-still-semantic primary key in lieu of `id`, which that collection didn't have, as part of my exploration to address https://github.com/microbiomedata/nmdc-runtime/issues/414. I ultimately settled on using mongodb's native (sortable, unique) `_id` field in order to implement pagination for that collection.
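For readers following along, here is a minimal sketch of what `_id`-based (keyset) pagination can look like. It is not the code from #414; the function names `fetch_page` and `iterate_pages` are hypothetical, and `collection` is only assumed to behave like a pymongo `Collection` (supporting `find(...).sort(...).limit(...)`).

```python
from typing import Any, Iterator, List, Optional


def fetch_page(collection: Any, page_size: int, after_id: Optional[Any] = None) -> List[dict]:
    """Return up to `page_size` documents whose `_id` sorts after `after_id`.

    `collection` is assumed to behave like a pymongo Collection (hypothetical helper).
    """
    # Resume strictly after the last `_id` seen on the previous page, if any.
    query = {} if after_id is None else {"_id": {"$gt": after_id}}
    # Sorting ascending on `_id` gives a stable order because `_id` is unique.
    cursor = collection.find(query).sort("_id", 1).limit(page_size)
    return list(cursor)


def iterate_pages(collection: Any, page_size: int) -> Iterator[List[dict]]:
    """Yield successive pages until the collection is exhausted."""
    after_id = None
    while True:
        page = fetch_page(collection, page_size, after_id)
        if not page:
            return
        yield page
        # The last document's `_id` becomes the cursor for the next page.
        after_id = page[-1]["_id"]
```

Because each page is selected with a range condition on `_id` rather than a skip count, the default `_id` index is enough and no extra compound index is required.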
Indeed, the compound index is not used in any production capacity.
Thanks, @dwinston. In terms of how it was created, was it created manually via `mongosh` (or equivalent) or was it created by some code that exists in the repo?
I should add that, due to the large size of the `functional_annotation_agg` collection at the time of that work on #414, that collection was not included in my workflow for ensuring a local cache of production schema collections for sandboxed local db experimentation during development. The temptation to "experiment" on (even "non-destructively", as is the case with index creation) the production database has since been remedied. :)
@eecavanna it was created manually via direct (py)mongo command using privileged credentials (perhaps in a jupyter notebook where I was prototyping, perhaps via Studio3T GUI -- I forget).
OK, I understand now—thanks!
While it's top-of-mind for me: the runtime declares collection-qualified slots to index via https://github.com/microbiomedata/nmdc-runtime/blob/c4c4a8d08f88c7fed71d693c7d45c7cea4854db9/nmdc_runtime/api/models/util.py#L85, which feeds https://github.com/microbiomedata/nmdc-runtime/blob/c4c4a8d08f88c7fed71d693c7d45c7cea4854db9/nmdc_runtime/api/main.py#L351 on runtime api init: https://github.com/microbiomedata/nmdc-runtime/blob/c4c4a8d08f88c7fed71d693c7d45c7cea4854db9/nmdc_runtime/api/main.py#L390
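As a rough illustration of that pattern (declare the slots in one place, then ensure the indexes at startup), here is a sketch. It is not the code at the links above: `SLOTS_TO_INDEX`, its contents, and `ensure_declared_indexes` are hypothetical names chosen for this example, and `db` is only assumed to behave like a pymongo `Database`.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical declaration: collection name -> slots that should be indexed.
# The entries are illustrative, not the runtime's actual list.
SLOTS_TO_INDEX: Dict[str, Set[str]] = {
    "functional_annotation_agg": {"gene_function_id", "metagenome_annotation_id"},
    "study_set": {"id"},
}


def ensure_declared_indexes(db: Any, declared: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Ensure a single-field index exists for every declared (collection, slot) pair.

    `db` is assumed to support `db[name].create_index(key)` like a pymongo Database.
    """
    ensured = []
    for collection_name, slots in sorted(declared.items()):
        for slot in sorted(slots):
            # With pymongo, re-creating an identical index is a no-op,
            # so this is safe to run on every startup.
            db[collection_name].create_index(slot)
            ensured.append((collection_name, slot))
    return ensured
```

Keeping the declaration in code means every index the application relies on is reviewable and reproducible in a fresh database, which is the property a manually created index lacks.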
The `functional_annotation_agg` collection (as of when schema `11.0.3` is in effect) has a compound index on the pair of fields: "`gene_function_id`,`metagenome_annotation_id`". I don't see code in any of our GitHub repos (I searched across all repos in our org) that creates that index, so I'm assuming someone created it manually at some point.

This led me to wonder (things like):

- [...] `mongosh`) or always via the Runtime?

I know the Runtime creates some; for example:
https://github.com/microbiomedata/nmdc-runtime/blob/17cf31332aee0852d20137aec7b8b2d3398caed0/nmdc_runtime/api/main.py#L299
https://github.com/microbiomedata/nmdc-runtime/blob/17cf31332aee0852d20137aec7b8b2d3398caed0/nmdc_runtime/site/ops.py#L1132
Note: The specific index I mentioned above will cease to exist within a few days (as part of the migration from schema 11.0.3 to 11.1.0). I'm using it here to exemplify a concept.
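One way to look for other indexes that no code accounts for is to compare what the database reports against what the code declares. The sketch below assumes a pymongo-like `db` (using `list_collection_names()` and `index_information()`); the function name `find_undeclared_indexes` and the shape of the `declared` mapping (collection name to a set of single-field slots) are assumptions made for this example.

```python
from typing import Any, Dict, List, Set


def find_undeclared_indexes(db: Any, declared: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return index names per collection that are not covered by `declared`.

    Only single-field declarations are modeled here, so every compound
    index is reported as undeclared (an assumption of this sketch).
    """
    undeclared: Dict[str, List[str]] = {}
    for name in db.list_collection_names():
        expected = declared.get(name, set())
        for index_name, info in db[name].index_information().items():
            # `info["key"]` is a list of (field, direction) pairs.
            fields = tuple(field for field, _direction in info["key"])
            if fields == ("_id",):
                continue  # the default `_id` index always exists
            if len(fields) == 1 and fields[0] in expected:
                continue  # accounted for by a declaration
            undeclared.setdefault(name, []).append(index_name)
    return undeclared
```

Run against a database like the one described above, a report of this kind would be expected to surface the compound index on `gene_function_id` and `metagenome_annotation_id`, since nothing in code declares it.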