Closed eecavanna closed 2 weeks ago
@aclum, is this the type of HTTP response you had in mind for the endpoint that we discussed: the endpoint that, for a given `id`, determines the collection (if any) in which a document having that `id` would reside, and determines the schema class (if any) of which an instance would have that as its `id`?
{
"id": "nmdc:sty-10-123456abcdef",
"collection_name": "study_set",
"class_name": "Study"
}
Note: The value of the `"id"` property in the HTTP response would be a verbatim copy of the one in the HTTP request. It's only included in the response for the convenience of the client.
CC: @sujaypatil96
Yes. I think this would also be useful for the data portal. I have a ticket that has been open for a while about being able to search by an NMDC identifier: https://github.com/microbiomedata/nmdc-server/issues/964 cc @naglepuff @marySalvi @jeffbaumes
What is expected for an `id` that looks like, e.g., `nmdc:clsite-99-123abc`? From `/nmdcschema/typecodes`, `clsite` is the typecode for a `CollectingBiosamplesFromSite` object, but this can be in either the `collecting_biosamples_from_site_set` (https://github.com/microbiomedata/nmdc-schema/blob/v10.5.3/nmdc_schema/nmdc_materialized_patterns.schema.json#L3531) or `planned_process_set` (https://github.com/microbiomedata/nmdc-schema/blob/v10.5.3/nmdc_schema/nmdc_materialized_patterns.schema.json#L3653) slots, according to the schema. So in this case, both collection names should be returned, no?
I have updated the PR to reflect this and return a list of collections.
The issue of an `id` being valid for multiple collections is fixed in the Berkeley schema. The original purpose of this endpoint was to return which collection the ID is in, so if that is ambiguous for now, the endpoint should check which collection the document actually resides in.
> The original feature of this endpoint was to return which collection the ID is in so if it is ambiguous for now it should check which collection the document actually resides in.
This endpoint only uses the schema, not the database. The `class_name` it returns is the name of the schema class, if any, for which the specified value would be a valid `id` (on an instance of that class). The `collection_names` list it returns is a list of the names of all the "collections" (technically, the names of slots of the `Database` schema class), if any, that could contain a document having that value as its `id`.
In other words, it currently doesn't do this:
> if it is ambiguous for now it should check which collection the document actually resides in.
Instead, it returns a list of all collections that, according to the schema, a document having that `id` could reside in.
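For illustration, the schema-only lookup described above could be sketched roughly like this. It is a minimal sketch, not the code in the PR: the two lookup tables are small hand-written excerpts (the real data comes from the schema and `/nmdcschema/typecodes`), and `describe_id` and the simplified `nmdc:<typecode>-...` pattern are assumptions made for the example.

```python
import re

# Hypothetical excerpt of the typecode table (assumption; the real table
# is derived from the schema and served by /nmdcschema/typecodes).
TYPECODE_TO_CLASS_NAME = {
    "sty": "Study",
    "clsite": "CollectingBiosamplesFromSite",
}

# Hypothetical excerpt mapping a class name to the Database slots
# ("collections") that may hold its instances, per schema v10.5.3.
CLASS_NAME_TO_COLLECTION_NAMES = {
    "Study": ["study_set"],
    "CollectingBiosamplesFromSite": [
        "collecting_biosamples_from_site_set",
        "planned_process_set",
    ],
}

# Simplified id pattern (assumption): "nmdc:", a lowercase typecode, a hyphen.
ID_PATTERN = re.compile(r"^nmdc:(?P<typecode>[a-z]+)-")


def describe_id(doc_id: str) -> dict:
    """Return the schema class and all collections compatible with `doc_id`.

    Only the in-memory tables above are consulted; no database is queried.
    """
    match = ID_PATTERN.match(doc_id)
    typecode = match.group("typecode") if match else None
    class_name = TYPECODE_TO_CLASS_NAME.get(typecode)
    collection_names = CLASS_NAME_TO_COLLECTION_NAMES.get(class_name, [])
    return {
        "id": doc_id,
        "class_name": class_name,
        "collection_names": list(collection_names),
    }
```

With these tables, `describe_id("nmdc:clsite-99-123abc")` lists both `collecting_biosamples_from_site_set` and `planned_process_set`, which is the ambiguity discussed above.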
CC: @aclum
Sounds to me like one of the requirements for this endpoint is to also indicate which collection, if any, a document having the specified `id` resides in.
> Sounds to me like one of the requirements for this endpoint is to also indicate which collection, if any, a document having the specified id resides in.
Update: I have added this feature to the endpoint.
Here's the response shape:
{
"id": "string", // `id` from the URL (the `hypothetical_doc_id` path parameter)
"compatible_class_name": "string", // name of the class of which an instance _could_ have that `id`
"compatible_collection_names": ["string"], // names of all collections that _could_ contain a document having that `id`
"containing_collection_name": "string" // name of the collection in which a document having that `id` _does_ exist, if any
}
Example response:
{
"id": "nmdc:sty-1-foobar",
"compatible_class_name": "Study",
"compatible_collection_names": ["study_set"],
"containing_collection_name": "study_set"
}
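As a rough sketch of how a response of this shape could be assembled: the schema-derived values are passed in, and only the last field requires looking at stored documents. The in-memory `FAKE_DB` dictionary stands in for the real database, and the function names are hypothetical, not the ones used in the PR.

```python
from typing import Dict, List, Optional, Set

# In-memory stand-in for the database (assumption for this sketch):
# collection name -> set of document ids stored in that collection.
FAKE_DB: Dict[str, Set[str]] = {
    "study_set": {"nmdc:sty-1-foobar"},
    "planned_process_set": set(),
}


def find_containing_collection(
    doc_id: str, compatible_collection_names: List[str], db: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the first compatible collection that holds `doc_id`, else None."""
    for name in compatible_collection_names:
        if doc_id in db.get(name, set()):
            return name
    return None


def build_response(
    doc_id: str,
    compatible_class_name: Optional[str],
    compatible_collection_names: List[str],
    db: Dict[str, Set[str]],
) -> dict:
    """Combine the schema-derived fields with the database-derived one."""
    return {
        "id": doc_id,
        "compatible_class_name": compatible_class_name,
        "compatible_collection_names": compatible_collection_names,
        "containing_collection_name": find_containing_collection(
            doc_id, compatible_collection_names, db
        ),
    }
```

Calling `build_response("nmdc:sty-1-foobar", "Study", ["study_set"], FAKE_DB)` reproduces the example response above. For an `id` that is compatible with `study_set` but not stored there, `containing_collection_name` comes back as `None` in this sketch.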
I am ready for this PR branch to be reviewed/merged.
Hi @aclum, are you OK with how this endpoint behaves? If so (and someone approves it via GitHub's Review mechanism), I'll merge it in. I accidentally invalidated @sujaypatil96's approval by making an additional commit.
@sujaypatil96 added screenshots of some example request/response pairs above—thanks, @sujaypatil96!
@eecavanna You're good to merge. If/when there's time, I'd like to consolidate and replace these utils with your logic. (non-blocking):
nmdc_runtime.api.core.metadata.map_id_to_collection
nmdc_runtime.util.collection_name_to_class_names
nmdc_runtime.api.db.mongo.nmdc_schema_collection_names
Based on @aclum's feedback during today's infrastructure meeting, I will update the endpoint as follows:
- If a document having the specified `id` exists in any of the collections the schema says it can exist in, return the name of that collection. If it happens to exist in multiple collections, only return the name of one collection that it exists in.
- If no document having the specified `id` exists in any of the collections the schema says it can exist in, return a "null-ish" response (TBD).

Sorry about all the thrash here! Getting the automated tests running locally was not happening smoothly for me (I think one of the tests hung last night), so I've been relying on the GHA workflow to run them for me. A downside of that is that the test failures generate notifications, at least to me.
Hi @aclum, I updated the API response based upon our conversation earlier today.
The new behavior is:
Scenario 1: User specifies the `id` of `nmdc:sty-1-foobar` to the API. A document having that `id` exists in the `study_set` collection. The API responds with:
{
"id": "nmdc:sty-1-foobar",
"collection_name": "study_set"
}
Scenario 2: User specifies the `id` of `nmdc:sty-1-nonex` to the API. A document having that `id` does not exist in any collections (that the schema says it can exist in). The API responds with an `HTTP 404 Not Found` response.
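The two scenarios can be sketched as a single function that returns a status code together with a body. This is only an illustration of the behavior described above, not the endpoint's implementation: `respond` is a hypothetical name, the dictionary passed as `db` stands in for the real database, and the `None` body for the 404 case is an assumption.

```python
from http import HTTPStatus
from typing import Dict, List, Optional, Set, Tuple


def respond(
    doc_id: str,
    candidate_collection_names: List[str],
    db: Dict[str, Set[str]],
) -> Tuple[HTTPStatus, Optional[dict]]:
    """Return (status, body) for a lookup of `doc_id`.

    `candidate_collection_names` are the collections the schema allows for
    this id. The first one that actually contains the id wins, so at most
    one collection name is reported even if the id exists in several.
    """
    for name in candidate_collection_names:
        if doc_id in db.get(name, set()):
            return HTTPStatus.OK, {"id": doc_id, "collection_name": name}
    # No allowed collection contains the id: Scenario 2.
    return HTTPStatus.NOT_FOUND, None


# Stand-in data matching the two scenarios (assumption for this sketch).
db = {"study_set": {"nmdc:sty-1-foobar"}}
found = respond("nmdc:sty-1-foobar", ["study_set"], db)
missing = respond("nmdc:sty-1-nonex", ["study_set"], db)
```

Here `found` carries `HTTPStatus.OK` and the Scenario 1 body, while `missing` carries `HTTPStatus.NOT_FOUND`.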
I'm ready for this PR branch to be reviewed/merged in.
Description
In this branch, I implemented a new API endpoint. Its behavior is described in its docstring, shown here:
Fixes https://github.com/microbiomedata/nmdc-runtime/issues/531
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration, if it is not simply `make up-test && make test-run`.
Configuration Details: none
Checklist:
`black nmdc_runtime/`?)
`docs/` and in https://github.com/microbiomedata/NMDC_documentation/?)
`make up-test && make test-run`)