Closed aclum closed 4 months ago
Thanks @aclum. I'll get it on our project board
@aclum I tried to reproduce your submission and am getting a 500 internal server error. Other endpoints on api-dev are not returning 500s. Looking into why I'm not able to reproduce.
Here is where the post request is defined: https://github.com/microbiomedata/nmdc-runtime/blob/682a53c2d6e453da0e5d5a210ba87afb29be2bf7/nmdc_runtime/api/v1/workflows/activities.py#L26
Here is where the activity records are getting created and inserted into mongodb: https://github.com/microbiomedata/nmdc-runtime/blob/682a53c2d6e453da0e5d5a210ba87afb29be2bf7/nmdc_runtime/api/v1/workflows/activities.py#L26
I believe this is the code that is inserting records with null values into mongo.
def insert_activities(activities: Database, mdb: MongoDatabase) -> bool:
"""Description."""
activity_fields = fields(activities)
for field in activity_fields:
if activities[field.name]:
collection = mdb.get_collection(field.name)
collection.insert_many(
[
json.loads(json.dumps(asdict(activity), default=str))
for activity in activities[field.name]
]
)
return True
@aclum @shreddd Our plan is to deprecate this v1 endpoint and replace it with a refactored endpoint that uses the logic underlying json:submit
.
In the meantime, please use dagster to run a submission job
fysa @dwinston
What is the depreciation plan/timeline? Workflow automation uses this endpoint routinely they'll need time to make updates and using the dagster UI is not desirable.
@aclum We plan to push a fix this afternoon or latest by tomorrow
Great, thanks so much.
@aclum we have a minimally-tested working solution on this branch: https://github.com/microbiomedata/nmdc-runtime/tree/478-refactor-v1-workflows-activities-endpoint
There's still some clean-up and more robust testing that I'd like to do in the next few hours, but if it's urgent we can push this into main and deploy to dev
Feel free to test it and clean it up! We can wait until tomorrow.
@dwinston I would like to move the relevant code from /nmdc_runtime/api/v1/workflows/activities.py
into nmdc_runtime/api/models/workflow.py
and delete the entire v1
sub-directory.
I went through the files in v1 and did not find anything that is currently in use. Any objections or concerns for moving the new endpoint from /v1/workflows/activities
to /workflows/activities
and deprecating (ie deleting) v1
?
@PeopleMakeCulture please ensure POST /v1/workflows/activities
works without error -- maybe post a deprecation notice in its docstring. We need to get OKs from @Michal-Babins et al. to ensure transition of their client code, which we don't want to break yet.
you can ensure this however you feel is prudent, e.g. not having a v1
folder in the repo but still having it in a URL path for the endpoint.
So in the future, all posts to the DB should go through json:submit?
@Michal-Babins https://github.com/Michal-Babins At some point we'd move to /workflows/activities, although you raise a good point, if these endpoints are redundant perhaps longer term one can be deprecated entirely. For now continue to use /v1/workflows/activities. We can test the fix tomorrow.
On Tue, Feb 13, 2024 at 2:03 PM Michal Babinski @.***> wrote:
So in the future, all posts to the DB should go through json:submit?
— Reply to this email directly, view it on GitHub https://github.com/microbiomedata/nmdc-runtime/issues/478#issuecomment-1942715820, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB6RD3YE4MI334ZIBH7MWLDYTPPJNAVCNFSM6AAAAABDFSCT52VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBSG4YTKOBSGA . You are receiving this because you were mentioned.Message ID: @.***>
Here is the current plan:
Update v1/workflows/activities
to use the json_validate
utility underlying json:submit
. This should not affect downstream users at all. Downstream applications will still call the API endpoint exactly as before, and the API will return the same response as before. This fix is only intended to modify the documents inserted into Mongo such that new documents do not contain null values.
This step will be completed by #479 and merged into api-dev
later today.
Create a new endpoint workflows/activities
. Currently, this endpoint mimics the behavior of v1/workflows/activities
.
We are considering an update to the return request such that successful POST
requests return the json values inserted. This would mimic the behavior of endpoints such as POST: /objects
or POST: /metadata/changesheets:submit
. Feedback will be solicited at weekly check-in.
v1/workflows/activities
. Once all downstream jobs are full migrated to the new workflows/activities
endpoint, the v1 endpoint will be removed from the runtime API.
For the time being we will maintain both endpoints. Ideally I would like to push the new endpoint in this release cycle and deprecate v1 endpoints in the next release cycle.
@aclum @Michal-Babins @dwinston does that sound okay?
The plan sounds good to me. I think we could consolidate to json:submit
, assuming we refactor it to make it synchronous (which we can do), but there may be preference among folks to retain a dedicated endpoint with a convenient name.
Yeah, either is fine with me. I am also happy to meet and go over the use and purpose of this endpoint, and seeing if it's easier to consolidate to json:submit
. For us it would be a quick refactor for what endpoint we need to use, so I would go for whatever would be easier to maintain.
@dwinston @Michal-Babins which endpoints would we consolidate into json:submit
? Just workflows/activities
or others as well?
I think that's the only one
@aclum @Michal-Babins This fix has been pushed to prod. Runtime API is now on version 1.3.1 with the updated /workflows/activities
and v1/workflows/activities
endpoints.
Release notes in Slack here: https://nmdc-group.slack.com/archives/C063UHW5AJE/p1708011488437899
confirmed fixed on runtime dev API. This should also be in prod as of the 2024.2 release on Monday.
I noticed some workflow records had null values, these were keys that the workflow automation code was not using. I tested and was able to confirm with runtime dev that if I submit a record with no null or empty values, then check the document with /nmdcschema there are keys with null values or empty arrays. Based on this we believe the /v1/workflows/activites API endpoint is adding these keys and null values and the desired behavior is to leave these records as they are.
Test record for /v1/workflows/activites tested with runtime dev {"read_qc_analysis_activity_set":[{ "id": "nmdc:wfrqc-11-t0tvnp52.2", "name": "Read QC Activity for nmdc:wfrqc-11-t0tvnp52.1", "started_at_time": "2024-01-11T20:48:30.718133+00:00", "ended_at_time": "2024-01-11T21:11:44.884260+00:00", "was_informed_by": "nmdc:omprc-11-9mvz7z22", "execution_resource": "NERSC-Perlmutter", "git_url": "https://github.com/microbiomedata/ReadsQC", "has_input": [ "nmdc:dobj-11-gpthnj64" ], "has_output": [ "nmdc:dobj-11-w5dak635", "nmdc:dobj-11-g6d71n77", "nmdc:dobj-11-bds7qq03" ], "type": "nmdc:ReadQcAnalysisActivity", "part_of": [ "nmdc:omprc-11-9mvz7z22" ], "version": "v1.0.8" }]}
then calling https://api-dev.microbiomedata.org/nmdcschema/read_qc_analysis_activity_set?filter=%7B%22id%22%3A%22nmdc%3Awfrqc-11-t0tvnp52.2%22%7D&max_page_size=20 I get a response body that contains null values (key 'used' for example)
{ "resources": [ { "id": "nmdc:wfrqc-11-t0tvnp52.2", "name": "Read QC Activity for nmdc:wfrqc-11-t0tvnp52.1", "started_at_time": "2024-01-11T20:48:30.718133+00:00", "ended_at_time": "2024-01-11T21:11:44.884260+00:00", "was_informed_by": "nmdc:omprc-11-9mvz7z22", "used": null, "execution_resource": "NERSC-Perlmutter", "git_url": "https://github.com/microbiomedata/ReadsQC", "has_input": [ "nmdc:dobj-11-gpthnj64" ], "type": "nmdc:ReadQcAnalysisActivity", "has_output": [ "nmdc:dobj-11-w5dak635", "nmdc:dobj-11-g6d71n77", "nmdc:dobj-11-bds7qq03" ], "part_of": [ "nmdc:omprc-11-9mvz7z22" ], "version": "v1.0.8", "qc_status": null, "qc_comment": null, "has_failure_categorization": [], "input_read_count": null, "input_base_count": null, "output_read_count": null, "output_base_count": null, "input_read_bases": null, "output_read_bases": null } ] }
This is high priority to fix b/c this is creating invalid records, the response body json does not validate with the json:submit endpoint (see #462) cc @pkalita-lbl @shreddd @ssarrafan @Michal-Babins @mbthornton-lbl @hubin-keio