hubmapconsortium / entity-api

A set of web service calls to return information about HuBMAP entities
https://entity.api.hubmapconsortium.org
MIT License
3 stars 1 forks source link

[Troubleshooting] Dataset status didn't update as expected #613

Open yuanzhou opened 5 months ago

yuanzhou commented 5 months ago

Reported by Sunset on 2/5/2024 on PROD:

we had some issues with the multiAssay upload split step, for some reason, we still have the bug about statuses not being updated on the database (but entity-api does sends a successful response), here is a small snip of the log where you can find that we send the status update to Submitted on a dataset and a couple of minutes after that we query the dataset and the status was set back to New (not actually updated as New rather that the Submitted update never happened as shown in the status_history field):

[2024-02-02T17:14:32.273-0500] {logging_mixin.py:137} INFO - Setting status of dataset 9e84f7c2f4c473a7337db6cb9c6f9372 to Submitted
[2024-02-02T17:14:32.273-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:35.438-0500] {status_manager.py:195} INFO - COMPILED DATA: {'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:35.439-0500] {status_manager.py:209} INFO -
data:
{'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:35.439-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:36.424-0500] {logging_mixin.py:137} INFO - Reorganized new: 9e84f7c2f4c473a7337db6cb9c6f9372 from Upload: f080d45619badd505a45b990ea1f9a92 status is Submitted
[2024-02-02T17:15:03.867-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:15:03.931-0500] {logging_mixin.py:137} INFO - ds rslt:
[2024-02-02T17:15:03.932-0500] {logging_mixin.py:137} INFO - {'contains_human_genetic_sequences': True,
'created_by_user_displayname': 'HuBMAP Process',
....
'hubmap_id': 'HBM875.SGLH.395',
'last_modified_timestamp': 1706911891972,
'last_modified_user_displayname': 'HuBMAP Process',
'last_modified_user_email': '[hubmap@hubmapconsortium.org](mailto:hubmap@hubmapconsortium.org)',
'last_modified_user_sub': '3e7bce63-129d-33d0-8f6c-834b34cd382e',
'local_directory_rel_path': 'protected/TMC - University of '
'Pennsylvania/9e84f7c2f4c473a7337db6cb9c6f9372/',
'provider_info': 'Penn-Visium : ./Visium_13_75_S2_b',
'status': 'New',
'status_history': [{'change_timestamp': 1706911891972,
'changed_by_email': '[hubmap@hubmapconsortium.org](mailto:hubmap@hubmapconsortium.org)',
'status': 'New'}],
[2024-02-02T17:14:07.985-0500] {logging_mixin.py:137} INFO - Setting status of dataset fa7fbcd8ae9219225f5df25e8c5e994e to Submitted
[2024-02-02T17:14:07.985-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:08.053-0500] {status_manager.py:195} INFO - COMPILED DATA: {'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:08.054-0500] {status_manager.py:209} INFO -
data:
{'status': <Statuses.DATASET_SUBMITTED: 'submitted'>}
[2024-02-02T17:14:08.054-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:14:09.634-0500] {logging_mixin.py:137} INFO - Reorganized new: fa7fbcd8ae9219225f5df25e8c5e994e from Upload: f080d45619badd505a45b990ea1f9a92 status is Submitted
[2024-02-02T17:15:01.465-0500] {base.py:73} INFO - Using connection ID 'entity_api_connection' for task execution.
[2024-02-02T17:15:01.517-0500] {logging_mixin.py:137} INFO - ds rslt:
[2024-02-02T17:15:01.518-0500] {logging_mixin.py:137} INFO - {'contains_human_genetic_sequences': True,
'created_by_user_displayname': 'HuBMAP Process',
....
'hubmap_id': 'HBM859.FRLZ.427',
'last_modified_timestamp': 1706911874747,
'last_modified_user_displayname': 'HuBMAP Process',
'last_modified_user_email': '[hubmap@hubmapconsortium.org](mailto:hubmap@hubmapconsortium.org)',
'last_modified_user_sub': '3e7bce63-129d-33d0-8f6c-834b34cd382e',
'local_directory_rel_path': 'protected/TMC - University of '
'Pennsylvania/fa7fbcd8ae9219225f5df25e8c5e994e/',
'provider_info': 'Penn-Visium : ./Visium_13_44_S1_b',
'status': 'New',
'status_history': [{'change_timestamp': 1706911874747,
'changed_by_email': '[hubmap@hubmapconsortium.org](mailto:hubmap@hubmapconsortium.org)',
'status': 'New'}],
yuanzhou commented 5 months ago

@sunset666 can you list the API calls made against entity-api and ingest-api in sequence?

yuanzhou commented 4 months ago

From Sunset:

First one is a simple PUT call to entity API, endpoint /entities/<uuid> json: {'status': 'submitted'} There is a for cycle through all the uuids of an upload (for these particular runs one with 83 and another one with 102) After that is complete, on another step (not immediately after). It is a GET call to the same endpoint, for the same list of uuids

yuanzhou commented 4 months ago

2/16/2024, tested on PROD using an existing IEC Testing dataset d0ebafcbe06c9fa2cb739c4a860e79bd with "New" status, changed to "Submitted" and back to "New". Can NOT reproduce, and each status update deleted the old cache.

[2024-02-16 05:07:04] INFO in schema_manager: Deleted cache by key: hm_entity_prod__neo4j_d0ebafcbe06c9fa2cb739c4a860e79bd, hm_entity_prod__complete_d0ebafcbe06c9fa2cb739c4a860e79bd

Made a GET call right after this PUT call, showed updated status "Submitted"

[2024-02-16 05:07:34] INFO in schema_manager: Deleted cache by key: hm_entity_prod__neo4j_d0ebafcbe06c9fa2cb739c4a860e79bd, hm_entity_prod__complete_d0ebafcbe06c9fa2cb739c4a860e79bd

Made another GET call right after this PUT call, also showed updated status "New"
"status": "New",
"status_history": [
{
    "change_timestamp": 1708060024775,
    "changed_by_email": "hubmap@hubmapconsortium.org",
    "status": "Submitted"
},
{
    "change_timestamp": 1708060054425,
    "changed_by_email": "hubmap@hubmapconsortium.org",
    "status": "New"
}
]
yuanzhou commented 4 months ago

@sunset666 the response of the PUT call looks like below:

{
    "message": "Dataset of d0ebafcbe06c9fa2cb739c4a860e79bd has been updated"
}

Can you add this response to the pipeline logging? And also log when the next GET call is made just in case the pipeline doesn't actually make a new GET call after the PUT call or the ds rslt output logging is from a previous result prior to the PUT call. This would help us further debug and find the root case.

sunset666 commented 4 months ago

@yuanzhou, We are adding "Cache-Control: no-cache" header to the request, just to test if that is something that will fix this issue.

yuanzhou commented 4 months ago

Cache-Control: no-cache" header to the request

Not sure if that'll make a difference, definitely worth to add more loggings. The entity-api has caching mechanism internally but it removes the old entity cache after each PUT update.

yuanzhou commented 4 months ago

3/4/2024 from Sunset:

Still no data to test those updates.

shirey commented 4 months ago

Waiting for a way to replicate from @sunset666