hubmapconsortium / search-api

HuBMAP search service and associated pieces to create an index
https://search.api.hubmapconsortium.org
MIT License

Metadata should not be missing. #526

Closed: mccalluc closed this issue 2 years ago

mccalluc commented 2 years ago

Currently, https://portal.hubmapconsortium.org/browse/dataset/d76ee69cff161f4605f9d65721f41f5d fails with a 500. In the logs:

2022-06-15T17:13:06.843438348Z [2022-06-15 17:13:06,834] ERROR in app: Exception on /browse/dataset/d76ee69cff161f4605f9d65721f41f5d [GET]
2022-06-15T17:13:06.843504450Z Traceback (most recent call last):
2022-06-15T17:13:06.843510662Z   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2070, in wsgi_app
2022-06-15T17:13:06.843513838Z     response = self.full_dispatch_request()
2022-06-15T17:13:06.843516428Z   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1515, in full_dispatch_request
2022-06-15T17:13:06.843519184Z     rv = self.handle_user_exception(e)
2022-06-15T17:13:06.843521722Z   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1513, in full_dispatch_request
2022-06-15T17:13:06.843524367Z     rv = self.dispatch_request()
2022-06-15T17:13:06.843526816Z   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1499, in dispatch_request
2022-06-15T17:13:06.843530660Z     return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
2022-06-15T17:13:06.844253228Z   File "/app/./app/routes_browse.py", line 59, in details
2022-06-15T17:13:06.844263218Z     conf_cells_uuid = client.get_vitessce_conf_cells_and_lifted_uuid(entity)
2022-06-15T17:13:06.844266232Z   File "/app/./app/api/client.py", line 166, in get_vitessce_conf_cells_and_lifted_uuid
2022-06-15T17:13:06.844268914Z     derived_entity['files'] = derived_entity['metadata']['files']
2022-06-15T17:13:06.844271453Z KeyError: 'metadata'

To me, this seems to be analogous to the issue behind https://github.com/hubmapconsortium/portal-ui/pull/2147: In that case we decided that it was a data problem, and not a case the front-end needs to handle.

jswelling commented 2 years ago

This problem is occurring for a dataset which references a vis-lifted ome-tiff pyramid derived dataset. The primary dataset, d76ee69cff161f4605f9d65721f41f5d, is in QA but the ome-tiff pyramid derived dataset, e0654875a2ba5f7eb23bc3302c4bcbfd, is still being computed. Of a group of 6 similar Lightsheet datasets ingested in the past 2 days, only this one is still incomplete and only this one displays this problem. Thus I think the problem is transient and the issue will not be reproducible in a few hours when the derived dataset finishes.

mccalluc commented 2 years ago

Bill:

I’ll look into this further tomorrow AM. What does the Portal UI do now in the case where a dataset in the “New” state doesn’t have any metadata because it hasn’t been ingested yet?

My preference is for stable, reliable document structures: Every element that may or may not be present is another special case any client would need to cover.

But if you are unable to provide this, we can add a special case here in portal-ui if metadata is missing.
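
For reference, the kind of special case mentioned above might look roughly like the following. This is only a sketch based on the line in the traceback (`derived_entity['files'] = derived_entity['metadata']['files']`); the helper name `files_from_derived_entity` and the sample documents are hypothetical, not the actual portal-ui code.

```python
from typing import Any, Optional


def files_from_derived_entity(derived_entity: dict[str, Any]) -> Optional[list]:
    """Return the derived entity's file list, or None if metadata is absent.

    Hypothetical helper: it assumes a derived entity is a plain dict that
    may or may not carry a 'metadata' dict with a 'files' list inside it.
    """
    metadata = derived_entity.get('metadata')
    if not isinstance(metadata, dict):
        # Metadata is missing, e.g. the pipeline run has not finished.
        return None
    return metadata.get('files')


# Made-up documents for illustration only.
complete = {'uuid': 'abc', 'metadata': {'files': [{'rel_path': 'a.ome.tiff'}]}}
incomplete = {'uuid': 'def'}  # no 'metadata' key at all

complete_files = files_from_derived_entity(complete)
incomplete_files = files_from_derived_entity(incomplete)
```

With a guard like this, the caller could skip the visualization for an incomplete derived dataset instead of raising `KeyError`. The cost is the one described above: every client has to carry the same check.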

mccalluc commented 2 years ago

@shirey -- The linked document works now. Can you explain what changed and close this issue?

shirey commented 2 years ago

@mccalluc Nothing at all changed that I know of. This is on my list of things to investigate further, but I haven't gotten to it yet; I thought it was still broken. From the bit of investigation I did last week, it looked like the metadata field was available in both Neo4j and ES, but it wasn't clear to me that it was formatted correctly.

shirey commented 2 years ago

The metadata was missing because of an errored and incomplete pipeline run. Once the pipeline was fixed and rerun, the metadata was back in place.