hubmapconsortium / portal-ui

HuBMAP Data Portal front end
https://portal.hubmapconsortium.org
MIT License

triage: Provenance for GE Cell DIVE datasets show no sample and samples show no datasets #1788

Closed pecan88 closed 3 years ago

pecan88 commented 3 years ago


Describe the issue

When I am logged in, open the list of GE Cell DIVE datasets, and select one, the provenance graph displays only the donor and no sample.

When I am logged in, open the list of tissue samples related to GE Cell DIVE datasets, and select one, the provenance graph displays only the donor and not the dataset.

To Reproduce

  1. Go to here: https://portal.hubmapconsortium.org/browse/sample/a4c7b437e1e74ee5fd3ef2bb4fffa67c

  2. Look at provenance graph

  3. See no derived datasets displayed

  4. Select donor, select derived datasets

  5. See derived datasets displayed

  6. Go to here: https://portal.hubmapconsortium.org/browse/dataset/060dfa0fdf2b840864f62d2cd1a7a456

  7. Look at provenance graph

  8. See no sample linked to dataset

Expected behavior

I would see a connected chain of Donor > Sample > Organ relationships

Screenshots

image image

mccalluc commented 3 years ago

I believe the UI rendering is correct, with the information it has been given. I know that there has been some discussion about the provenance for GE datasets... but all I remember was that the metadata was sparse, not that the provenance DAG had an unusual structure. Looping in @shirey to see if there's backend work here...

shirey commented 3 years ago

@pecan88 It looks like the data was registered directly against a Donor instead of against tissue sample(s). Pink node is the donor, red nodes are tissue (organ, ffpe-block, ffpe-slides) and the blue node is the data.

image
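For anyone who wants to scan for other datasets wired this way, here is a minimal sketch of the check, assuming the provenance graph can be flattened into records that each carry an entity type and a list of direct parents. The `Entity` class, the `datasets_missing_sample` helper, and the toy identifiers are all hypothetical and for illustration only; this is not the portal's or the entity API's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical record shape, not the real HuBMAP entity schema.
    uuid: str
    entity_type: str  # "Donor", "Sample", or "Dataset"
    parents: list = field(default_factory=list)  # uuids of direct ancestors


def datasets_missing_sample(entities):
    """Return uuids of datasets that have no Sample among their direct parents."""
    by_id = {e.uuid: e for e in entities}
    flagged = []
    for e in entities:
        if e.entity_type != "Dataset":
            continue
        # Collect the types of the direct parents that are present in the input.
        parent_types = {by_id[p].entity_type for p in e.parents if p in by_id}
        if "Sample" not in parent_types:
            flagged.append(e.uuid)
    return flagged


# Toy graph: one dataset registered through samples, one directly on the donor.
entities = [
    Entity("donor-1", "Donor"),
    Entity("organ-1", "Sample", ["donor-1"]),
    Entity("block-1", "Sample", ["organ-1"]),
    Entity("dataset-ok", "Dataset", ["block-1"]),
    Entity("dataset-bad", "Dataset", ["donor-1"]),
]

print(datasets_missing_sample(entities))
```

In the toy graph only `dataset-bad` is flagged, which mirrors the situation described above: the data node hangs off the donor while the organ, block, and slide nodes sit on a separate branch.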

mccalluc commented 3 years ago

Bill is not involved in direct conversations with data providers about how they want their data structured... but when it's clear what the request is, he can go in and re-wire the data. I will talk with Chris and see if she is the best person to carry this forward.

(This is not a code issue, but keeping it open until it is fixed in the portal, or a different place to track it has been identified.)

ngehlenborg commented 3 years ago

Not sure if that matters in any way, but the Globus folder associated with dataset https://portal.hubmapconsortium.org/browse/dataset/060dfa0fdf2b840864f62d2cd1a7a456 (HBM732.FZVZ.656) is empty.

mccalluc commented 3 years ago

Update from Chris

I connected @shirey directly with Fiona & Liz at GE so the provenance issue is getting resolved.

(Still keeping this open until fixed in portal, or alternate tracking is available.)

pecan88 commented 3 years ago

@mccalluc GE has gone through and updated much of the provenance. I noted one missing link between a sample and a dataset and asked GE's project manager to review the sample-to-dataset linkages to make sure we aren't missing any others ...

mccalluc commented 3 years ago

@shirey : The dataset now has the intermediate samples, but the sample does not show derived datasets... It's possible this is the case because there are none. Do you know?

@pecan88 : Do you have reason to believe there are datasets which are derived from https://portal.hubmapconsortium.org/browse/sample/a4c7b437e1e74ee5fd3ef2bb4fffa67c ?

pecan88 commented 3 years ago

@mccalluc - yes, there has been no HIVE-based processing for the datasets, so there will not (yet) be derived datasets. The decision at TC-CMU & RTI-GE about 2-3 weeks back was to seek to have the pipeline ready for testing by May 14 - currently it is pending. (I'll close this issue unless you see a reason for it to remain open.)