hubmapconsortium / portal-ui

HuBMAP Data Portal front end
https://portal.hubmapconsortium.org
MIT License

Prov graph duplicate donor nodes #2891

Open john-conroy opened 2 years ago

john-conroy commented 2 years ago

I'm not sure whether this is a Portal question or a Neo4j graph-structure question. I've created a multi-tissue-sample dataset on DEV: https://portal.dev.hubmapconsortium.org/browse/dataset/3a1ead78ec5b4ebc9d4745e72d69fe97. If you look at the graph representation of the provenance, at the far left edge you'll see that the 8 samples come from 2 identical root nodes, both representing the same donor. Is this an artifact of the 2 sets of 4 samples having been created at different times? Are there two distinct nodes in Neo4j representing that single donor?

[Screenshot: provenance graph for the multi-tissue-sample dataset]

john-conroy commented 2 years ago

Bill does not believe this is an issue with the provenance API.

shirey commented 1 year ago

@john-conroy This is happening with samples/organs too. It was reported by the Stanford TMC; I traced these yesterday, and I believe the entity-api GET /entities/provenance/<entity uuid> endpoint is working correctly. The provenance graphs for datasets HBM344.KMPF.842 and HBM956.QXTL.386 show four organ-level samples for each organ, but they should show only one per organ (in this example there are 2 organs in total, yet 8 instances are shown in the provenance graph). From the database, this is what the provenance should look like:
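One way to tell whether the duplication comes from the data or from the rendering is to count uuids among the nodes the graph actually draws: if the entity-api returns each organ once but the same uuid appears on several rendered nodes, the renderer is duplicating it. Below is a minimal TypeScript sketch of that check. The RenderedNode shape and the findDuplicateNodes helper are hypothetical and are not portal-ui's real data structures.

```typescript
// Hypothetical shape of a node as drawn by the portal's graph renderer;
// the actual portal-ui data structures may differ.
interface RenderedNode {
  uuid: string;
  entityType: string;
}

// Count how many rendered nodes share each uuid and keep only the uuids
// that were drawn more than once.
function findDuplicateNodes(nodes: RenderedNode[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const node of nodes) {
    counts[node.uuid] = (counts[node.uuid] || 0) + 1;
  }
  const duplicates: Record<string, number> = {};
  for (const uuid of Object.keys(counts)) {
    if (counts[uuid] > 1) {
      duplicates[uuid] = counts[uuid];
    }
  }
  return duplicates;
}

// Example mirroring the report: 2 organs, each drawn 4 times (8 nodes total).
const renderedNodes: RenderedNode[] = [];
for (const organUuid of ["organ-a", "organ-b"]) {
  for (let i = 0; i < 4; i++) {
    renderedNodes.push({ uuid: organUuid, entityType: "Sample" });
  }
}

const duplicateCounts = findDuplicateNodes(renderedNodes);
console.log(renderedNodes.length, Object.keys(duplicateCounts).length); // 8 2
```

With the numbers from this report (2 organs, 8 drawn instances), the check flags both organ uuids, each drawn 4 times, while the underlying provenance contains only 2 organs.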

[Image: expected provenance, from the database]

There is a similar registration, HBM672.CLPB.355, that works correctly for the organs (though it is still broken for the donor). Database diagram here:

[Image: database diagram for HBM672.CLPB.355]

Notice that in this one, where the organ/sample level is correct, there is only one Activity (the little red dots) downstream (to the right) of the organs, and it creates all the organ pieces, whereas in the broken graph above there is one Activity per organ_piece. This seems to be what is causing the issue: the code that renders the graph in the portal creates a separate upstream node for each activity. Of note, both of these models are correct, since you can have one activity that creates multiple tissue pieces or multiple activities each creating one (or more) tissue pieces.
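To illustrate the suspected mechanism, here is a minimal TypeScript sketch that contrasts creating an upstream node per activity with keying upstream nodes by uuid. The record shapes and function names (ActivityRecord, upstreamNodesPerActivity, upstreamNodesByUuid) are hypothetical and not taken from portal-ui or the entity-api; the real rendering code is structured differently.

```typescript
// Hypothetical, simplified provenance records; the real entity-api response
// and portal-ui rendering code are structured differently.
interface ActivityRecord {
  id: string;
  used: string[]; // uuids of input entities (e.g. an organ)
  generated: string[]; // uuids of output entities (e.g. organ pieces)
}

interface GraphNode {
  uuid: string;
}

// Mimics the suspected behavior: a fresh node is created for every input of
// every activity, so a parent used by N activities is drawn N times.
function upstreamNodesPerActivity(activities: ActivityRecord[]): GraphNode[] {
  const nodes: GraphNode[] = [];
  for (const activity of activities) {
    for (const uuid of activity.used) {
      nodes.push({ uuid: uuid });
    }
  }
  return nodes;
}

// Keys upstream nodes by uuid, so all activities that use the same parent
// share a single node, however many activities there are.
function upstreamNodesByUuid(activities: ActivityRecord[]): GraphNode[] {
  const byUuid: Record<string, GraphNode> = {};
  const ordered: GraphNode[] = [];
  for (const activity of activities) {
    for (const uuid of activity.used) {
      if (!Object.prototype.hasOwnProperty.call(byUuid, uuid)) {
        byUuid[uuid] = { uuid: uuid };
        ordered.push(byUuid[uuid]);
      }
    }
  }
  return ordered;
}

// Broken case from the report: one organ, four activities, one piece each.
const oneActivityPerPiece: ActivityRecord[] = [1, 2, 3, 4].map((n) => ({
  id: "activity-" + n,
  used: ["organ-a"],
  generated: ["piece-" + n],
}));

// Working case: a single activity that creates all four pieces.
const oneActivityForAll: ActivityRecord[] = [
  {
    id: "activity-1",
    used: ["organ-a"],
    generated: ["piece-1", "piece-2", "piece-3", "piece-4"],
  },
];

console.log(upstreamNodesPerActivity(oneActivityPerPiece).length); // 4
console.log(upstreamNodesPerActivity(oneActivityForAll).length); // 1
console.log(upstreamNodesByUuid(oneActivityPerPiece).length); // 1
```

Because both activity layouts are valid provenance, the sketch deduplicates on entity uuid rather than on the activity structure, so either layout yields a single organ node.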