hubmapconsortium / entity-api

A set of web service calls to return information about HuBMAP entities
https://entity.api.hubmapconsortium.org
MIT License

Investigate incorrect revisions results #622

Closed: yuanzhou closed this 4 months ago

yuanzhou commented 4 months ago

Brought up by Nils at the Sprint review meeting on 2/9/2024:

Sunset posted this link to a dataset that was rerun for annotations: https://portal.hubmapconsortium.org/browse/dataset/6d43804a0477e8839e682611a1cf0ada

It shows version 3 as the latest version, but when you jump to version 2, the page lists a version 4 and says it's not the latest version.

Screenshot 2024-02-09 at 10 52 36 AM
Screenshot 2024-02-09 at 10 52 53 AM

yuanzhou commented 4 months ago

Screenshot 2024-02-09 at 11 19 05 AM

yuanzhou commented 4 months ago

From @ngehlenborg:

My understanding is that this is the model that we had agreed on at some point:

image

It turned out that both Harvard and PSC are still consuming the "old" single-revision schema, so no one has migrated to the new multi-revision schema even though the API currently supports it. Version 2 (0a21f3fa27109790483f2a0729be53de) was processed twice, which produced two version 3 nodes. portal-ui does not handle such cases and gets confused.

@sunset666 confirmed that:

We had a full rerun for the dataset (salmon pipeline), which produced the first v3 revision, and then a new version with the new annotation pipeline, which was created after the multi-revision support was on PROD. Now we have 2 v3 nodes (which is expected, I guess). Now the question is, how are we going to handle this multi-revision support? I forgot about that part. Confirming the above, a full pipeline run and a re-annotation run happened to 0a21f3fa27109790483f2a0729be53de.
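
To make the two-v3 situation concrete, here is a minimal sketch of how a revision graph with a branch produces duplicate version numbers. This is not the actual entity-api or portal-ui code. The edge list, the function names, and the placeholder IDs `v1-dataset`, `v3-salmon-rerun`, and `v3-reannotation` are all assumptions made up for illustration; only 0a21f3fa27109790483f2a0729be53de comes from this thread.

```python
from collections import defaultdict

# Hypothetical (previous_revision, next_revision) edges. Only the v2 UUID is
# taken from this issue; the other identifiers are illustrative placeholders.
V2 = "0a21f3fa27109790483f2a0729be53de"
REVISION_EDGES = [
    ("v1-dataset", V2),
    (V2, "v3-salmon-rerun"),   # full pipeline rerun
    (V2, "v3-reannotation"),   # re-annotation run
]


def build_children(edges):
    """Map each revision to the list of revisions derived directly from it."""
    children = defaultdict(list)
    for prev, nxt in edges:
        children[prev].append(nxt)
    return children


def revision_numbers(edges):
    """Number each revision by its depth from the root (root is version 1)."""
    children = build_children(edges)
    has_parent = {nxt for _, nxt in edges}
    roots = [node for node in children if node not in has_parent]
    numbers = {}
    stack = [(root, 1) for root in roots]
    while stack:
        node, number = stack.pop()
        numbers[node] = number
        stack.extend((child, number + 1) for child in children[node])
    return numbers


def group_by_number(numbers):
    """Group revision IDs by version number to expose duplicates."""
    groups = defaultdict(list)
    for node, number in numbers.items():
        groups[number].append(node)
    return {number: sorted(nodes) for number, nodes in sorted(groups.items())}


def latest_candidates(edges):
    """Return every revision that has no successor, i.e. each branch tip."""
    children = build_children(edges)
    all_nodes = {node for edge in edges for node in edge}
    return sorted(node for node in all_nodes if not children.get(node))


groups = group_by_number(revision_numbers(REVISION_EDGES))
tips = latest_candidates(REVISION_EDGES)
```

In this sketch, version 3 maps to two nodes and there are two branch tips, so a client that assumes exactly one "latest" revision per chain has no well-defined answer. A multi-revision rendering would presumably need to show every node at a given version number instead of picking one.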

yuanzhou commented 4 months ago

FYI @shirey, just a heads-up for when you come back: there are questions about how to render multiple revisions properly on the Harvard side. @ngehlenborg proposed this: image