hubmapconsortium / search-api

HuBMAP search service and associated pieces to create an index
https://search.api.hubmapconsortium.org
MIT License

Cascade dataset update in Collection and Upload #840

Closed yuanzhou closed 3 months ago

yuanzhou commented 3 months ago

It took some trial and error to realize that updating documents directly in Elasticsearch causes lots of 409 version conflicts in our current model when multiple entities under the same ancestors (Donor, Sample...) are updated in parallel. So we have to go back to the original procedure: delete the old doc first, then reindex it by calculating all the runtime fields.
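To make the failure mode concrete, here is a toy in-memory model of a version-checked write. It is only a sketch, not the search-api code or the Elasticsearch client: `ToyIndex` and `VersionConflict` are hypothetical names, and it assumes the conflicts come from two writers doing a read-modify-write against the same shared ancestor doc.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 version conflict (hypothetical name)."""


class ToyIndex:
    """Tiny in-memory stand-in for an index with per-document versions."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller read a version that is now stale.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected v{expected_version}, found v{current_version}"
            )
        self._docs[doc_id] = (current_version + 1, source)
        return current_version + 1


index = ToyIndex()
index.put("donor-1", {"descendants": []})

# Two parallel updates both read the shared ancestor doc at the same version.
version_a, source_a = index.get("donor-1")
version_b, source_b = index.get("donor-1")

# The first writer succeeds and bumps the version.
index.put("donor-1", {**source_a, "descendants": ["sample-1"]},
          expected_version=version_a)

# The second writer still holds the old version, so its write is rejected.
try:
    index.put("donor-1", {**source_b, "descendants": ["sample-2"]},
              expected_version=version_b)
    conflict = False
except VersionConflict:
    conflict = True
```

The more entities share an ancestor, the more often two writers end up holding the same stale version, which matches the pattern described above.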

However, we still need to re-implement the dataset-specific pieces. When a dataset gets updated/reindexed, we also need to update its associated Collection and Upload so they reflect the new version of the dataset. This is more about data integrity than efficiency. We will NOT use the _update API to update the existing Collection and Upload documents directly; otherwise we'd still run into 409 conflicts when multiple datasets that belong to the same Collection or Upload get updated at the same time.
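A rough sketch of the cascade described above, using a plain dict as the index so it runs without Elasticsearch. Everything here is an assumption for illustration: `ENTITIES`, `build_doc`, `find_containers`, `reindex_entity`, and `reindex_dataset_with_cascade` are hypothetical names, and the document shape is made up rather than taken from the real index.

```python
from typing import Callable, Dict, Iterable

# Hypothetical source of truth (stands in for the entity store).
ENTITIES: Dict[str, dict] = {
    "dataset-1": {"entity_type": "Dataset", "status": "QA"},
    "collection-1": {"entity_type": "Collection", "dataset_ids": ["dataset-1"]},
    "upload-1": {"entity_type": "Upload", "dataset_ids": ["dataset-1"]},
}


def build_doc(entity_id: str) -> dict:
    """Rebuild the full document, embedding the current state of any datasets."""
    doc = dict(ENTITIES[entity_id])
    if "dataset_ids" in doc:
        doc["datasets"] = [dict(ENTITIES[d], uuid=d) for d in doc["dataset_ids"]]
    return doc


def find_containers(dataset_id: str) -> Iterable[str]:
    """Return the Collections and Uploads that reference the dataset."""
    return [eid for eid, entity in ENTITIES.items()
            if dataset_id in entity.get("dataset_ids", [])]


def reindex_entity(index: Dict[str, dict], entity_id: str,
                   build: Callable[[str], dict]) -> None:
    """Delete the old doc, then index a freshly built one (no partial update)."""
    index.pop(entity_id, None)
    index[entity_id] = build(entity_id)


def reindex_dataset_with_cascade(index: Dict[str, dict], dataset_id: str,
                                 build: Callable[[str], dict],
                                 containers: Callable[[str], Iterable[str]]) -> None:
    """Reindex the dataset, then every Collection/Upload that contains it."""
    reindex_entity(index, dataset_id, build)
    for container_id in containers(dataset_id):
        reindex_entity(index, container_id, build)


# Usage: index everything once, change the dataset, then cascade.
index: Dict[str, dict] = {}
for entity_id in ENTITIES:
    reindex_entity(index, entity_id, build_doc)

ENTITIES["dataset-1"]["status"] = "Published"
reindex_dataset_with_cascade(index, "dataset-1", build_doc, find_containers)
```

Because each Collection and Upload doc is rebuilt from scratch instead of being patched in place, the indexed copy never depends on what was stored before.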