jhu-bids / TermHub

Web app and CLI tools for working with biomedical terminologies. https://github.com/orgs/jhu-bids/projects/9/views/7
https://bit.ly/termhub
GNU General Public License v3.0

DB refresh: Updated objects (i.e. metadata) #398

Open joeflack4 opened 1 year ago

joeflack4 commented 1 year ago

Overview

Currently the DB refresh is set up so that any objects newly created since a certain time will be fetched. However, data about any updates / archivals (I don't think) / deletions of objects will not be fetched.

Since these objects are mostly immutable, I think this only applies to metadata.

Involves the following objects

When to update metadata

a. Query the API on lastUpdated.

b. We could also update container metadata whenever we fetch a new cset version. That'd be a convenient time.

Sub-tasks

Sigfried commented 1 year ago

Solve by changing the fetch-since queries to compare against modifiedAt instead of createdAt.

joeflack4 commented 1 year ago

Thanks Siggie. That is the most important thing. Other logic will need to be changed too. If I change that one line and do nothing else, it will try to insert data as if it is a new record rather than doing an update. Shouldn't require that many changes in the logic, I don't think.
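To illustrate the insert-versus-update point above, here is a minimal sketch of an upsert, not the actual TermHub refresh code. The `container` table, its columns, and the `upsert_container` helper are hypothetical, and SQLite is used only so the example is self-contained. PostgreSQL supports the same `INSERT ... ON CONFLICT ... DO UPDATE` form.

```python
import sqlite3


def upsert_container(conn: sqlite3.Connection, row: dict) -> None:
    """Insert a row, or update the existing row when the primary key is already present.

    Hypothetical table and column names, for illustration only.
    """
    conn.execute(
        """
        INSERT INTO container (id, name, modified_at)
        VALUES (:id, :name, :modified_at)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            modified_at = excluded.modified_at
        """,
        row,
    )


def demo() -> list:
    """Fetch the same object twice: once as new, once after its metadata changed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE container (id TEXT PRIMARY KEY, name TEXT, modified_at TEXT)"
    )
    # First fetch: the object is new, so this behaves like a plain INSERT.
    upsert_container(conn, {"id": "c1", "name": "old name", "modified_at": None})
    # Later fetch: same id with updated metadata. A plain INSERT would fail here
    # with a primary key violation; the upsert updates the row in place.
    upsert_container(
        conn, {"id": "c1", "name": "new name", "modified_at": "2023-05-02T00:00:00Z"}
    )
    return conn.execute("SELECT id, name, modified_at FROM container").fetchall()
```

With something like this in place, switching the fetch query to modifiedAt would not need a separate code path for records that already exist locally.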

Sigfried commented 1 year ago

@joeflack4, where does this stand? Is it done?

Also, can you confirm? I think we decided to continue checking createdAt in addition to modifiedAt. Is that happening?

joeflack4 commented 1 year ago

@Sigfried No, I haven't started on this yet. This is the only/most critical DB refresh issue after fixing the vocab/counts refresh GH action running out of memory, and I aim to do this immediately after that.

We will need to continue checking both createdAt and modifiedAt. Main reason for this is that the latter is often NULL. Ideally it would be >= createdAt.
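The check on both timestamps could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the `changed_since` and `filter_changed` helpers are hypothetical, and objects are assumed to arrive as dicts with a `createdAt` datetime and an optional `modifiedAt` that may be missing or None.

```python
from datetime import datetime, timezone
from typing import Optional


def changed_since(
    created_at: datetime, modified_at: Optional[datetime], since: datetime
) -> bool:
    """Return True if the object was created or modified at or after `since`."""
    if created_at >= since:
        return True
    # modifiedAt is often NULL (None here), so it cannot be the only check.
    return modified_at is not None and modified_at >= since


def filter_changed(objects: list, since: datetime) -> list:
    """Keep only the objects that a refresh starting at `since` should fetch."""
    return [
        o for o in objects
        if changed_since(o["createdAt"], o.get("modifiedAt"), since)
    ]


def utc(year: int, month: int, day: int) -> datetime:
    """Small helper for building timezone-aware example timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up example objects covering the three cases discussed in this thread.
EXAMPLE_OBJECTS = [
    {"id": "untouched", "createdAt": utc(2023, 1, 1), "modifiedAt": None},
    {"id": "new", "createdAt": utc(2023, 4, 1), "modifiedAt": None},
    {"id": "edited", "createdAt": utc(2023, 1, 1), "modifiedAt": utc(2023, 5, 1)},
]
```

Calling `filter_changed(EXAMPLE_OBJECTS, utc(2023, 3, 1))` keeps the "new" and "edited" objects and drops "untouched". If the filter is written in SQL instead, `createdAt >= :since OR modifiedAt >= :since` behaves the same way, because a comparison against NULL is never true in a WHERE clause.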