monarch-initiative / icd11

ICD11 foundation ingest.

ICD11: Syncing with latest source from WHO #8

Open joeflack4 opened 3 months ago

joeflack4 commented 3 months ago

Overview

The ICD11 ingest currently uses a source file from 2023-04-08. It would be good to get a newer / latest / versionless dump, ideally from a stable URL.

Possible solutions

We have now heard that there is no stable, unchanging URL for the latest release. However, we did learn that there is a stable URL / date pattern.

Can wrote:

The OWL for latest release is here: https://icd11files.blob.core.windows.net/foundationowl/whofic-2024-01-21.owl.gz and the latest daily version is https://icd11files.blob.core.windows.net/foundationowl/whofic-2024-05-18.owl.gz
Every Saturday a new version is created with the same naming pattern.
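A minimal sketch of the naming pattern Can describes, in case it helps. The function names `whofic_url` and `latest_saturday` are hypothetical and not part of the current ingest. The base URL is copied from the two URLs quoted above, and the sketch assumes the weekly file is always dated on a Saturday.

```python
from datetime import date, timedelta

# Base URL copied from the URLs quoted above.
BASE_URL = "https://icd11files.blob.core.windows.net/foundationowl"


def whofic_url(d: date) -> str:
    """Build the dump URL for a given date (hypothetical helper)."""
    return f"{BASE_URL}/whofic-{d.isoformat()}.owl.gz"


def latest_saturday(today: date) -> date:
    """Return the most recent Saturday on or before `today`."""
    # date.weekday(): Monday is 0, Saturday is 5.
    return today - timedelta(days=(today.weekday() - 5) % 7)


if __name__ == "__main__":
    print(whofic_url(latest_saturday(date.today())))
```

This only builds a candidate URL. Whether a file actually exists at that URL still has to be checked at download time.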

There are different problems with different possible solutions.

Development vs production

a. Use the 'daily' development version (uploaded every Saturday) instead of the release/production version

b. Create a whole other pipeline; maintain 2 pipelines: 1 for release, and 1 for development

c. Continue to use production. I think this one, given that Can wrote:

The content in the daily version may contain things that may change or be removed. It may contain errors, etc. So using the release would be safer. We have one production release per year.

Getting the latest production release

Can wrote:

We have one production release per year.

If there is a web page that gets updated whenever a release happens, we can check it whenever our build runs, see whether it lists the release URL (or date, if that's trustworthy), and then fetch the URL with that date pattern to get the release.

Otherwise, and this is fine for now, if we know roughly when the release happens each year, we can set a reminder / calendar event to check in with WHO yearly around that time.

I haven't gotten confirmation

Getting the latest development release (if needed)

We would have some logic that iterates through the date pattern: (a) checking every Saturday, going backwards, until it finds the latest release, just in case they missed a week. That way the pipeline wouldn't fail. Or, better, (b) checking every date, going backwards from today's date, until hitting a URL that works.
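Option (b) could be sketched roughly as follows. This is not code from the ingest: `find_latest_dump`, `url_exists`, and the `max_days_back` cutoff are hypothetical names and choices, and the sketch assumes the server answers HEAD requests for these files, which has not been confirmed here.

```python
import urllib.request
from datetime import date, timedelta
from typing import Callable, Optional

# Pattern taken from the URLs Can shared; `{}` is the ISO date (YYYY-MM-DD).
URL_PATTERN = "https://icd11files.blob.core.windows.net/foundationowl/whofic-{}.owl.gz"


def url_exists(url: str, timeout: float = 30.0) -> bool:
    """Return True if a HEAD request succeeds (assumes HEAD is supported)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers HTTPError, URLError, and timeouts
        return False


def find_latest_dump(
    today: date,
    exists: Callable[[str], bool] = url_exists,
    max_days_back: int = 60,  # hypothetical cutoff so the loop always ends
) -> Optional[str]:
    """Check each date backwards from `today`; return the first URL that works."""
    for offset in range(max_days_back + 1):
        candidate = URL_PATTERN.format((today - timedelta(days=offset)).isoformat())
        if exists(candidate):
            return candidate
    return None  # nothing found within the window


if __name__ == "__main__":
    print(find_latest_dump(date.today()))
```

Passing `exists` as a parameter keeps the date logic testable without network access. Option (a) would be the same loop stepping 7 days at a time from the most recent Saturday, at the cost of missing any file published on another weekday.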

Additional info

Context: original discussion

Nico wrote:

I ... can make sure that they provide us with a version-less dump - after we are a good way in the syncing process.

I (Joe) have set a yearly calendar reminder to check for updates in late January.

Related

joeflack4 commented 1 month ago

I have a solution for this that I mentioned previously somewhere else. And now, based on what Can said on Slack, I am confident that I have a solution in place. I've added these possible solutions to the OP.