opensafely-core / opencodelists

OpenCodelists is an open platform for creating and sharing codelists of clinical terms and drugs.
https://www.opencodelists.org

Update BNF data #1960

Closed rebkwok closed 2 months ago

rebkwok commented 2 months ago

There is a new release of BNF (2024-05-02) (opencodelists is currently on 2024-01-01). Also, it looks like 2024-01-01 may have been labelled with the wrong version number. According to the download site, 2024-01-01 is version 86, not 84.

(I came across a discrepancy between opencodelists and openprescribing BNF codes. It doesn't look like it's due to changes in the latest data, but it prompted me to check whether there was a new update.)

Jongmassey commented 2 months ago

I'm not seeing 2024-05-02 on the download site.

iaindillingham commented 2 months ago

Hi @rebkwok. Like Jon, I logged into the download site (as a guest). The most recent version is dated 2024-01-01. Where did you find out about 2024-05-02?

From your description of the issue, I think you're suggesting relabelling 2024-01-01 from 84 to 86. Is that correct? I've not worked with OpenCodelists before. If that is correct, then how can I relabel safely?

rebkwok commented 2 months ago

You need to download the file in order to see if there's an update. The release date is only displayed in the downloaded filename. (This is described in the BNF README, but it's not very visible.)
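
As a rough illustration of reading the release date from a downloaded filename, here is a minimal Python sketch. It assumes the filename starts with an eight-digit YYYYMMDD date followed by an underscore, as in 20240502_1714657404842_BNF_Code_Information.zip. The function name is hypothetical and not part of OpenCodelists.

```python
import re
from datetime import date, datetime


def release_date_from_filename(filename: str) -> date:
    """Return the date encoded in the leading YYYYMMDD part of the filename.

    Hypothetical helper: assumes names shaped like
    20240502_1714657404842_BNF_Code_Information.zip.
    """
    match = re.match(r"(\d{8})_", filename)
    if match is None:
        raise ValueError(f"No leading YYYYMMDD date in {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


# Prints 2024-05-02 for the file discussed in this issue.
print(release_date_from_filename("20240502_1714657404842_BNF_Code_Information.zip"))
```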

Jongmassey commented 2 months ago

Wow - they make it so easy!

iaindillingham commented 2 months ago

Could I check what I think I'm going to do with you before I do it, @rebkwok?

Based on the BNF README, I think I'm going to copy 20240502_1714657404842_BNF_Code_Information.zip to /var/lib/dokku/data/storage/opencodelists/data/bnf/ on dokku3. Then I think I'm going to run:

./manage.py import_coding_system_data bnf 20240502_1714657404842_BNF_Code_Information.zip --release "01-01-2024: 86" --valid-from 2024-05-02

Then I think I'm going to run:

dokku ps:restart opencodelists

At this stage, I think I've updated the BNF data.

I'm unsure what to do about relabelling 2024-01-01 from 84 to 86. I think this information is stored within CodingSystemRelease.release_name. So, I think it's as simple as:

release = CodingSystemRelease.objects.get(release_name="84 (2024-01-01)")
release.release_name = "86 (2024-01-01)"
release.save()

But that raises:

AssertionError: database_alias bnf_84-2024-01-01_20240101 does not follow required pattern (expected 'bnf_86-2024-01-01_20240101'

So, I think it's as simple as:

release.database_alias = "bnf_86-2024-01-01_20240101"
release.save()

Am I on the right track?

rebkwok commented 2 months ago

To keep the release name for BNF consistent, I'd call the new one "86 (2024-05-02)" for the new BNF data.

rebkwok commented 2 months ago

> So, I think it's as simple as:
>
> release.database_alias = "bnf_86-2024-01-01_20240101"
> release.save()

Not quite; you also need to rename the actual database file.

The database alias is created using the coding system, release name and valid from date on the CodingSystemRelease object:

database_alias = slugify(
    f"{self.coding_system}_{self.release_name}_{self.valid_from.strftime('%Y%m%d')}"
)

It's like that because when you import the data, that's how the sqlite db file is named. You can force it to change by updating both the release_name and database_alias in the same save, assuming the release_name you set slugifies to the database_alias you're trying to update to.

However, it'll now use that database_alias as the using argument any time you access the BNF data for that release, and it'll fail to connect to the database. So you also need to rename (or make a copy of) the bnf_84-2024-01-01_20240101.sqlite3 file and call it bnf_86-2024-01-01_20240101.sqlite3 (in /var/lib/dokku/data/storage/opencodelists/coding_systems/bnf).

Then you'll need to again run:

dokku ps:restart opencodelists

iaindillingham commented 2 months ago

Thanks, @rebkwok.

I'm still unsure about relabelling. Perhaps I could repeat back what I think I know, and you could correct me?

We have:

/var/lib/dokku/data/storage/opencodelists/coding_systems/bnf/bnf_84-2024-01-01_20240101.sqlite3
release = CodingSystemRelease.objects.get(release_name="84 (2024-01-01)")
release.release_name  # '84 (2024-01-01)'
release.database_alias  # 'bnf_84-2024-01-01_20240101'

I think we'd like to have:

/var/lib/dokku/data/storage/opencodelists/coding_systems/bnf/bnf_86-2024-01-01_20240101.sqlite3
release = CodingSystemRelease.objects.get(release_name="86 (2024-01-01)")
release.release_name  # '86 (2024-01-01)'
release.database_alias  # 'bnf_86-2024-01-01_20240101'

You say that:

database_alias = slugify(
    f"{release.coding_system}_{release.release_name}_{release.valid_from.strftime('%Y%m%d')}"
)

However, that doesn't match 'bnf_84-2024-01-01_20240101'. Do you know why?


As an aside, I updated the BNF data:

./manage.py import_coding_system_data bnf /storage/data/bnf/20240502_1714657404842_BNF_Code_Information.zip --release "02-05-2024: 86" --valid-from 2024-05-02

However, I don't see a corresponding SQLite database. That is, I don't see:

/var/lib/dokku/data/storage/opencodelists/coding_systems/bnf/bnf_86-2024-05-02_20240502.sqlite3

Do you know why?

rebkwok commented 2 months ago

Did the bnf import succeed? The output of the manage command should be quite verbose. If you called the release "02-05-2024: 86", the db name won't be "bnf_86-2024-05-02_20240502.sqlite3", it'll be "bnf_02-05-2024-86_20240502.sqlite3" (or something similar)
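
To make the naming concrete, here is a minimal Python sketch of how a release name turns into a database alias. It assumes the slugify in the snippet quoted earlier is Django's django.utils.text.slugify; the slugify below is a standalone approximation of it, and database_alias is a hypothetical free function rather than the project's own code.

```python
import re
import unicodedata
from datetime import date


def slugify(value: str) -> str:
    """Approximation of django.utils.text.slugify (ASCII-only mode)."""
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop anything that is not a word character, whitespace, or hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def database_alias(coding_system: str, release_name: str, valid_from: date) -> str:
    """Hypothetical helper mirroring the alias pattern quoted in this thread."""
    return slugify(f"{coding_system}_{release_name}_{valid_from.strftime('%Y%m%d')}")


print(database_alias("bnf", "02-05-2024: 86", date(2024, 5, 2)))   # bnf_02-05-2024-86_20240502
print(database_alias("bnf", "86 (2024-05-02)", date(2024, 5, 2)))  # bnf_86-2024-05-02_20240502
```

Under these assumptions, the value passed to --release is used as the release name verbatim, so only a name of the form "86 (2024-05-02)" yields the expected bnf_86-2024-05-02_20240502 alias.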

iaindillingham commented 2 months ago

Thanks, @rebkwok. I realised my error: I didn't understand how releases are named. I thought values of CodingSystemRelease.release_name were constructed by parsing values of --release. That's not the case.

I have, unfortunately, created a second poorly named release. I'm planning to remedy this on dokku3 as follows:

# In the Django shell:
release = CodingSystemRelease.objects.get(release_name="02-05-2024: 86")
release.release_name = "86 (2024-05-02)"
release.database_alias = "bnf_86-2024-05-02_20240502"
release.save()

# Then, in the directory holding the SQLite files:
mv bnf_02-05-2024-86_20240502.sqlite3 bnf_86-2024-05-02_20240502.sqlite3

I'm planning to do the same for the first poorly named release as follows:

# In the Django shell:
release = CodingSystemRelease.objects.get(release_name="84 (2024-01-01)")
release.release_name = "86 (2024-01-01)"
release.database_alias = "bnf_86-2024-01-01_20240101"
release.save()

# Then, in the directory holding the SQLite files:
mv bnf_84-2024-01-01_20240101.sqlite3 bnf_86-2024-01-01_20240101.sqlite3

Peter pointed out that CodelistVersion points to CodingSystemRelease. However, I don't think I need to update any instances of CodelistVersion, because I'm updating the existing CodingSystemRelease rows in place rather than deleting them and creating new ones.
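
The file-rename half of the plan above could also be done with a small guard against overwriting an existing database. This is only a sketch: rename_release_db is a hypothetical helper, not part of OpenCodelists, and it assumes each release's database is stored as <database_alias>.sqlite3 in a single directory.

```python
from pathlib import Path


def rename_release_db(directory: Path, old_alias: str, new_alias: str) -> Path:
    """Rename <old_alias>.sqlite3 to <new_alias>.sqlite3, refusing to overwrite.

    Hypothetical helper: assumes one SQLite file per release alias.
    """
    source = directory / f"{old_alias}.sqlite3"
    target = directory / f"{new_alias}.sqlite3"
    if not source.is_file():
        raise FileNotFoundError(source)
    if target.exists():
        raise FileExistsError(target)
    source.rename(target)
    return target
```

For the first release this would be called with the coding_systems/bnf directory, "bnf_84-2024-01-01_20240101" and "bnf_86-2024-01-01_20240101", followed by the dokku ps:restart opencodelists step mentioned earlier.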

iaindillingham commented 2 months ago

I've made the above changes and am going to close this issue. See also #1975.