uk-fci / nhs-number

Python package to provide utilities for NHS Numbers, including validity checks, normalisation, and generation.
https://nhs-number.uk-fci.tech/
MIT License
6 stars 6 forks source link

Shorthand organisation identification #36

Open andyatterton opened 6 months ago

andyatterton commented 6 months ago

It would be handy to be able to get a shorter label (NHS, CHI, HSC etc) that identifies the organisation responsible for the provided number when using the NhsNumber class. Currently, if I want to get this information, I would do something along the lines of

if "nhs" in my_number.region_comment.lower():
    organisation = "NHS"
if "chi" in my_number.region_comment.lower():
    organisation = "CHI"

I'd like something like

my_number = NhsNumber("4069990771")
print(my_number.organisation)
'NHS'

If you are open to this addition, I'm happy to submit a pull request.

andylaw commented 6 months ago

If you wanted to add something like that, it should be a property of the Region object associated with the number. Your proposed solution also relies on "magic" strings.

What is your use case?

andyatterton commented 6 months ago

My solution introduces a short_label to region and uses that to set the organisation attribute on the NhsNumber class. You could access it through my_number.region.short_label, but that feel unnecessarily verbose. An NHS number having an issuing organisation (now I'm writing this organisation may not be the best description, but that is how we label it internally) seems like a reasonable abstraction.

When we store these numbers in a database, we store them with a shorthand text and don't use the regions as they are labelled here. I would have thought it was more typical to store the abbreviation.

We store a patient number as follows

patient_number = PatientNumber(
        id=2,
        pid="PYTEST01:LABORDERS:00000000L",
        patientid="111111111",
        organization="NHS",
        numbertype="NI",
    )

So I can just do a check, but in cases where I don't know which number region to expect I think I would have to check each case

if my_number.region == REGION_ENGLAND_WALES_IOM:
    organization = "NHS"
if my_number.region == REGION_SCOTLAND:
    organization = "CHI"

Would be easier if an NhsNumber object came with that shorthand baked in.

pacharanero commented 6 months ago

I'm possibly interested in adding this functionality, but if we are going to do it we might need to think about getting the nomenclature right. 'Organization' is a hugely overloaded term in this space. I think what we are getting at is the official body which issued the NHS number? 'NHS' for England/Wales seems to be a bit broad - surely all of the organisations are NHS bodies, even in NI and Scotland? So we might need to decide what exactly this shortcode would represent.

What do you use the organization code for in your use-case? Is it just informational for the user?

andylaw commented 6 months ago

@andyatterton - apologies for being slow, but I'm not seeing a use case per se in what you've written. Nor am I seeing justification for not using my_number.region.short_label.

In fundamental data modelling terms, each number belongs to a block of numbers which have been allocated to a region. Your label is a function/property of the region. Thus, the label belongs to the region and is shared across all numbers within the block through the region. Modern IDEs autofill all the object properties for you, so the argument that it's too long-winded to type doesn't hold water for me

andyatterton commented 6 months ago

'Organization' is a hugely overloaded term in this space.

I have been turning this over since I commented here, and I think you are spot on. It's one of those internal things I had never really questioned until now. One of the benefits of discussing these things with peers. If we ever figure out the correct terminology here, I may suggest we change things our end.

I think what we are getting at is the official body which issued the NHS number? 'NHS' for England/Wales seems to be a bit broad - surely all of the organisations are NHS bodies, even in NI and Scotland?

I would take the view that we are describing the characteristics of the number itself, not who issued it(despite my opening statement). If a colleague came to me and ask what kind of number a patient identifier was that started with a four and was ten digits long, I would reply it's an NHS number not it's a number issued by the NHS, although I can see both ends of that argument.

Maybe this is something unique to our data, but I would be surprised. We deal primarily with Kidney data, so a large part of what we hold is transplant data. Quite often a patient will cross a boarder to get a transplant or simply move and end up with both a CHI number and an NHS number. Sometimes we might want to search a table containing patient numbers for a specific type of number. Having a number type/tag makes this a bit more convent.

SELECT *
FROM patient_numbers
WHERE number LIKE '4%' OR your_column LIKE '6%' OR your_column LIKE '7%';

vs

SELECT *
FROM patient_numbers
WHERE organisation == 'NHS';

For me the use of these labels to describe the type of number I am dealing with is so common place I hadn't really considered anyone doing it differently.

I don't see this as added functionality either, because I can get this is in a more verbose format from the label of each region. All I'm really suggesting is a shortened version of an existing variable.

These are existing region labels label="Northern Ireland H&C Numbers" label="England Wales and IOM NHS Numbers" label="Scotland CHI numbers"

In fundamental data modelling terms, each number belongs to a block of numbers which have been allocated to a region. Your label is a function/property of the region. Thus, the label belongs to the region and is shared across all numbers within the block through the region. Modern IDEs autofill all the object properties for you, so the argument that it's too long-winded to type doesn't hold water for me

This last part is completely true, and laziness on my part is a poor reason for design choices. I would still say though that if you wanted to describe a type of patient identifier to me, you probably wouldn't say it's a number issued by Scotland you would tell me it's a CHI number, and you would be describing the number itself not the region or range it came from (although I could infer that).

Given the feedback here, I am wondering if this is a habit that is specific to us, in which case it might be more appropriate for us to fork the repo and build in the bits we need.

andylaw commented 6 months ago
SELECT *
FROM patient_numbers
WHERE organisation == 'NHS';

vs

SELECT pn.*
FROM
  patient_numbers pn,
  organisation org
WHERE
  pn.org_id = org.id
AND
  org.name = 'NHS'

Is the difference in the way that we think about the issue, I believe