sellorm / nhsnumber

An R package to work with NHS numbers and their checksums
https://nhsnumber.sellorm.com/
Other
10 stars 0 forks source link

'10x9s' NHS number is not valid #5

Closed t-pollington closed 2 months ago

t-pollington commented 2 months ago

Hi,

Thanks for making the package.

1) nhsnumber::is_valid(9999999999) gives TRUE however this is the missing number value that is commonly inputted by users.

2) epidm::valid_nhs(9999999999) gives a 0 which is correct.

3) It may be worth doing an exhaustive search of all 10-digit numbers, to see where you differ from epidm, as a cross-check.

Kind regards, Tim.

sellorm commented 2 months ago

Thanks for raising this Tim.

I'm not sure we need to change the current behaviour though.

epidm::valid_nhs() contains a check for any sequence of 10 recurring digits (1111111111, 2222222222, etc.) and reports any found as invalid.

nhsnumber::is_valid() does not contain such a check since those numbers are, algorithmically speaking at least, valid.

> epidm::valid_nhs(c(1111111111, 2222222222, 3333333333, 4444444444, 5555555555, 6666666666, 7777777777, 8888888888, 9999999999))
[1] 0 0 0 0 0 0 0 0 0
> nhsnumber::is_valid(c(1111111111, 2222222222, 3333333333, 4444444444, 5555555555, 6666666666, 7777777777, 8888888888, 9999999999))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

I understand that these numbers would never be issued, but they are at least valid in the algorithmic sense. To be honest, I'd always assumed that people used 9999999999 in place of actual actual number because it is a valid number (again, algorithmically speaking).

since nhsnumber only asserts the algorithmic validity of the number and not the actual validity -- whether it was issued or not, for example -- I think I'd prefer to let the current behaviour stand rather than introduce a change that potentially breaks people's existing workflows.

It's worth noting that, according to this NHS digital document on synthetic data, all numbers starting 999 are reserved for testing. Given this information, I feel moving away from checking the algorithmic validly of "NHS numbers" is certain to cause more problems than it solves.

t-pollington commented 2 months ago

Thanks for explaining that @sellorm . Yes I can see it's a subtle beast where a fix isn't necessarily the way. From my side I want to try and get my colleagues to use the term 'checksum valid' or 'issuer validated/trace-validated' when discussing the validation status of an NHS number to reduce the ambiguity.