Closed pacharanero closed 1 year ago
So my suggestion is that any generative function should accept some flags as parameters. These might include:
only_chi_numbers
only_nhs_numbers
only_hsc_numbers
only_development
only_invalid
It would also allow generation of a batch of unique numbers.
Disclaimers could be added to the docstrings in the function of the type already mentioned.
@eatyourpeas - see Issues #5 and #6. I think the best way to address those (and this) is to have a set of "constants" defining regions/countries (we can alias them to cover any political sensibilities) and a separate set of constants to define (in)validity states.
i.e.
REGION_ENGLAND.= 'england'
REGION_WALES = 'wales'
REGION_SCOTLAND = 'scotland'
REGION_NORTHERN_IRELAND = 'northern ireland'
and
VALID = 'valid'
INVALID = 'invalid'
INVALID_CHECKSUM = 'invalid checksum'
INVALID_DATE_OF_BIRTH = 'invalid date of birth'
INVALID_RANGE = 'invalid range'
We could do this using ENUMs. That would pin us to Python 3.4 and above.
We'll need to decide if we provide signature(s) that allow numbers for multiple regions to be generated. My instinct is to make it a single region per function call.
I agree @andylaw - I have actually been working on something similar for the regions - a constant REGIONS which contains the start and end number ranges as well as a plain text representation of the Region Name, and then a tag section which can be used to identify the region using several equally valid aliases, as you say to help ameliorate any political sensitivities.
Something like:
REGIONS = [
{ "start": 10_000_000, "end": 311_299_999, "label": "Scotland CHI numbers", "tags": ["scotland", "chi"] },
... ,
... ,
]
This constant can then be imported wherever needed, and would serve as the 'source of truth', and in the documentation we can provide references for the facts asserted.
I have also got some clarity from my Irish contacts on the ROI situation - I will link to it in another Issue so we can put all Eire/Ireland related info in that thread and link back here.
"start"
should be 01_000_000
for CHI numbers unless you want to disenfranchise everyone born before the 10th of any given month :)
I would also suggest a design that actually names the regions individually. All those strings are 'magic' as you've coded it
"start"
should be01_000_000
for CHI numbers unless you want to disenfranchise everyone born before the 10th of any given month :)
From our README information the number range for CHI is 100000000n-319999999n, is this incorrect? The underscores as 000's separators works in Python for integers but doesn't seem to like having leading zeros, at least, the syntax highlighting/linter doesn't like it.
We could simply use strings everywhere, that might be simpler.
I would also suggest a design that actually names the regions individually. All those strings are 'magic' as you've coded it
I'm not sure what you mean by 'magic'.
Do you mean having a key:value of the range name:
REGIONS = [
"scotland": { "start": 10_000_000, "end": 311_299_999, "label": "Scotland CHI numbers", "tags": ["scotland", "chi"] },
"wales": { ... } , etc
... ,
]
From our README information the number range for CHI is 100000000n-319999999n, is this incorrect?
It is incorrect. The first six digits of a CHI number are the holder's date of birth in DDMMYY format (see https://www.datadictionary.nhs.uk/attributes/community_health_index_number.html). Thus, the first digit will be a zero in roughly 30% of cases.
The lowest number that's valid for a CHI number (excluding checksum) is therefore 010100000 (female born on 1st January 2000 - more likely than 1900)
The underscores as 000's separators works in Python for integers but doesn't seem to like having leading zeros, at least, the syntax highlighting/linter doesn't like it.
We could simply use strings everywhere, that might be simpler.
Numbers is fine, but they can and do start with a zero
I would also suggest a design that actually names the regions individually. All those strings are 'magic' as you've coded it
I'm not sure what you mean by 'magic'.
Do you mean having a key:value of the range name:
REGIONS = [ "scotland": { "start": 10_000_000, "end": 311_299_999, "label": "Scotland CHI numbers", "tags": ["scotland", "chi"] }, "wales": { ... } , etc ... , ]
No. I mean...
REGION_SCOTLAND = {
"start": 1_000_000,
"end": 311_299_999
"label": "Scotland CHI numbers"
}
REGION_ENGLAND = {
...
}
REGIONS = [
REGION_SCOTLAND,
REGION_ENGLAND,
...,
]
...and then I can write code like...
for region in REGIONS:
...
...and
valid = nhs_number.is_valid('1234567890', for_region=REGION_SCOTLAND)
Bonus points awarded for coding the Region as a Class and creating one named instance per region at the top of the code which lets syntax checkers look inside and make sure that we don't hit typos for start
and end
when we're using them
It is suggested we add a feature to generate valid and invalid NHS numbers, for testing.
Generation of valid NHS Numbers that may be “live”.
I agree and I think we should make it clear in doco that firstly all generated numbers COULD be a real number and must never be looked up in a live system, and secondly that users should NOT be obtaining any 'live' NHS numbers from our package since they are all assigned by a central assignment body.
Generation of INVALID NHS Numbers to test other systems.
However, if you’ve included our package to check the validity and then ask us to generate a list of invalid numbers then I would expect your code to spot that they are invalid numbers.