opensafely-core / ehrql

ehrQL: the electronic health record query language for OpenSAFELY
https://docs.opensafely.org/ehrql/

Inconsistent capitalisation of "the" causing problems with example data #2106

Closed: inglesp closed this issue 1 week ago

inglesp commented 1 month ago

The example data references "Yorkshire and the Humber", but our table validation expects "Yorkshire and The Humber" (capital "T").

This causes problems in the sandbox:

(.venv) inglesp@malbogies:~/work/ebmdatalab/ehrql$ python -m ehrql sandbox ehrql/example-data
Python 3.11.10 (main, Sep  7 2024, 18:35:41) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from ehrql.tables.tpp import practice_registrations
>>> practice_registrations
Traceback (most recent call last):
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 145, in parser
    return convertor(value)
           ^^^^^^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 168, in wrapper
    raise ValueError(f"{value!r} not in valid categories: {category_str}")
ValueError: 'Yorkshire and the Humber' not in valid categories: 'North East', 'North West', 'Yorkshire and The Humber', 'East Midlands', 'West Midlands', 'East', 'London', 'South East', 'South West'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 83, in __iter__
    yield row_parser(row)
          ^^^^^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 110, in row_parser
    return tuple(parser(row) for parser in parsers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 110, in <genexpr>
    return tuple(parser(row) for parser in parsers)
                 ^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 147, in parser
    raise ValueError(f"column {name!r}: {e}")
ValueError: column 'practice_nuts1_region_name': 'Yorkshire and the Humber' not in valid categories: 'North East', 'North West', 'Yorkshire and The Humber', 'East Midlands', 'West Midlands', 'East', 'London', 'South East', 'South West'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
ehrql.file_formats.base.FileValidationError: row 8: column 'practice_nuts1_region_name': 'Yorkshire and the Humber' not in valid categories: 'North East', 'North West', 'Yorkshire and The Humber', 'East Midlands', 'West Midlands', 'East', 'London', 'South East', 'South West'
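
The error message suggests the categorical check is an exact, case-sensitive comparison, so a value that differs only in the case of one letter is rejected. Below is a minimal sketch of how such case-only mismatches could be spotted in example data. The category list is copied from the error message above; the helper `case_only_mismatch` is hypothetical and not part of ehrql.

```python
# Category list copied from the error message above.
VALID_REGIONS = (
    "North East",
    "North West",
    "Yorkshire and The Humber",
    "East Midlands",
    "West Midlands",
    "East",
    "London",
    "South East",
    "South West",
)


def case_only_mismatch(value, categories=VALID_REGIONS):
    """Return the category that `value` matches apart from letter case.

    Returns None if `value` is already an exact match, or if it matches
    no category even when case is ignored.
    Hypothetical helper for illustration only; not part of ehrql.
    """
    if value in categories:
        return None  # exact match, nothing to report
    # Map case-folded category names back to their canonical spelling.
    by_folded = {c.casefold(): c for c in categories}
    return by_folded.get(value.casefold())


# The value from the example data differs from the valid category
# only in the capitalisation of "the".
suggestion = case_only_mismatch("Yorkshire and the Humber")
```

A check along these lines could be run over the example CSV files to flag every value that would fail validation purely because of capitalisation.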
inglesp commented 1 month ago

We should:

evansd commented 1 month ago

This feels fairly high priority, given that we encourage use of the sandbox for new learners.