opensafely-core / ehrql

ehrQL: the electronic health record query language for OpenSAFELY
https://docs.opensafely.org/ehrql/

`assure` command fails with "AttributeError: 'list' object has no attribute 'items'" #2079

Open Jongmassey opened 4 months ago

Jongmassey commented 4 months ago

@milanwiedemann reported a problem with assure in this Slack thread.

I can replicate this problem. Using the example dataset_definition.py and test_dataset_definition.py from "How to test your dataset definition" in the ehrQL docs, I get the following error output from the assure command:


$ opensafely exec ehrql:v1 assure analysis/test_dataset_definition.py
Traceback (most recent call last):
  File "/opt/venv/bin/ehrql", line 8, in <module>
    sys.exit(entrypoint())
             ^^^^^^^^^^^^
  File "/app/ehrql/__main__.py", line 75, in entrypoint
    return main(sys.argv[1:], environ=os.environ)  # pragma: no cover
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ehrql/__main__.py", line 102, in main
    function(**kwargs)
  File "/app/ehrql/main.py", line 343, in assure
    results = assurance.validate(variable_definitions, test_data)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ehrql/assurance.py", line 36, in validate
    constraints_error = validate_constraints(records, table)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ehrql/assurance.py", line 72, in validate_constraints
    for column, value in record.items():
                         ^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'items'

when starting from a fresh repo based on the research-template.

evansd commented 4 months ago

Following further discussion in another thread, the issue appears to be an invalid example rather than a bug in ehrQL.

In the example test script, the patients data for the first patient is defined correctly as a dict:

test_data = {
    # Expected in population with matching medication
    1: {
        "patients": {"date_of_birth": date(1950, 1, 1)},
        "medications": [
            {
                # First matching medication
                "date": date(2010, 1, 1),
                "dmd_code": "39113311000001107",
            },
...

But for the second and third patients, the patients data is defined as a list of dicts:

    2: {
        "patients": [{"date_of_birth": date(1950, 1, 1)}],
        "medications": [],
...

This is what causes the error: validate_constraints calls record.items() on each table's data, so a one-row-per-patient table such as patients must be defined as a single dict rather than a list of dicts.
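
As a rough illustration of the point above, here is a minimal sketch of the corrected entry for the second patient, plus a small check for the same mistake. The helper find_list_valued_one_row_tables is hypothetical and not part of ehrQL, and it assumes patients is the only one-row-per-patient table in the test data. Other keys that a real test entry would carry are omitted here.

```python
from datetime import date

# Corrected form: "patients" is a single dict (one row per patient),
# while "medications" stays a list (many rows per patient).
test_data = {
    2: {
        "patients": {"date_of_birth": date(1950, 1, 1)},
        "medications": [],
    },
}


def find_list_valued_one_row_tables(test_data, one_row_tables=("patients",)):
    """Hypothetical helper, not part of ehrQL.

    Returns (patient_id, table_name) pairs where a table assumed to be
    one-row-per-patient has been written as a list instead of a dict.
    """
    problems = []
    for patient_id, tables in test_data.items():
        for table_name in one_row_tables:
            if isinstance(tables.get(table_name), list):
                problems.append((patient_id, table_name))
    return problems


# The invalid shape from the docs example, reduced to the relevant keys.
broken = {
    2: {
        "patients": [{"date_of_birth": date(1950, 1, 1)}],
        "medications": [],
    },
}

print(find_list_valued_one_row_tables(broken))     # [(2, 'patients')]
print(find_list_valued_one_row_tables(test_data))  # []
```

Running a check like this before assure would point at the offending patient ID directly, instead of surfacing as an AttributeError from inside validate_constraints.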

We should therefore: