opensafely-actions / cohort-report

Cohort-report generates a report for each variable in an input file
MIT License

Unexpected 'AssertionError: Columns do not match config' error #44

Open LisaHopcroft opened 3 years ago

LisaHopcroft commented 3 years ago

I am attempting to implement a very simple cohort-report action with two variables, age and sex. The action is defined as:

generate_report:
    run: cohort-report:v2.0.2 output/input_2019-09-01.csv
    needs: [generate_study_population]
    config:
      variable_types:
        age: int
        sex: categorical
      output_path: output/cohort_reports_outputs
    outputs:
      moderately_sensitive:
        reports: output/cohort_reports_outputs/descriptives_input.html

In the input file, age is provided as an integer and sex is provided as the characters M or F.

Contents of the log file for this action are:

2021-10-27T13:24:27.499325100Z Traceback (most recent call last):
2021-10-27T13:24:27.499387300Z   File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
2021-10-27T13:24:27.499406700Z     return _run_code(code, main_globals, None,
2021-10-27T13:24:27.499427800Z   File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
2021-10-27T13:24:27.499449000Z     exec(code, run_globals)
2021-10-27T13:24:27.499464000Z   File "/workspace/cohortreport/__main__.py", line 94, in <module>
2021-10-27T13:24:27.499485500Z     main()
2021-10-27T13:24:27.499506000Z   File "/workspace/cohortreport/__main__.py", line 90, in main
2021-10-27T13:24:27.499527100Z     run_action(input_files=args.input_files, config=processed_config)
2021-10-27T13:24:27.499548200Z   File "/workspace/cohortreport/__main__.py", line 48, in run_action
2021-10-27T13:24:27.499569300Z     make_report(
2021-10-27T13:24:27.499589800Z   File "/workspace/cohortreport/report.py", line 69, in make_report
2021-10-27T13:24:27.499610900Z     df = type_variables_in_df(df=df, variables=variable_types)
2021-10-27T13:24:27.499631800Z   File "/workspace/cohortreport/processing.py", line 115, in type_variables_in_df
2021-10-27T13:24:27.499647100Z     checked_df = check_columns_match(df=df, variables=variables)
2021-10-27T13:24:27.499668300Z   File "/workspace/cohortreport/processing.py", line 96, in check_columns_match
2021-10-27T13:24:27.499689500Z     raise AssertionError("Columns do not match config")
2021-10-27T13:24:27.499710600Z AssertionError: Columns do not match config

state: failed
docker_image_id: sha256:cbb7931dbd97d030a3111381e0c8316a9733f28633c975d633cd9620869f1ac6
action_repo_url: https://github.com/opensafely-actions/cohort-report
action_commit: d9237b52874cad29e8dc6884c1e7b1718cec188b
job_id: jfrtekce5gsr7mah
run_by_user: lisahopcroft
created_at: 2021-10-27T13:23:55Z
completed_at: 2021-10-27T13:24:29Z
exit_code: 1

Job exited with an error code

The code is available here and the specific commit is here.

iaindillingham commented 3 years ago

Thank you for creating this issue, @LisaHopcroft. At present, every variable in the input CSV file needs a corresponding entry in variable_types. The study definition at 16dd9fa included more variables than age and sex, so cohort-report raised an error. This behaviour isn't clear from the documentation, which states:

The generate_report action outputs one HTML document with a graph for each variable specified.

This suggests that variables that aren't specified will simply be left out of the HTML document, rather than causing an error. I think this is the same issue as #23, although that issue covers both the documentation and a bug, whereas this issue is about the documentation alone.
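
In the meantime, a quick way to see which columns are missing from variable_types is to compare the CSV header against the config before running the action. The snippet below is only a rough sketch of that idea: it is not the code in cohortreport/processing.py, the helper name compare_columns_to_config is made up for illustration, and the region column is a hypothetical stand-in, since the extra variables in the study definition are not listed in this issue.

```python
import csv
import io


def compare_columns_to_config(csv_file, variable_types):
    """Compare a CSV header with the keys of a variable_types mapping.

    Hypothetical helper, not part of cohort-report. Returns a tuple of
    (columns in the CSV with no config entry,
     config entries with no matching CSV column).
    """
    # Only the header row matters for this comparison.
    header = next(csv.reader(csv_file), [])
    columns = set(header)
    configured = set(variable_types)
    return sorted(columns - configured), sorted(configured - columns)


# Made-up example data: "region" stands in for whatever extra variables
# the real study definition produces.
example_csv = io.StringIO("age,sex,region\n34,F,London\n")
variable_types = {"age": "int", "sex": "categorical"}

unconfigured, absent = compare_columns_to_config(example_csv, variable_types)
print("In CSV but not in variable_types:", unconfigured)
print("In variable_types but not in CSV:", absent)
```

With a real file, the same function can be called on open("output/input_2019-09-01.csv", newline=""). Any column reported in the first list would need its own entry under variable_types, assuming the behaviour described above.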