open-sdg / sdg-build

Python package to convert SDG-related data and metadata between formats
MIT License

Test coverage audit/revamp #181

brockfanning closed 2 years ago

brockfanning commented 3 years ago

A lot of the functionality in this library is not covered by tests. That's one problem; the other problem is that it's not easy for developers (or at least this developer) to add tests. That may be fixable by becoming more familiar with Python test suites in general, but also there may be a way to rewrite the test code to make this easier. So I think we should pick an approach and move forward, either:

  1. Rewrite the existing test suite to make it more understandable, and then add full coverage, OR
  2. Put in the work to understand the existing test suite, and then add full coverage.

jwestw commented 3 years ago

Would it be good to first understand what these tests are doing? The code doesn't look crazy hard, so I think it might be good to unpick it and document it, and then decide whether to revamp (partly by adding more documentation) or replace.

brockfanning commented 3 years ago

@jwestw The things that the tests already cover are:

  1. check_all_csv() function: This is a deprecated function that actually uses open_sdg_check() behind the scenes, so this test really covers open_sdg_check(). We'll remove this test at version 2.0.0, when we can drop deprecated stuff, though we'll need to add a test that directly covers open_sdg_check().
  2. check_all_meta() function: This is another deprecated function that also uses open_sdg_check(). We'll remove this test at version 2.0.0.
  3. "edge" detection: This "edge" stuff is still being used and needs to continue to be tested. This is how Open SDG is able to automatically calculate the parent/child relationships of the disaggregations.
  4. path functions: This covers a couple of path-related helper functions that are still being used, and so need to continue to be tested.
  5. builds: This is the big one where the most important tests are, covering the input/output functionality. It's not covering everything though. Right now it covers:
    • The typical Open SDG input:
      • CSV data
      • YAML/Markdown metadata
      • SDG Translations
      • _prose.yml schema
    • The typical Open SDG output:
      • JSON data and metadata
      • JSON reporting status
      • JSON translations
      • JSON schema
      • GeoJSON
    • Documentation pages, like the disaggregation report
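
To make the "edge" idea in item 3 more concrete, here is a rough sketch of how a parent/child relationship between disaggregation columns could be inferred and tested. It is a simplified illustration only, not the algorithm sdg-build actually uses: the function name `detect_edges`, the nesting rule, and the sample rows are all assumptions made for this example.

```python
from itertools import permutations


def detect_edges(rows, columns):
    """Hypothetical helper: return (parent, child) column pairs.

    Assumed rule for this sketch: a column is a child of another column
    if it is only ever filled in on rows where the parent is also filled
    in, and the parent is filled in on strictly more rows.
    """
    edges = []
    for parent, child in permutations(columns, 2):
        child_rows = [row for row in rows if row.get(child)]
        parent_rows = [row for row in rows if row.get(parent)]
        if not child_rows:
            continue
        nested = all(row.get(parent) for row in child_rows)
        if nested and len(parent_rows) > len(child_rows):
            edges.append((parent, child))
    return edges


# Made-up sample data: "Age" only appears on rows that also have "Sex".
rows = [
    {"Year": "2020", "Sex": "", "Age": "", "Value": "10"},
    {"Year": "2020", "Sex": "Female", "Age": "", "Value": "6"},
    {"Year": "2020", "Sex": "Female", "Age": "15-24", "Value": "2"},
    {"Year": "2020", "Sex": "Male", "Age": "", "Value": "4"},
]

edges = detect_edges(rows, ["Sex", "Age"])


def test_age_is_nested_under_sex():
    assert edges == [("Sex", "Age")]
```

A test for the real edge detection would follow the same shape: a small, hand-written dataset whose expected relationships are obvious by inspection.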

There are several notable things that we are not testing at all:

  1. SDMX input of any kind
  2. Excel, CSV, or YAML-only metadata input
  3. Disaggregation status
  4. Zip file exports

In addition, some of the existing tests may be insufficient. For example, the test of the disaggregation report only checks that certain pages exist at all. It does not test specific things on the pages (like confirming the number of disaggregations listed).
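
As a sketch of the difference between an existence check and a content check, the example below counts table rows in a generated page using only the standard library. The file name `disaggregations.html`, the table markup, and the helper `count_table_rows` are assumptions for illustration; the real report's file names and HTML structure may well differ.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class RowCounter(HTMLParser):
    """Counts the <tr> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_table_rows(html_path):
    """Hypothetical helper: number of table rows in the page at html_path."""
    parser = RowCounter()
    parser.feed(Path(html_path).read_text(encoding="utf-8"))
    return parser.rows


# Stand-in for a built site: one header row plus two disaggregation rows.
with tempfile.TemporaryDirectory() as site_dir:
    report = Path(site_dir) / "disaggregations.html"
    report.write_text(
        "<table>"
        "<tr><th>Disaggregation</th></tr>"
        "<tr><td>Sex</td></tr>"
        "<tr><td>Age</td></tr>"
        "</table>",
        encoding="utf-8",
    )
    page_exists = report.exists()  # what an existence-only test checks
    data_rows = count_table_rows(report) - 1  # subtract the header row

assert page_exists
assert data_rows == 2
```

The second assertion is the kind of check that is currently missing: it fails if a disaggregation is dropped from the page, while the first one would still pass.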

My main concern -- and the reason for my posting this issue -- is that it's not obvious how to add new tests for these missing things. That may be due to my inexperience with Python testing, but that is the point: I would like to structure the tests so that it's simple for a relative beginner to add a new one, so that we don't end up with so many untested features, like we have now.
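
One possible structure, sketched here under assumptions, is a table-driven test in which each new case is a single entry in a list. The function `convert_record` is a hypothetical stand-in rather than part of sdg-build, and the standard-library `unittest` module is used only to keep the sketch self-contained; `pytest.mark.parametrize` supports the same pattern.

```python
import unittest


def convert_record(record):
    """Hypothetical stand-in for a library function under test.

    Lowercases the keys and drops entries whose value is an empty string.
    """
    return {key.lower(): value for key, value in record.items() if value != ""}


# Adding a test means adding one line here: (description, input, expected).
CASES = [
    ("drops empty values", {"Year": "2020", "Sex": ""}, {"year": "2020"}),
    ("lowercases keys", {"Value": "1"}, {"value": "1"}),
    ("handles empty input", {}, {}),
]


class ConvertRecordTests(unittest.TestCase):
    def test_cases(self):
        for description, given, expected in CASES:
            # subTest reports each failing case separately by description.
            with self.subTest(description):
                self.assertEqual(convert_record(given), expected)


if __name__ == "__main__":
    unittest.main()
```

With this layout a contributor does not need to understand the test runner to add coverage, only the shape of one input and its expected output.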

jwestw commented 3 years ago

Thank you Brock

brockfanning commented 2 years ago

@LucyGwilliamAdmin @otis-bath This might be something to pursue - we still have a deficit of test coverage on sdg-build.