As a contributor to J40, I want the code in the comparison tool notebooks to be covered by tests, so that I know my contributions won't break existing functionality. #296
Description
Currently the majority of the code in the score/ directory is stored in iPython notebooks and is not covered by unit or integration tests. This introduces the following challenges when trying to contribute to this section of the code base and close out the issues related to the comparison tool:
Applying multiple, complex transformations to data within an ETL pipeline can accidentally introduce errors that are difficult to detect, especially when data sets are large and outputs are checked manually.
Additionally, it can be difficult to meaningfully refactor untested code, since we can only ensure its accuracy by comparing the output of the original code against the output of the newly refactored code.
These challenges are compounded by the fact that iPython notebooks are notoriously difficult to version control, and that contributions from multiple people to the same file often result in merge conflicts.
Solution
To facilitate simpler and more reliable collaboration on the code in the score/ directory, we should begin migrating the code in the iPython notebooks to standalone scripts that can be covered by both unit and integration tests. The code in score/ should be prioritized for migration and test coverage in the following phases (a sketch of this kind of refactor follows the phase list):
Phase 1: Complete before the related issues listed at the bottom
ipython/score_calc.ipynb
ipython/scoring_comparison.ipynb
Phase 2: Complete in parallel with the related issues listed at the bottom
ipython/census_etl.ipynb
ipython/ejscreen_etl.ipynb
ipython/housing_and_transportation_etl.ipynb
ipython/hud_housing_etl.ipynb
Phase 3: Complete after the related issues listed at the bottom
utils.py
etl/sources/census/etl.py
etl/sources/census/etl_utils.py
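To illustrate what "migrating to discrete functions" might look like, here is a minimal sketch. The module path, function name, and column are hypothetical stand-ins, not taken from the notebooks; the real steps live in the files listed above.

```python
# etl/score_calc.py -- hypothetical module; the real layout should mirror
# the discrete steps currently performed in ipython/score_calc.ipynb.
import pandas as pd


def normalize_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Min-max normalize one indicator column to the range [0, 1].

    A stand-in for a single step of the scoring pipeline, extracted
    into a plain function so it can be unit tested in isolation.
    """
    result = df.copy()
    col_min = result[column].min()
    col_max = result[column].max()
    # Guard against a constant column, which would otherwise divide by zero.
    if col_max == col_min:
        result[column] = 0.0
    else:
        result[column] = (result[column] - col_min) / (col_max - col_min)
    return result
```

Because a function like this takes a DataFrame and returns a new one, a test can feed it a tiny hand-built frame and assert on the exact output, with no notebook kernel involved.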
Describe alternatives you've considered
Alternatives to migrating the code and setting up automated unit and integration testing, and why these alternatives aren't viable:
Limit one contributor per notebook
Significantly slows the rate of collaboration and progress on this tool
Doesn't address errors introduced when data is transferred between stages in separate notebooks
Requires every contributor to learn how to manually check the code in their notebook
Establish manual testing checklist
Nearly as time intensive as setting up automated testing, but with fewer reliability guarantees
Still prone to error and can't be enforced through CI/CD checks
Links to user research or other resources
Pytest - Powerful yet simple testing framework for Python
Tox - Tool often used alongside pytest to automate and manage testing environments
GitHub Action for tox - Example GitHub action workflow for executing Python tests using tox
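For example, a pytest module covering the hypothetical normalize_column function sketched above could be as small as this (paths and names remain assumptions):

```python
# tests/test_score_calc.py -- hypothetical test module for the sketch above.
import pandas as pd

from etl.score_calc import normalize_column


def test_normalize_column_scales_values_to_unit_range():
    df = pd.DataFrame({"poverty_rate": [0.0, 5.0, 10.0]})
    result = normalize_column(df, "poverty_rate")
    assert result["poverty_rate"].tolist() == [0.0, 0.5, 1.0]


def test_normalize_column_handles_constant_column():
    # A constant column would divide by zero without the guard clause.
    df = pd.DataFrame({"poverty_rate": [3.0, 3.0]})
    result = normalize_column(df, "poverty_rate")
    assert (result["poverty_rate"] == 0.0).all()


def test_normalize_column_does_not_mutate_input():
    df = pd.DataFrame({"poverty_rate": [1.0, 2.0]})
    normalize_column(df, "poverty_rate")
    assert df["poverty_rate"].tolist() == [1.0, 2.0]
```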
Tasks
[ ] Migrate the code from the notebooks in the score/ipython/ directory into separate modules, preferably organizing the code into discrete functions that correspond to the steps of the ETL and scoring process. Prioritize the following files:
[ ] ipython/score_calc.ipynb
[ ] ipython/scoring_comparison.ipynb
[ ] Create corresponding tests for each of the functions in the newly created scripts using pytest
[ ] Update the score/README.md with instructions for how to create the appropriate tests when contributing new code
[ ] Bonus
[ ] Use tox to run the tests and pytest-coverage to assess the test coverage, and set tox to fail if test coverage is under a certain threshold (see the configuration sketch below)
[ ] Set up a GitHub action workflow to execute the tests on push and pull request, to ensure the reliability and test coverage of future contributions (see the workflow sketch below)
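To make the bonus tasks concrete, here is a minimal tox configuration sketch. The package path (etl), Python version, and 80% threshold are illustrative assumptions, not decisions this issue makes:

```ini
# tox.ini -- a minimal sketch, not the project's actual configuration.
[tox]
envlist = py39

[testenv]
deps =
    pytest
    pytest-cov
# --cov-fail-under makes the whole run fail if coverage drops below 80%.
commands = pytest --cov=etl --cov-fail-under=80 tests/
```

And a sketch of a GitHub Actions workflow that runs the suite on every push and pull request (file path and action versions are likewise assumptions):

```yaml
# .github/workflows/test.yml -- a minimal sketch.
name: Run tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install tox
        run: pip install tox
      - name: Run the test suite via tox
        run: tox
```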
Definition of "Done"
[ ] Tests have been written for code in the following notebooks (at a minimum):
[ ] ipython/score_calc.ipynb
[ ] ipython/scoring_comparison.ipynb
[ ] When a new contributor checks out the code, follows the installation guidelines in score/README.md, and then runs poetry run pytest (or poetry run tox if the bonus steps are completed), all of those tests pass
[ ] If a contributor makes a breaking change to any of the code in the scripts that are covered by tests, those tests would fail, signaling to the contributor that they've introduced a bug
[ ] When a contributor wants to contribute a new feature to or fix a bug within the score code base, they can follow a set of guidelines in score/README.md describing how to create a new test to ensure the quality and accuracy of their contribution or bug fix
[ ] Bonus: When a contributor pushes new code or creates a PR, these tests are executed automatically by GitHub actions and will fail if:
[ ] The code being pushed breaks any existing parts of the workflow
[ ] The code they're trying to push increases the portion of untested code above a certain threshold
Related Issues
#271
#234
#236
#237
#238
#239
#241
#242
#243
#244
#275
#276
#284