kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Document best practices for writing tests for nodes and pipelines #3782

Closed: AhdraMeraliQB closed this pull request 2 months ago

AhdraMeraliQB commented 3 months ago

Description

Addresses #1271

Added the page "Test a Kedro project" to the spaceflights tutorial. The page explains how to structure a positive and a negative unit test for a node function, and an integration test for (part of) a pipeline, with examples. It then covers some testing best practices, including directory setup, naming conventions, and the use of fixtures.
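To illustrate the three kinds of test the page describes, here is a minimal sketch. The node functions `parse_prices` and `apply_discount` are hypothetical and are not taken from this PR or from the spaceflights starter. The sketch uses the standard-library `unittest` module so it runs without extra dependencies; pytest also collects `unittest.TestCase` classes, although the docs page itself may use pytest-style tests and fixtures instead. The integration test chains the node functions by hand in the order a pipeline would wire them, rather than running a Kedro `Pipeline` through a runner such as `SequentialRunner`.

```python
import unittest


# Hypothetical node functions, used only to illustrate the test patterns.
def parse_prices(raw_prices: list[str]) -> list[float]:
    """Convert price strings such as "$1,000" into floats."""
    if not raw_prices:
        raise ValueError("raw_prices must not be empty")
    return [float(p.replace("$", "").replace(",", "")) for p in raw_prices]


def apply_discount(prices: list[float], discount: float) -> list[float]:
    """Apply a fractional discount (e.g. 0.1 for 10%) to every price."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return [round(p * (1 - discount), 2) for p in prices]


class TestParsePrices(unittest.TestCase):
    def test_parse_prices_returns_floats(self):
        # Positive unit test: valid input produces the expected output.
        self.assertEqual(parse_prices(["$1,000", "$25.50"]), [1000.0, 25.5])

    def test_parse_prices_rejects_empty_input(self):
        # Negative unit test: invalid input raises a clear error.
        with self.assertRaises(ValueError):
            parse_prices([])


class TestPricingPipeline(unittest.TestCase):
    def test_nodes_chain_end_to_end(self):
        # Integration test: feed one node's output into the next, mirroring
        # how a (hypothetical) pipeline wires parse_prices -> apply_discount.
        raw = ["$100", "$2,000"]
        discounted = apply_discount(parse_prices(raw), discount=0.1)
        self.assertEqual(discounted, [90.0, 1800.0])
```

Saved under the project's `tests/` directory with a `test_*.py` filename, these tests are discovered by both `pytest` and `python -m unittest`. Each test name states the behavior it checks, which keeps failures easy to read.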

The page on automated testing was updated to reflect the test directory structure used in our template projects, and to include instructions for installing the Kedro project locally before running tests. This resolves the import errors seen when tests import from the project (xref on Slack).

These changes will also need to be applied to the starters (that PR is paused pending docs approval): https://github.com/kedro-org/kedro-starters/pull/216

Development notes

See the built docs here

The example tests were run in a spaceflights project (both the unrefined and the refactored versions) and pass as expected. These docs draw inspiration from some of the internal pages linked on the original issue.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked by unsigned commits, follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This retroactively adds the sign-off to all unsigned commits and allows the DCO check to pass.

Checklist