kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.89k stars 897 forks source link

Change pipeline test location to project root/tests #3731

Closed lrcouto closed 6 months ago

lrcouto commented 6 months ago

Description

Development notes

Changes the location in which tests for pipelines are created with the command kedro pipeline create, from <project root>/src/tests/pipelines/<pipeline name> to <project root>/tests/pipelines/<pipeline name>. This change was made to match the template used by the starters, avoiding creating two separate directories for tests.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

noklam commented 6 months ago
  • where 1 = <Result ValueError("'/tmp/tmpq_u_2key/dummy_project/tests/pipelines/my_pipeline' is not in the subpath of '/tmp/tmpq_u_2key/dummy_project/src' OR one path is relative and the other is absolute.")>.exit_code

I saw this everywhere which is related to the ticket that I am working on. See:

However, I don't think this is the problem, but you can try to cherrypick my commits at https://github.com/kedro-org/kedro/pull/3742 to see if it solves the problem. I suspect is the test fixtures need some update but I only look at it for 2 mins so I don't know where it should be.

lrcouto commented 6 months ago
  • where 1 = <Result ValueError("'/tmp/tmpq_u_2key/dummy_project/tests/pipelines/my_pipeline' is not in the subpath of '/tmp/tmpq_u_2key/dummy_project/src' OR one path is relative and the other is absolute.")>.exit_code

I saw this everywhere which is related to the ticket that I am working on. See:

* [Symlink resolution during configuration detection might break some CLI commands #2346](https://github.com/kedro-org/kedro/issues/2346)

However, I don't think this is the problem, but you can try to cherrypick my commits at #3742 to see if it solves the problem. I suspect is the test fixtures need some update but I only look at it for 2 mins so I don't know where it should be.

I think I found the source of the issue, the function that handles kedro micropkg package was still referring to the old path. I think most of the tests should pass now.

ankatiyar commented 6 months ago

Tested this and it works well, but I think there is still a little inconsistency between the starters and the kedro pipeline create <pipeline_name> command - Screenshot 2024-04-02 at 12 41 06

The test for data_science pipeline is under <root>/tests/pipelines/test_data_science.py but kedro pipeline create creates a folder inside <root>/tests/pipelines/<pipeline_name>/test_pipeline.py. Should we change the starters to match this format or maintain the flat structure where <root>/tests/pipelines/test_<pipeline_name>.py is created (and an __init__.py file if it doesn't exist)?

lrcouto commented 6 months ago

Tested this and it works well, but I think there is still a little inconsistency between the starters and the kedro pipeline create <pipeline_name> command - Screenshot 2024-04-02 at 12 41 06

The test for data_science pipeline is under <root>/tests/pipelines/test_data_science.py but kedro pipeline create creates a folder inside <root>/tests/pipelines/<pipeline_name>/test_pipeline.py. Should we change the starters to match this format or maintain the flat structure where <root>/tests/pipelines/test_<pipeline_name>.py is created (and an __init__.py file if it doesn't exist)?

I think changing the starters to match the tests/ structures would be the best option. Having a separate directory for each pipeline would be more organized, allow tests for a specific pipeline to be found more easily, and allow for files containing utility functions or fixtures related do a specific pipeline's tests to be stored together with the test files.

astrojuanlu commented 6 months ago

+1 to updating the starters according to what @ankatiyar and @lrcouto have said. Would that be a blocker for this particular PR? Or can we proceed and review & merge it?

lrcouto commented 6 months ago

+1 to updating the starters according to what @ankatiyar and @lrcouto have said. Would that be a blocker for this particular PR? Or can we proceed and review & merge it?

IMO we can merge this one and make the starter changes on another PR specific for them.