Closed merelcht closed 2 years ago
The original research done on this topic: https://github.com/kedro-org/kedro/issues/1293
The CLI commands kedro test
and kedro lint
should be deprecated because this is the consensus in #1293.
A summary of the key arguments for deprecation:
kedro-datasets
, the deprecation of kedro install
).A summary of the key arguments against deprecation:
My view is also that the commands should be deprecated. Both commands are wrappers for tools with many alternatives. I believe that if a user knows the value of testing and linting code, they either:
In these cases, the commands offer limited to no utility.
To mitigate the disadvantages discussed in the arguments against deprecation:
lint
, test
and other deprecated commands.In https://github.com/kedro-org/kedro/issues/1293, step three of this plan is somewhat controversial, since it represents an up-front and ongoing investment. One option is to create this plugin as a community-led project. In any case, further discussion is warranted for this step.
This summary makes sense to me, but I'm missing what the concrete the alternatives would be for users currently using kedro test
and kedro lint
and how the users can "migrate" to these alternatives.
There are two concrete alternatives:
Users install and configure the tools that kedro lint
(black, flake8, isort) and kedro test
(pytest) wrap. They then use these tools through their normal interfaces. We will add documentation to Kedro to describe installation and configuration.
For isort and black, users can create and enter settings into a pyproject.toml file. For flake8, they need to do the same for a setup.cfg file. We will guide users to recreating the current default settings shipped with Kedro in our documentation (step 2 of the plan). These linting tools each have a CLI, though they can also be integrated with most common IDEs through easily-installed plugins. Again, this will be added to our documentation.
In the case of pytest, users can switch to using the pytest CLI very easily, as kedro test
is a very lightweight wrapper. This is as simple as running pytest
instead of kedro test
. There are some pytest-specific configuration settings in Kedro's pyproject.toml file. We can guide users in what these options do and how to add them in the documentation that we will write.
The advantage of this alternative is that it captures the major advantages of deprecation outlined in the arguments above. I estimate that implementing these changes would take an experienced Kedro developer one week.
In summary, users can migrate by replicating the configuration for each tool and using their interfaces independently, which we will guide them in doing in newly written documentation.
Benefit:
Cost:
If we decide to implement step 3 of the plan, users will be able to install the plugin, which bundles both kedro test
and kedro lint
, potentially among other deprecated commands.
In this case, the user will not need to undergo any migration beyond installing the plugin; `kedro lint' and 'kedro test' will be available as before.
Benefit:
Cost:
My view is also that the commands should be deprecated. Both commands are wrappers for tools with many alternatives. I believe that if a user knows the value of testing and linting code, they either:
- Know how to set up and use the wrapped tools autonomously.
- Have a preference for alternative tools that are not wrapped by these commands.
In these cases, the commands offer limited to no utility.
This is a fair view. However, why should we even "add demos for implementing testing and linting in a Kedro project to the documentation" in that case? I'd rather a user who "knows the value of testing and linting code" choose their own tools and read the respective docs, so there's nothing for us to maintain here. Otherwise, Kedro maintainers will spend time resolving issues setting up pytest
and flake8
when we in fact have no strong view there. IIRC @AntonyMilneQB mentioned something in this vein (I could be wrong). I think it could be handled with a docs page stating the linter config Kedro uses (i.e. project template is linted with isort, black, flake8).
On that note, I think the plan is missing the bit on how to make sure the project template is still linted (somehow).
This write-up and evaluation is stellar 🎉 An additional benefit is a simpler starter for new users.
If I quickly scan the documentation and starters it would mean that we would be making the following changes (if I haven't forgotten anything):
setup.cfg
pyproject.toml
src/requirements.txt
setup.cfg
is mentioned like in the Project directory structure, the section on Astronomer, and the one on StartersI like the documentation addition because in future, the section on testing could be expanded to cover #1271. I think @deepyaman's comment is the only thing left to cover, which is how do we make sure that we deliver linted and tested starters to our users without those configuration files being present in the starters.
While I was walking, I realised there are more changes 😂 What should happen to src/tests
and what should happen to the tests folder created by kedro pipeline create
? And we'd also need to make changes to the Kedro website.
Both commands are wrappers for tools with many alternatives. I believe that if a user knows the value of testing and linting code, they either... In these cases, the commands offer limited to no utility.
Slightly playing devil's advocate, but something I think we're missing here is what if a user doesn't know the value of testing and linting? Having these commands shipped with kedro does increase discoverability of these notions. If I'm completely new to writing data science pipelines I might not even know about testing and linting, let alone be equipped to choose which tools I should choose or how to set them up. Are we setting users on the right path if we provide them with a project template that doesn't even have a tests
folder to suggest that they should write tests?
IMO the value of these tools being built into kedro is most significant to inexperienced users, and so any documentation that effectively replaces them should also cater to that user group.
Just to clarify on alternative 2 (Users Install our 'Dev Plugin'), which I personally still like: this can be done with essentially zero development effort and no ongoing maintenance if we outsource it to the community and make it a community-maintained plugin. And then our docs can just say "linting code is good. If you want a standard set of tools that help with this then we recommend this plugin". See https://github.com/kedro-org/kedro/issues/1622. If there's no demand for the plugin then it won't be created, which is fine. I don't think we need to do anything about it ourselves right away.
That would still leave open the question of what standards we should be linting our own starter templates to and how to do it though.
Can't we just keep the linting and testing setup in the starters as is now and use that as an example for users to show how they can configure their project to have linting and testing? I realise that with this approach we don't clean up as much, but I feel this would solve the problem of how we keep the starters in good shape and have concrete examples for users how to set up linting and testing. The only change is that we don't lint and test by doing kedro lint
and kedro test
but by running the underlying commands.
In technical design (14/09/22) it was concluded that both commands should be deprecated. The following action items were agreed:
1 (#1619 and #1620). Short guides for linting and testing should be written and added to the docs. Depending on the traffic metrics for this part of the docs, a next step of creating either a lint / test starter* (see below) or a plugin will be considered. 2 (#1849). Explore whether the /tests folder and / or the linting configuration (requirements, config files) should be kept or dropped in the starters and / or template.
* @yetudada proposed moving the elements needed to add both commands to a starter project. This may provide significant advantages in ease of maintenance relative to the plugin option.
Investigate what the alternatives would be for users currently using
kedro test
andkedro lint
. Write up a plan to deprecatingkedro lint
andkedro test
(including how the users can "migrate" to these alternatives) or alternatively summarise why those commands shouldn't be deprecated.Also keep in mind that the alternatives should still make it possible to have a linted/testable Kedro template.