kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.91k stars, 900 forks

Investigate how to improve documentation readability with a language "linter" #2518

Closed (stichbury closed 1 year ago)

stichbury commented 1 year ago

Mentioned in retro: Vale is a tool that we can configure to pick up issues in our markdown docs, such as capitalisation, bullet inconsistency and hard-to-read language. I'm skeptical that it will be useful to apply it across all our docs, as the work required to trawl through all the issues and fix them may be high relative to the benefit it gives us.

But I do think it may be useful to run it over docs as they change, to pick up issues in the files we are touching and tidy them up as we go. I'm going to spend a couple of hours looking at it with the thought that, even if this doesn't work for Kedro, it could be helpful for other teams that don't have such dedication to docs (and/or a technical writer).

astrojuanlu commented 1 year ago

Yes! More info: https://passo.uno/posts/first-steps-with-the-vale-prose-linter/

And I agree, let's not run it across all of our docs. But we could set up a pre-commit hook so that it only runs on files that have changed: https://github.com/errata-ai/vale/blob/v2/.pre-commit-hooks.yaml
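To make the "only changed files" idea concrete, here is a minimal Python sketch of the same behaviour done by hand. It is not the hook linked above, just an illustration. It assumes that `vale` is installed and on the PATH with a configuration file in the repository, that the base branch is `origin/main`, and that only `.md` files count as docs. The function names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath

# Assumption: the docs are markdown only; extend this set if other formats matter.
DOC_SUFFIXES = {".md"}


def select_doc_files(paths):
    """Keep only the paths whose suffix marks them as documentation files."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in DOC_SUFFIXES]


def changed_files(base="origin/main"):
    """List files added, copied, modified or renamed relative to `base`.

    Assumption: `origin/main` is the branch to compare against.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def lint_changed_docs(base="origin/main"):
    """Run Vale on the changed doc files only and return its exit code.

    Returns 0 without calling Vale when no doc files have changed.
    """
    targets = select_doc_files(changed_files(base))
    if not targets:
        return 0
    # Vale accepts one or more file paths as positional arguments.
    return subprocess.run(["vale", *targets]).returncode
```

Calling `lint_changed_docs()` from the repository root would lint only the touched markdown files, so existing issues in untouched docs never show up. A pre-commit hook gives the same effect with less glue, because pre-commit passes only the staged files to the tool.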

astrojuanlu commented 1 year ago

https://github.com/Datadog/datadog-vale

ankatiyar commented 1 year ago

We can set this up as a GitHub Actions workflow as well: https://github.com/marketplace/actions/vale-linter. I think there is a way to set it up with reviewdog so that it runs the workflow and leaves suggestions as PR review comments, which I think is quite neat. You could then accept those suggestions or ignore them.

stichbury commented 1 year ago

This looks great. How much time do you think it would take? Shall we make a ticket for the next sprint, or is it a short piece of work that can move forward as part of this ticket? What is the next step: for me to decide which rules to apply?

ankatiyar commented 1 year ago

@stichbury I think it's not too much effort to set up the GitHub Action; I have a PR for this already: https://github.com/kedro-org/kedro/pull/2953. Once this PR is merged (or even as part of the same PR), you could add the necessary rules.