Closed yetudada closed 3 months ago
I have moved this to the Kedro wiki, because this is not an issue we would take on as Sprint work as is: https://github.com/kedro-org/kedro/wiki/Insights-and-opportunities-related-to-helping-Kedro-impact-more-users
Introduction
What is this?
We have conducted extensive research to understand people's motivations for using or not using Kedro. We want to improve Kedro to provide more value to data scientists, data engineers, machine learning engineers, and other users. We've compiled the research insights and potential improvement ideas in this GitHub issue so that we can prioritise concepts that make Kedro an attractive option for users across roles and skill levels.
In part, this issue addresses: https://github.com/kedro-org/kedro-viz/issues/1448
What's in the scope of this work?
Our next step is to conduct value testing to identify the most impactful concepts. This user-centred process should guide us toward three high-potential solutions that can meaningfully solve pain points based on evidence directly from Kedro users.
What terminology will I be using?
To focus our research, we will look at two representative user profiles that encompass vital segments of the data science community:
It's helpful to define two existing ways of using Kedro, to ensure we have a shared understanding when discussing Kedro's architecture:
What are some of our learnings?
IDE-focussed users want to adopt Kedro in an existing use case
IDE-focussed users will try to learn how to use Kedro by refactoring an existing use case into a project that uses Kedro. Their objective is to learn how to leverage Kedro as a framework or to adopt Kedro in stages by incorporating library components into their work.
IDE-focussed users want to incorporate Kedro in an existing project template
IDE-focussed users leverage internal project templates provided by CookieCutter or tools that provide project templates like Poetry, Hydra and DVC. This user group might bypass Kedro because of the high switching cost when adopting Kedro's project template or the challenges with integrating Kedro with those tools. We recommend that our users start from a Kedro project template or starter, but this may not be possible for this group.
IDE-focussed users want to choose the features included in the project template generated by Kedro
IDE-focussed users have a lot of opinions about how they want their project template to be structured. There was a lot of variance on #208. Even when an IDE-focussed user has committed to the project template created by Kedro, they still want the flexibility to choose which features are enabled and visible in their personalised template.
IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it
Kedro is positioned as an all-or-nothing overhaul. Our users will choose not to use Kedro when they are placed on a collaborative project and are the only ones who want to use it. Most of these perspectives are associated with adopting the framework.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Our project template has a lot of software engineering concepts embedded in it, some more necessary than others. It is reasonable to expect that a notebook-focussed user, unfamiliar with this paradigm from software engineering frameworks, would need help understanding what each directory and file does - either by using our documentation or speaking to an expert user of Kedro. This user group also needed help understanding the role of configuration, and some preferred writing their code in a single file, a notebook.
Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework
Users assume an all-or-nothing use of the Kedro framework and do not realise they can use the Data Catalog as a stand-alone item. Our documentation for Kedro as a data registry is a very unpopular page, and we also do not talk about this functionality at all with our users.
IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework
Kedro's modular architecture provides opportunities to delight users by incrementally integrating specific components like the Data Catalog. For example, IDE-focussed users have used the Catalog to empower analysts on their teams. Additionally, users who found the framework restrictive or just wanted to use Kedro for data exploration have benefited from the Data Catalog.
IDE-focussed users work around our ConfigLoader's assumptions
IDE-focussed users run into errors because our ConfigLoader requires a conf directory, makes users place their configuration in conf/base and needs conf/local to be present. We expected our users to make ConfigLoaders without these assumptions, but we have yet to see evidence that they have done this. Our users choose to use other tools instead of our ConfigLoader or have workarounds for the errors that we create. We've assumed that users would always start from a Kedro project, and that's not always true.
How are we trying to address these insights?
We have compiled a table of adoption opportunities, consolidated past and future concepts and solutions, and new ideas to build on learnings. This table catalogues challenges identified through user research and outlines the rationale behind solutions we have prototyped or proposed to address each obstacle.
kedro init (#2512)
kedro init assumes that IDE-focussed users want to adopt the framework. It only adds files to an existing project so that kedro recognises it as a project, these files are detailed in our architecture overview. Users will still need to make significant changes to their code, e.g. create pure Python functions, create a Python package for src, take out hard-coded configuration values, remove I/O, figure out how to integrate tools e.g. MLflow or DVC, and more.
[...] conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz.
kedro init (#2512)
kedro init allows users to add the minimum files required for us to recognise that it's a Kedro project. This design will not address integration between the tools, e.g. look at Databricks' MLOps stack and try to add files for Kedro to this. Nor will it provide flexibility for customising the project template created by Kedro (#2553).
[...] conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz.
[...] nodes.py in the project template - and that you're not intimidated by the framework.
[...] kedro jupyter convert) into a Kedro project was part of this concept; it's essentially a "framework in the notebook". Users did not use kedro jupyter convert. This plugin idea was one of the worst-rated ideas in our Kedro IDE exploratory concept tests because users wondered how to revert the code from the framework into the notebook and whether it would always work.
[...] conf directory and a notebook. It also thought about a journey into the full Kedro project template. We don't have evidence to suggest this feature is well adopted; users solely leverage the Data Catalog (and even use alternative libraries for loading configuration), @Galileo-Galilel created a custom starter for his teams, and it does expect that users should have a partial understanding of the project template (conf).
[...] conf
base and local confused some users, but we never completed the rename because the results were inconclusive. @idanov considered allowing users to choose their own names (#770).
[...] AbstractDataset to kedro-datasets (#2409)
[...] AbstractDataset to exist in kedro-datasets. They don't want to import kedro or install dependencies related to kedro when leveraging this functionality (#1758).
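The kedro init notes above list code changes that users still have to make themselves: create pure Python functions, take out hard-coded configuration values and remove I/O. As a rough illustration of what that refactoring can look like, here is a minimal sketch in plain Python. It does not use any Kedro API, and every name in it (report_before, filter_high_scores, load_rows, the data/scores.csv path and the threshold parameter) is hypothetical and chosen only for this example.

```python
import csv
import io


# Before: file access and a hard-coded threshold are tangled into the logic.
# (Shown for comparison only; it is never called in this sketch.)
def report_before():
    with open("data/scores.csv") as f:  # hypothetical hard-coded path
        rows = list(csv.DictReader(f))
    return [r["name"] for r in rows if float(r["score"]) > 0.5]  # hard-coded value


# After: a pure function. Its output depends only on its arguments,
# so it can be tested without touching the file system.
def filter_high_scores(rows, threshold):
    return [row["name"] for row in rows if float(row["score"]) > threshold]


# I/O lives in its own small function and accepts any text stream.
def load_rows(text_stream):
    return list(csv.DictReader(text_stream))


# Configuration is passed in as data instead of being written into the code.
params = {"threshold": 0.5}

# An in-memory stream stands in for a real file in this example.
sample = io.StringIO("name,score\nada,0.9\nbob,0.3\n")
rows = load_rows(sample)
high = filter_high_scores(rows, params["threshold"])
print(high)
```

Separating the three concerns this way is what would let a framework, or any other caller, supply the data and parameters from outside the function.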