kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

Design full Project Creation CLI Flow #2506

Closed: amandakys closed this issue 1 year ago

amandakys commented 1 year ago

Parent Issue: #2388

Description

Design the full revamped Project Creation CLI flow which includes:

Context

yetudada commented 1 year ago

Another requirement from @NeroOkwa: "Making sure Kedro-Viz is discoverable at CLI creation".

datajoely commented 1 year ago

Some nice libraries for interactive command line interfaces:

yetudada commented 1 year ago

Design considerations

We will work on the assumption that the user will be able to create a project template in the following ways:

These assumptions mean that we should:

What's out of scope for V1?

What does the project creation wizard look like?

The user should:

Note: I've renamed add-ons to tools, but this can be changed back.

(my-virtual-environment) ➜  kedro new 

Project name
===========
Please enter a human-readable name for your new project.
Spaces, hyphens, and underscores are allowed.
To skip this step in future use the `--name` CLI flag

 [New Kedro project]: Customer segmentation 

Project tools
=============
Here, you can select which tools you'd like to include.
To read more about these tools and what they do, visit: docs.kedro.org
To skip this step in future use the `--tools` CLI flag

Tools
1)  Linting  :  Adds linting with Ruff and Black
2)  Testing  :  Adds testing support with Pytest
3)  Logging  :  Adds more logging options
4)  Documentation  :  Adds documentation support with Sphinx
5)  Data structure  :  Creates a directory structure for storing data
6)  Kedro-Viz  :  Adds setup for Kedro's native visualisation tool
7)  PySpark  :  Adds support for PySpark

Which tools would you like to include in your project? [None/1-4/all/1,3]: 1,2,7

Example code
============
Select whether you would like an example spaceflights pipeline included in your project.
To read more about this example, visit: docs.kedro.org
To skip this step in future use the `--example` CLI flag

Would you like to include an example [y/N]: y
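The tools prompt above accepts answers such as None, a range (1-4), all, or a comma-separated list (1,3). A minimal Python sketch of how such an answer could be parsed follows; the function name parse_selection and its error handling are hypothetical and not Kedro's implementation, and the numbered list simply mirrors the prototype.

```python
# Hypothetical numbering, mirroring the first prototype's tool list.
TOOLS = {
    1: "Linting",
    2: "Testing",
    3: "Logging",
    4: "Documentation",
    5: "Data structure",
    6: "Kedro-Viz",
    7: "PySpark",
}


def parse_selection(answer: str, options: dict = TOOLS) -> list:
    """Turn an answer such as 'none', 'all', '1-4' or '1,3' into tool names."""
    answer = answer.strip().lower()
    if answer in ("", "none"):
        return []
    if answer == "all":
        return [options[n] for n in sorted(options)]

    selected = set()
    for part in answer.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as "1-4" selects every number in between, inclusive.
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            selected.update(range(start, end + 1))
        else:
            selected.add(int(part))  # int() raises ValueError on non-numbers

    unknown = selected - set(options)
    if unknown:
        raise ValueError(f"unknown option number(s): {sorted(unknown)}")
    # Return names in menu order, regardless of the order they were typed in.
    return [options[n] for n in sorted(selected)]


print(parse_selection("1,2,7"))  # ['Linting', 'Testing', 'PySpark']
```

Rejecting unknown numbers with a ValueError lets the wizard re-ask the question instead of silently ignoring a typo.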

What do the project creation CLI flags look like?

We need to create new flags for the project name (--name), the tools (--tools), and whether to include example code (--example_code).

So it could be something like:

kedro new --name="Customer segmentation" --tools=lint,test,pyspark --example_code=y

Note: We need shorthand names for --tools, e.g. linting becomes lint. We also have yet to decide the default behaviour when a user omits one or more flags. We should assume that people who use these flags want a non-interactive flow, so they should not be sent through the CLI wizard to supply the missing answers.
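As a sketch of the shorthand idea, a --tools value could be validated against a single lookup table. Only lint, test and pyspark appear in the example command; the other shorthands (log, docs, data, viz) and the function name parse_tools_flag are hypothetical assumptions for illustration.

```python
# Hypothetical shorthand names for --tools. Only "lint", "test" and "pyspark"
# appear in the example command; the others are illustrative assumptions.
SHORTHANDS = {
    "lint": "Linting",
    "test": "Testing",
    "log": "Logging",
    "docs": "Documentation",
    "data": "Data structure",
    "viz": "Kedro-Viz",
    "pyspark": "PySpark",
}


def parse_tools_flag(value: str) -> list:
    """Parse a --tools value such as 'lint,test,pyspark' into tool names."""
    value = value.strip().lower()
    if value in ("", "none"):
        return []
    if value == "all":
        return list(SHORTHANDS.values())

    names = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [n for n in names if n not in SHORTHANDS]
    if unknown:
        valid = ", ".join(SHORTHANDS)
        raise ValueError(f"unknown tool(s) {unknown}; choose from: {valid}")
    # dict.fromkeys keeps the user's order while dropping duplicates.
    return [SHORTHANDS[n] for n in dict.fromkeys(names)]


print(parse_tools_flag("lint,test,pyspark"))  # ['Linting', 'Testing', 'PySpark']
```

Keeping one table as the source of truth would let the interactive menu and the flag share the same list of tools, so the two entry points cannot drift apart.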

Questions for our users

What happens if you don't provide a CLI flag, for:

For example, if you ran kedro new --name="Customer segmentation", what would you expect to happen?

Additional questions for us

How do you enter the default when there are multiple options, e.g. in the add-ons workflow?
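One possible answer, sketched in Python: treat an empty answer (just pressing Enter) as accepting the default shown in brackets, which is how the project-name step already behaves with [New Kedro project], and re-prompt on invalid input. The ask helper, its parameters and the "none" default are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def ask(
    question: str,
    default: str,
    parse: Callable[[str], T],
    input_fn: Callable[[str], str] = input,
    max_attempts: int = 3,
) -> T:
    """Prompt until parse() accepts the answer; pressing Enter picks the default."""
    for _ in range(max_attempts):
        raw = input_fn(f"{question} [{default}]: ").strip()
        try:
            # An empty answer falls back to the default before parsing.
            return parse(raw or default)
        except ValueError as err:
            print(f"Invalid answer: {err}. Please try again.")
    raise ValueError(f"no valid answer after {max_attempts} attempts")


# Simulated user who just presses Enter; "none" means no tools selected.
answers = iter([""])
tools = ask(
    "Which tools would you like to include?",
    default="none",
    parse=lambda s: [] if s.lower() == "none" else s.split(","),
    input_fn=lambda prompt: next(answers),
)
print(tools)  # []
```

Injecting input_fn keeps the prompt testable without a terminal; with multiple options, the bracket would then show a single default value rather than the full answer syntax.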

amandakys commented 1 year ago

Following the internal user feedback session on 13/09/2023:

We presented the following prototype:

(my-virtual-environment) ➜  kedro new 

Project Name
============
Please enter a human readable name for your new project.
Spaces, hyphens, and underscores are allowed. 
To skip this step in future use --name

[New Kedro Project]: My ML Pipeline 

Project Add-Ons 
================
Select which add-ons you'd like to include. 
To skip this step in future use --add-ons
To read more about these add-ons and what they do visit: kedro.org/

Add-Ons 
1) Lint:        Provides a basic linting setup with Black and Ruff
2) Test:        Provides a basic testing setup with pytest
3) Log:         Provides more logging options, including environment-specific logging
4) Docs:        Provides a basic documentation setup with Sphinx
6) PySpark:     Provides setup configuration for working with PySpark
8) Kedro-Viz:   Provides Kedro's native visualisation tool 

Which add-ons would you like to include in your project? [None/1-4/1,3/all]: 1,2

Example Pipeline 
================
Select whether you would like an example spaceflights pipeline included in your project.
To skip this step in the future use --example=y/n
To read more about how examples work visit: kedro.org/

Would you like to include an example pipeline? [Y/n]: Y

Congratulations! 
Your project My ML Pipeline has been created in directory /my_ml_pipeline 
You selected the following add-ons: Lint, Test 
It has been created with an example pipeline.
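The closing message above derives the directory name my_ml_pipeline from the human-readable name My ML Pipeline. A minimal sketch of that derivation follows, together with a name check based on the "spaces, hyphens, and underscores are allowed" rule. The exact validation pattern (at least two characters) and all helper names are assumptions, not Kedro's actual rules.

```python
import re

# Assumption: at least two characters, each a letter, digit, underscore,
# space or hyphen ("Spaces, hyphens, and underscores are allowed").
_NAME_PATTERN = re.compile(r"[\w -]{2,}")


def validate_project_name(name: str) -> str:
    """Return the stripped name, or raise ValueError if it breaks the rule."""
    name = name.strip()
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(
            f"{name!r} is not valid: use at least two letters, digits, "
            "spaces, hyphens or underscores"
        )
    return name


def directory_name(name: str) -> str:
    """Lower-case the name and collapse runs of spaces/hyphens into '_'."""
    return re.sub(r"[\s-]+", "_", name.strip().lower())


def creation_summary(name: str, add_ons: list, example: bool) -> str:
    """Build the closing message shown at the end of the prototype."""
    selected = ", ".join(add_ons) if add_ons else "None"
    with_example = "with" if example else "without"
    return "\n".join([
        "Congratulations!",
        f"Your project {name} has been created in directory /{directory_name(name)}",
        f"You selected the following add-ons: {selected}",
        f"It has been created {with_example} an example pipeline.",
    ])


name = validate_project_name("My ML Pipeline")
print(creation_summary(name, ["Lint", "Test"], example=True))
```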

We received feedback on the following points: (❓flagged for further discussion)

  1. the name "add-ons" makes these features sound more optional than they are, which can be misleading (possible names: tools, options, utilities, plugins) ❓
  2. should we allow users to define preset groups of add-ons?
  3. adjusting the copy of the CLI wizard to better highlight that we are recommending specific tools for certain functionality, i.e. Black/Pytest instead of just Lint/Test ❓
  4. an interactive playground/sandbox UI in the docs for generating kedro new commands, allowing people to experiment with options and preview what the project would look like.
  5. users suggested that the default state should include examples but no add-ons, as the user most likely to run a bare kedro new command is a new user. ❓
  6. if the user does not select the testing add-on, the example project should not contain tests (especially because we wouldn't have installed the testing dependencies).
  7. providing a custom flag to bypass the interactive flow and apply defaults to all flags that aren't provided. This will help with programmatic project creation, as well as expert users who want to skip the flow. ❓
  8. the --starter flow should stay independent of the add-ons flow, so that users can still start a project from their own custom starter.
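Feedback points 5, 6 and 7 can be read together as one policy for resolving missing flags. A minimal sketch under that reading follows. ProjectOptions, resolve_options, example_files, the use_defaults switch (standing in for the proposed bypass flag, which has no name yet), the lowercase tool identifiers such as "test", and the tests/ path prefix are all hypothetical; no real Kedro flag or API is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Defaults suggested in feedback point 5: include the example, add no tools.
DEFAULT_NAME = "New Kedro Project"
DEFAULT_EXAMPLE = True


@dataclass
class ProjectOptions:  # hypothetical container for the three answers
    name: str
    tools: List[str] = field(default_factory=list)
    example: bool = DEFAULT_EXAMPLE


def resolve_options(
    name: Optional[str],
    tools: Optional[List[str]],
    example: Optional[bool],
    use_defaults: bool,
    prompt: Callable[[str], object],
) -> ProjectOptions:
    """Fill in every option that was not passed as a flag (None = not passed).

    use_defaults stands in for the bypass flag from point 7: when it is set,
    missing options take their defaults; otherwise only they are prompted for.
    """

    def pick(value, step, default):
        if value is not None:
            return value  # explicit flags always win
        return default if use_defaults else prompt(step)

    return ProjectOptions(
        name=pick(name, "name", DEFAULT_NAME),
        tools=pick(tools, "tools", []),
        example=pick(example, "example", DEFAULT_EXAMPLE),
    )


def example_files(options: ProjectOptions, template_files: List[str]) -> List[str]:
    """Point 6: leave example tests out unless the testing tool was chosen."""
    if not options.example:
        return []
    if "test" in options.tools:
        return list(template_files)
    return [f for f in template_files if not f.startswith("tests/")]


opts = resolve_options("Customer segmentation", None, None, use_defaults=True,
                       prompt=lambda step: None)
print(opts)  # ProjectOptions(name='Customer segmentation', tools=[], example=True)
```

Prompting only for the missing steps keeps the wizard and the flags compatible, while the bypass switch gives scripts a fully non-interactive path.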
amandakys commented 1 year ago

Closed, as the initial design is now 'complete'; follow-up work is tracked in #3054.