[Stress Testing] - Create example projects to assess Kedro performance for complex pipelines

Description

Issue https://github.com/kedro-org/kedro/issues/3957 must be completed first as pre-work.

Several features across the Kedro organisation could benefit from manual performance testing on large projects. The proposal is to create several Kedro projects of varying size that can be used for testing and experimentation.

The example projects don't have to be "legit", i.e. they don't need to tackle real data science problems; they can be toy examples. The main point is to create projects at scale.

This could be particularly useful for testing Kedro-Viz features. CC @rashidakanchwala, @NeroOkwa

Context

The Kedro-Viz team carried out a performance analysis using an internal QB pipeline, with preliminary results shown here: https://github.com/kedro-org/kedro-viz/issues/1064

It takes a long time to initialise the Kedro modules and reach the actual kedro viz run command (already somewhat known; see "Improve Kedro CLI startup time" kedro#1476).

The most expensive operation before the Viz server starts is loading the data from the Kedro session (possibly related to "Lazy Loading of Catalog Items" kedro#2829?).

Most of the data-loading time is spent resolving the catalog and pipelines_dict, and it grows as the pipeline count increases.

(from https://github.com/kedro-org/kedro-viz/issues/1064#issuecomment-2100489744, summary of internal report).
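Once the stress-test projects exist, a profile of the loading step would show whether catalog and pipelines_dict resolution dominate there too. A minimal sketch using only the standard library's `cProfile` and `pstats`; `profile_call` is a hypothetical helper, not part of Kedro.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile_call(fn: Callable[[], Any], top: int = 15) -> Tuple[Any, str]:
    """Run a zero-argument callable under cProfile.

    Returns the callable's result and a text report of the `top` entries
    sorted by cumulative time (time spent in a function plus its callees).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Inside a Kedro project, one option (an assumption; check the calls against the installed Kedro version, as `KedroSession.create` has changed signature across releases) is to pass a function that calls `bootstrap_project(Path.cwd())`, then `KedroSession.create()` and `session.load_context().catalog`, or one that iterates over `kedro.framework.project.pipelines` to force pipeline registry resolution. Sorting by cumulative time surfaces the top-level phases first, which is what matters when comparing catalog against pipeline loading.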

There is preliminary evidence that the Kedro Framework CLI is a bottleneck for Kedro Viz.

This comes on top of existing evidence that Kedro takes a long time to load even for trivial commands or almost empty projects (#1476).
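A crude way to quantify cold startup on an (almost) empty project is to time the CLI process end to end over several runs. The sketch below is one assumed approach using only the standard library; `time_command` is a hypothetical helper.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], repeats: int = 5) -> List[float]:
    """Run `cmd` `repeats` times and return wall-clock durations in seconds."""
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken command is never mistaken for a fast one.
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Baseline: bare interpreter startup, to compare CLI timings against.
    baseline = time_command([sys.executable, "-c", "pass"])
    print(f"python -c pass: median {statistics.median(baseline):.3f}s")
```

For example, `time_command(["kedro", "--help"])` could be run from inside the empty project and its median compared with the baseline; the median is used because single runs are noisy (disk cache, other processes). For an import-level breakdown, CPython's `-X importtime` flag (e.g. `python -X importtime -c "import kedro.framework.cli"`) prints per-module import times to stderr.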

We noted that several factors make a pipeline "complex":

Lots of nodes
Lots of pipelines
Lots of datasets
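For the "lots of datasets" factor, catalog entries can be generated rather than written by hand. A minimal sketch, assuming the `pandas.CSVDataset` type name used by recent kedro-datasets releases (older releases spell it `CSVDataSet`); `make_catalog_yaml` and the `data/01_raw/...` paths are illustrative. It emits YAML text with plain string formatting to avoid a PyYAML dependency.

```python
from typing import List


def make_catalog_yaml(n_datasets: int, dataset_type: str = "pandas.CSVDataset") -> str:
    """Return catalog.yml text declaring `n_datasets` file-backed datasets."""
    lines: List[str] = []
    for i in range(n_datasets):
        lines.append(f"dataset_{i}:")
        lines.append(f"  type: {dataset_type}")
        lines.append(f"  filepath: data/01_raw/dataset_{i}.csv")
        lines.append("")  # blank line between entries for readability
    return "\n".join(lines)
```

Writing `make_catalog_yaml(10_000)` to `conf/base/catalog.yml` would give a project whose catalog size can be scaled independently of its node and pipeline counts.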

I expanded on @AhdraMeraliQB's original proposal and suggested that we create a family of pipelines, comprising:

An empty project, so that we can measure just "cold startup" time (essentially what I did in "Kedro initialization is slow" #3033 (comment))

1 pipeline with an increasingly large number of nodes (essentially what @AhdraMeraliQB proposed in "Create QA Kedro test projects for stress testing and performance and evaluation" #3790 (comment))

N pipelines of 1 node each

1 pipeline and 1 node with an increasingly large number of datasets

...possibly more?
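The proposed family can be described as plain data before any Kedro code is involved. A sketch under the assumption that every pipeline is a simple chain of pass-through nodes; `make_family`, `chain`, the variant keys and the naming scheme are all hypothetical, and `n` is the scale knob for each variant.

```python
from typing import Dict, List, Tuple

# (node_name, inputs, outputs): a plain-data stand-in for a Kedro node.
NodeSpec = Tuple[str, List[str], List[str]]


def chain(prefix: str, n_nodes: int) -> List[NodeSpec]:
    """Build n_nodes nodes where node i consumes the output of node i - 1."""
    return [
        (f"{prefix}_node_{i}", [f"{prefix}_ds_{i}"], [f"{prefix}_ds_{i + 1}"])
        for i in range(n_nodes)
    ]


def make_family(n: int) -> Dict[str, Dict[str, List[NodeSpec]]]:
    """Return one {pipeline_name: nodes} layout per proposed variant."""
    return {
        # No pipelines at all: measures cold startup only.
        "empty": {},
        # 1 pipeline with n nodes.
        "one_pipeline_many_nodes": {"main": chain("main", n)},
        # n pipelines of 1 node each; prefixes keep dataset names unique.
        "many_pipelines_one_node": {f"p{i}": chain(f"p{i}", 1) for i in range(n)},
        # 1 pipeline, 1 node, n inputs and n outputs.
        "one_node_many_datasets": {
            "main": [
                ("wide_node", [f"in_{i}" for i in range(n)], [f"out_{i}" for i in range(n)])
            ]
        },
    }
```

To turn a layout into Kedro objects, each `(name, inputs, outputs)` triple could be passed to `kedro.pipeline.node(passthrough, inputs, outputs, name=name)` with `def passthrough(*args): return list(args)`, and each node list wrapped in `kedro.pipeline.pipeline(...)` inside the project's pipeline registry. Because inputs and outputs have equal length in every variant, the pass-through returns exactly as many values as Kedro expects. Free inputs such as `main_ds_0` would still need data behind them before a real `kedro run`.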

This issue comes from https://github.com/kedro-org/kedro/discussions/3790

Originally posted by **AhdraMeraliQB** on January 6, 2024

Issue: kedro-org/kedro#3866