kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0

[Spike] Investigate the migration away from redux state logic for the Kedro-Viz flowchart #732

Closed studioswong closed 11 months ago

studioswong commented 2 years ago

Introduction

The successful transition to data fetching with GraphQL and state management with React hooks in the experiment-tracking features produced clean, highly readable and reusable code (compared to the Redux setup, which is heavily entangled with the Redux store).

Third-party consumption of the Kedro-Viz flowchart is one of the most discussed use cases for Kedro-Viz, which makes a strong case for refactoring the flowchart's existing Redux setup towards local state management, with GraphQL for data fetching. However, the graph calculation logic is currently tightly entangled with the selectors in the Redux setup, which poses a series of challenges for the refactoring work and requires careful investigation.

This issue investigates the different stages of work involved in migrating away from the Redux setup.

Background

Redux is used heavily for state management and data ingestion in the flowchart (see this section in the architecture docs). All local app state, as well as the logic for calculating the flowchart nodes, the modular pipeline tree and the input to the layout engine, is managed by and tightly coupled to the selectors in the Redux setup.

Why isn't the redux setup suitable anymore?

The redux setup poses the following problems:

  1. This tightly coupled setup of in-app logic within selectors and the redux state makes it impossible to extract the flowchart, node-list sidebar and all related components for reusability as individual components outside the context of Kedro-Viz, as a local store will always need to be included with the component.

  2. The Redux setup introduces an excessive amount of code and complexity into the codebase: the code to initialise the local store, the series of actions, reducers and selectors, and the additional code within the components to consume the local store all add to the maintenance effort down the line. (This is very apparent when compared with the setup code for GraphQL in the new experiment-tracking features, which only requires a simple Apollo Client setup.)

  3. It is never good to have two drastically different data ingestion and state management protocols within the same codebase. For the sake of simplicity and maintainability, we will have to migrate away from the Redux setup one way or another.

Design

Before designing a solution, here are the set of challenges specific to the current state of the app that we need to consider in the design:

Data Ingestion

Currently, data ingestion from the REST endpoint is handled entirely by the Redux setup: the large object containing all nodes and edges returned from the /main endpoint is broken down into separate data fields that are used in the flowchart calculation. The new solution must replace Redux's role in breaking down, sorting and updating the flowchart data in real time as the user switches between pipelines.

It will also need to replace Redux's role in reading from and writing to localStorage.
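To make the "breakdown" step concrete, here is a minimal sketch of splitting a nodes-and-edges payload into flat lookup tables without any store. The payload shape, the field names and both function names are simplified assumptions for illustration, not the actual /main schema or the existing normalisation code.

```typescript
// Hypothetical, simplified payload shape; the real /main response has more fields.
interface RawNode {
  id: string;
  name: string;
  type: string;
  pipelines: string[];
}

interface RawEdge {
  source: string;
  target: string;
}

interface RawPayload {
  nodes: RawNode[];
  edges: RawEdge[];
}

// Flat lookup tables keyed by id, similar in spirit to a normalised store.
interface NormalizedGraph {
  nodeIds: string[];
  nodeName: Record<string, string>;
  nodeType: Record<string, string>;
  nodePipelines: Record<string, string[]>;
  edgeIds: string[];
  edgeSource: Record<string, string>;
  edgeTarget: Record<string, string>;
}

function normalizePayload(raw: RawPayload): NormalizedGraph {
  const graph: NormalizedGraph = {
    nodeIds: [],
    nodeName: {},
    nodeType: {},
    nodePipelines: {},
    edgeIds: [],
    edgeSource: {},
    edgeTarget: {},
  };
  for (const node of raw.nodes) {
    graph.nodeIds.push(node.id);
    graph.nodeName[node.id] = node.name;
    graph.nodeType[node.id] = node.type;
    graph.nodePipelines[node.id] = node.pipelines;
  }
  for (const edge of raw.edges) {
    // Derive a stable edge id from its two endpoints.
    const id = `${edge.source}|${edge.target}`;
    graph.edgeIds.push(id);
    graph.edgeSource[id] = edge.source;
    graph.edgeTarget[id] = edge.target;
  }
  return graph;
}

// Re-derive the node ids to show whenever the user switches pipeline.
function nodeIdsInPipeline(graph: NormalizedGraph, pipelineId: string): string[] {
  return graph.nodeIds.filter((id) =>
    (graph.nodePipelines[id] ?? []).some((p) => p === pipelineId)
  );
}
```

Because both functions are pure, they could run in a resolver, a hook or a web worker without changes.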

Global State management

All global app state, such as the theme and the selected nodes, is tied to the Redux store. This state will need to be stripped out and set up again within the app, either with React hooks or using the Context API.

Component Prop and state management

The data consumed by components (such as nodes, edges, themes, etc.) all comes from the Redux store, so the new setup will require refactoring all components to use the new data ingestion and local/global state setup.

One important point is that our current architecture for the flowchart page is very tightly coupled, and the components are not scoped for reusability (i.e. they rely on a lot of custom setup specific to Kedro-Viz). This refactoring work is a great opportunity to reconsider those components and make them better suited for reuse in a different context.

Logic Calculation

The flowchart itself and its control sidebar (the node-list components) have their logic deeply nested within the selector setup, which is used to trigger recalculation when global state changes following a user selection. Stripping this away from the Redux setup would mean a total rewrite of the logic as pure JS functions, as well as setting up new hooks within the components to trigger the recalculation on app state updates.
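As a rough illustration of what "pure functions plus hooks" could look like, the sketch below derives visible nodes and edges from plain inputs and memoises the result on input identity, much as a memoised selector would. All names (`FlowNode`, `getVisibleNodes`, `memoizeLast`, and so on) are hypothetical and not taken from the existing selectors.

```typescript
interface FlowNode {
  id: string;
  type: string;
}

interface FlowEdge {
  source: string;
  target: string;
}

// Pure function: the same inputs always give the same output, no store required.
function getVisibleNodes(nodes: FlowNode[], disabledTypes: string[]): FlowNode[] {
  return nodes.filter((node) => disabledTypes.indexOf(node.type) === -1);
}

// An edge is visible only if both of its endpoints are visible.
function getVisibleEdges(edges: FlowEdge[], visibleNodes: FlowNode[]): FlowEdge[] {
  const visibleIds = new Set(visibleNodes.map((node) => node.id));
  return edges.filter(
    (edge) => visibleIds.has(edge.source) && visibleIds.has(edge.target)
  );
}

// Recompute only when an argument reference changes; otherwise reuse the last result.
function memoizeLast<A extends unknown[], R>(fn: (...args: A) => R): (...args: A) => R {
  let lastArgs: A | undefined;
  let lastResult: R | undefined;
  return (...args: A): R => {
    const prev = lastArgs;
    if (
      prev !== undefined &&
      prev.length === args.length &&
      args.every((arg, i) => arg === prev[i])
    ) {
      return lastResult as R;
    }
    const result = fn(...args);
    lastArgs = args;
    lastResult = result;
    return result;
  };
}
```

Inside a component, the same effect could be achieved with React's `useMemo`, for example `useMemo(() => getVisibleNodes(nodes, disabledTypes), [nodes, disabledTypes])`, so that recalculation happens only when the relevant state changes.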

In response to the four challenges above, here are the key concepts that will be adopted in the design:

Key concept 1: Refactor existing data ingestion layer and calculation logic into GraphQL API layer

The easiest and least disruptive approach is to set up a GraphQL API layer that sits on top of the legacy REST API and replaces the data ingestion layer in the current Redux setup. The GraphQL API layer will also contain the selector logic, in the form of resolvers, to provide data in the format required by the individual components.

This arrangement provides a separation of concerns: the data logic moves out of the app into a separate layer that handles all logic calculation, allowing us to move towards a more loosely coupled FE architecture of UI components and calculation logic.
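A minimal sketch of such a layer follows, assuming the payload has already been loaded (from the REST endpoint or a JSON file) and is passed to resolvers through a context object. The schema, type names and field names are hypothetical; only the `(parent, args, context)` resolver signature follows the usual GraphQL convention. How the resolvers would be wired into Apollo Client is deliberately left out.

```typescript
// Hypothetical schema; type and field names are illustrative only.
const typeDefs = `
  type Node { id: ID!  name: String!  type: String! }
  type Edge { source: ID!  target: ID! }
  type Query {
    nodes(pipeline: ID!): [Node!]!
    edges(pipeline: ID!): [Edge!]!
  }
`;

interface GraphNode {
  id: string;
  name: string;
  type: string;
  pipelines: string[];
}

interface GraphEdge {
  source: string;
  target: string;
}

interface MainPayload {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

// The context carries the payload already loaded from the REST endpoint or a JSON file.
interface ResolverContext {
  main: MainPayload;
}

interface PipelineArgs {
  pipeline: string;
}

// Former selector logic, now a plain helper shared by the resolvers.
function nodesInPipeline(main: MainPayload, pipeline: string): GraphNode[] {
  return main.nodes.filter((node) => node.pipelines.indexOf(pipeline) !== -1);
}

const resolvers = {
  Query: {
    nodes(_parent: unknown, args: PipelineArgs, ctx: ResolverContext): GraphNode[] {
      return nodesInPipeline(ctx.main, args.pipeline);
    },
    edges(_parent: unknown, args: PipelineArgs, ctx: ResolverContext): GraphEdge[] {
      // Keep only edges whose endpoints both belong to the requested pipeline.
      const ids = new Set(nodesInPipeline(ctx.main, args.pipeline).map((n) => n.id));
      return ctx.main.edges.filter((e) => ids.has(e.source) && ids.has(e.target));
    },
  },
};
```

Keeping the payload in the context means the same resolvers work whether the data came from the REST API or from an exported JSON file.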

Key concept 2: Utilisation of the GraphQL Client and Apollo Cache for global state management via Reactive Variables

One of the key advantages of the Redux store setup is the ability to set up global state variables that trigger real-time updates via the dispatch of actions. Within the Apollo Client setup, this can be achieved by setting up reactive variables for global state management (such as state indicating the 'selected node', 'clicked node', 'hovered node', etc.).

Updating a reactive variable causes the Apollo Client and its cache to re-evaluate the queries that depend on it and update the related data, similar to the dispatch of actions within the Redux setup.
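The idea can be sketched without any library. Apollo Client provides reactive variables through `makeVar` and `useReactiveVar`; the stand-in below (`makeReactiveVar`, with a hypothetical `subscribe` method) is not Apollo's implementation and only mimics the read/write call style to show how a write can notify dependants.

```typescript
type Listener<T> = (value: T) => void;

// Call with no argument to read, or with one argument to write.
interface ReactiveVar<T> {
  (...args: [] | [T]): T;
  subscribe(listener: Listener<T>): () => void;
}

// Hypothetical stand-in for a reactive variable; not Apollo's implementation.
function makeReactiveVar<T>(initial: T): ReactiveVar<T> {
  let value = initial;
  const listeners = new Set<Listener<T>>();

  const readOrWrite = (...args: [] | [T]): T => {
    // Only notify listeners when the value actually changes.
    if (args.length === 1 && args[0] !== value) {
      value = args[0];
      listeners.forEach((listener) => listener(value));
    }
    return value;
  };

  const subscribe = (listener: Listener<T>): (() => void) => {
    listeners.add(listener);
    // Return an unsubscribe function, for use in a component clean-up step.
    return () => {
      listeners.delete(listener);
    };
  };

  return Object.assign(readOrWrite, { subscribe });
}
```

A 'hovered node' state would then be declared once, for example `const hoveredNode = makeReactiveVar<string | null>(null)`, written from an event handler with `hoveredNode(id)` and read anywhere with `hoveredNode()`.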

Key concept 3: Establish a direct mapping of data requirement of selectors into graphql queries within UI components

The current UI components are set up to ingest data from selectors. Each selector could be mapped directly onto a GraphQL query, with the selector's logic implemented in the resolvers of the GraphQL API layer.

The following diagram illustrates the new architecture with the implementation of the above three key concepts:

[Figure: app-data-flow-graphql] Diagram depicting the new data flow via the GraphQL API layer

[Figure: kedro-viz-architecture-graphql] Diagram depicting the new app architecture

For comparison, please refer to our architecture docs for the existing data ingestion and architecture setup.

Alternatives considered

The alternative is to replace the REST API directly with a GraphQL endpoint.

However, that is not desirable for the following reasons:

  1. Moving the data ingestion logic from the front end to the backend would mean reinventing the wheel, given the massive effort required for a complete rewrite of the data ingestion and selector logic from JS code into Python code.
  2. The flowchart calculation logic requires the front end to control the data input (i.e. the graph nodes) into the flowchart component; this control currently sits within the Redux selector setup.
  3. Most importantly, replacing the existing REST API setup would also mean losing the current benefit of generating a Kedro-Viz visualisation from a shareable JSON file produced directly from the Kedro project, which is a widely popular feature among our existing users (not to mention the incompatibility with Kedro-Viz as an imported React component).

As an alternative to reactive variables, we could also rely on React hooks or the Context API for app state management.

Rollout strategy

Given the complexity of the Redux setup and how heavily the app relies on it, the core idea of the implementation is to gradually strip away the UI components' reliance on obtaining data via props fed by selector methods.

Milestone 1: GraphQL API layer and Apollo Client Setup

This milestone mainly focuses on setting up the GraphQL API layer to ingest the JSON data object into a meaningful format (groups of nodes and edges) as consumed by the app. This entails setting up a basic schema and basic resolvers that return a fixed set of nodes and edges, simulating the data returned by the basic selectors.

This also requires configuring the Apollo Client cache so that it can read from and write to localStorage and connect with the web worker.
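The localStorage part could be sketched independently of Apollo as follows. The persisted fields, the storage key and the helper names are hypothetical; the `StorageLike` interface is a subset of the Web Storage API, so `window.localStorage` could be passed in directly, while an in-memory stand-in covers environments where it is unavailable.

```typescript
// Subset of the Web Storage API, so window.localStorage can be passed in directly.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for environments without localStorage (for example, tests).
class MemoryStorage implements StorageLike {
  private items = new Map<string, string>();

  getItem(key: string): string | null {
    const value = this.items.get(key);
    return value === undefined ? null : value;
  }

  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

// Hypothetical subset of UI state worth persisting between sessions.
interface PersistedState {
  theme: "light" | "dark";
  activePipeline: string;
}

const STORAGE_KEY = "viz-state-sketch"; // hypothetical key name
const DEFAULT_STATE: PersistedState = { theme: "dark", activePipeline: "__default__" };

function loadState(storage: StorageLike): PersistedState {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) {
    return { ...DEFAULT_STATE };
  }
  try {
    const parsed = JSON.parse(raw) as Partial<PersistedState>;
    // Stored values override defaults; missing fields fall back to defaults.
    return { ...DEFAULT_STATE, ...parsed };
  } catch {
    // Corrupted entry: fall back to defaults rather than crash the app.
    return { ...DEFAULT_STATE };
  }
}

function saveState(storage: StorageLike, patch: Partial<PersistedState>): PersistedState {
  const next: PersistedState = { ...loadState(storage), ...patch };
  storage.setItem(STORAGE_KEY, JSON.stringify(next));
  return next;
}
```

Injecting the storage object rather than referencing `localStorage` directly keeps these helpers testable and reusable outside the browser.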

Milestone 2: Migration of selector logic of core UI components into GraphQL API layer

This milestone will focus on migrating the existing core selector logic, which is heavily used in the node-list and flowchart components, into resolvers within the GraphQL API layer.

This will also involve setting up related queries to structure the data requirements for UI components as fulfilled previously by the selectors.

Milestone 3: Setup of Reactive Variables according to global app states; migration of usage of global states within UI components to reactive variables

As the title states, this milestone will mainly focus on migrating the global states to reactive variables, gradually stripping away the UI components' reliance on the Redux store.

Milestone 4: Removing Redux Store and all related setup

Once all selectors and global states have been stripped out of the Redux store, it is safe to remove the Redux store setup entirely and fully migrate to the new architecture.

This will leave us with a cleaner and highly readable codebase, with better separation of concerns and a loosely coupled architecture that allows adaptability and reusability in enabling faster development down the line.

The change is entirely backwards compatible, given that the mechanisms for data input via the REST API or a JSON data file remain the same and all changes sit within the front end.

antonymilne commented 2 years ago

I understand almost none of this so am really not qualified to comment, but what I can tell is that a huge amount of time, effort and thought went into it! Incredible work 😮 🎉 ⭐

Two small questions from a total layperson, just out of curiosity:

This tightly coupled setup of in-app logic within selectors and the redux state makes it impossible to extract the flowchart, node-list sidebar and all related components for reusability as individual components outside the context of Kedro-Viz, as a local store will always need to be included with the component.

Are there examples where people want to do this? I have heard about us integrating plugins into Kedro-Viz but never heard of doing it the other way round and integrating parts of Kedro-Viz into other things.

Stripping the data ingestion logic away from the front end into the backend will imply wasted efforts in reinventing the wheel given the massive effort required for a complete rewrite of the data ingestion and selector logic from JS code into python code.

What sort of thing are these selectors doing?

tynandebold commented 11 months ago

Closing this for now, knowing that it's here and we can come back to it.