kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
646 stars 106 forks source link

Conditionally move session store and stats file to .viz directory #1915

Closed ravi-kumar-pilla closed 1 week ago

ravi-kumar-pilla commented 1 month ago

Description

Resolves #1891

Development notes

VIZ_SESSION_STORE_ARGS = {"path": ".viz"}
VIZ_METADATA_ARGS = {"path": ".viz"}

QA notes

NOTE:

  1. Kedro starters will not be modified in this PR.
  2. Users need to copy their session_store.db file manually to .viz folder in-case they want to remove SESSION_STORE_ARGS from their settings.py file and would like to have the session_store.db file created under .viz folder for future kedro runs. They can ignore this if they do not wish to either use .viz folder or do not want historic kedro run details in experiment tracking.

Checklist

ravi-kumar-pilla commented 1 week ago

Hi @noklam ,

Is this a breaking change? how should this release with the change of #1951 in starters?

The changes in this PR will not break anything for the starters. These are defaults that are introduced in-case users do not provide them in settings.py file. I am only concerned with the loss of historic ET data in the below case (might be a minor thing) -

  1. Upgraded kedro-viz==9.2.0
  2. Removed SESSION_STORE_ARGS from settings.py
  3. Do a kedro run

This will create a session_store.db file in <root>/.viz which does not contain the previous runs.

We need to consider the scenarios below:

  1. Old Kedro starter user ( i.e. 0.19.0) + kedro-viz 9.2.0 - Pass
  2. Old Kedro starter user(i.e. 0.19.0) + kedro-viz 9.1.0 (no change, this is the current sitution)
  3. New Kedro starter user(i.e. 0.19.7) + Kedro-viz 9.2.0 (it should work nicely) - Pass
  4. New Kedro starter user(ie.. 0.19.7) + old Kedro-viz user (maybe the solution is we need to set a lower bound in the starter kedro-viz>=9.2.0? - Yes, with new Kedro starter which removes SESSION_STORE_ARGS, kedro run will fail with the below error -
ValueError: 
(sqlite3.OperationalError) unable to open database file

The reason for the above error might be because we did not create a parent folder for the default path if the folder does not exist.

Other questions I have, I see that path is currently set as .viz, what happens when viz is not run from the root directory, would it produce the artifact in the correct directory?

Yes, I appended the _find_kedro_project() return value to the path. If you see the code, the returned location is - f"{kedro_project_path}/{VIZ_SESSION_STORE_ARGS['path']}/session_store.db"

However lets say the user goes into the sub-directory and has also provided SESSION_STORE_ARGS , this PR will not create the file at the root, as it relies on the path sent from Kedro. Let me know if you need more information.

Thank you

noklam commented 1 week ago

The changes in this PR will not break anything for the starters. These are defaults that are introduced in-case users do not provide them in settings.py file. I am only concerned with the loss of historic ET data in the below case (might be a minor thing) -

Yes bu

ValueError: (sqlite3.OperationalError) unable to open database file

Is this a bug? I don't expect this to happen.

Yes, I appended the _find_kedro_project() return value to the path. If you see the code, the returned location is - f"{kedro_project_path}/{VIZ_SESSION_STORE_ARGS['path']}/session_store.db"

However lets say the user goes into the sub-directory and has also provided SESSION_STORE_ARGS , this PR will not create the file at the root, as it relies on the path sent from Kedro. Let me know if you need more information.

They should have the same behavior, what makes the user define argument different from the default? Maybe we could make this clear that it should be relative to the root directory. In caes user provide an absolute path, then we should respect it and read/create the session.db in the user-defined location.

noklam commented 1 week ago

We also need some migration documentations.

For existing viz user, this PR should not change anything. If they want to move it to .viz, we can advise them to remove the settings.py or specify it in .viz.

For new user - we will pin kedro-viz >=9.2.0 in kedro-starter to make sure the default will be .viz. What happen if settings is empty and kedro-viz <9.2.0? Is there a default location or kedro-viz will simply error out?

ravi-kumar-pilla commented 1 week ago

ValueError: (sqlite3.OperationalError) unable to open database file

Is this a bug? I don't expect this to happen.

Seems like it. We never got across this because
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")} the parent folder was always present

They should have the same behavior, what makes the user define argument different from the default? Maybe we could make this clear that it should be relative to the root directory. In caes user provide an absolute path, then we should respect it and read/create the session.db in the user-defined location.

Yes, we do respect what user is sending. Please have a look at the code in package/kedro_viz/integrations/kedro/sqlite_store.py

Maybe we could make this clear that it should be relative to the root directory.

This seems a bit contradictory to the absolute path you mentioned. You can have a quick look at the code -

    @property
    def location(self) -> str:
        """Returns location of the sqlite_store database"""
        if "path" not in settings.SESSION_STORE_ARGS:
            kedro_project_path = _find_kedro_project(Path.cwd()) or self._path
            return _get_session_path(
                f"{kedro_project_path}/{VIZ_SESSION_STORE_ARGS['path']}/session_store.db"
            )

        return _get_session_path(f"{self._path}/session_store.db")

Thank you

ravi-kumar-pilla commented 1 week ago

We also need some migration documentations.

For existing viz user, this PR should not change anything. If they want to move it to .viz, we can advise them to remove the settings.py or specify it in .viz.

  1. stats.json will be moved to .viz by default (should we do something like delete stats.json if present at root as we created it before, so like a clean up ) ?
  2. session_store.db is an opt in change which can be documented. I added some information in the docs. But let me know if you feel we need to add it in other places.

For new user - we will pin kedro-viz >=9.2.0 in kedro-starter to make sure the default will be .viz. What happen if settings is empty and kedro-viz <9.2.0? Is there a default location or kedro-viz will simply error out?

If you mean settings.py does not contain SESSION_STORE_CLASS, there won't be any error and viz do not have a session store initialized, so ET will be empty. But lets say, if the user has provided the below in settings.py -

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore
SESSION_STORE_CLASS = SQLiteStore

without mentioning SESSION_STORE_ARGS , then it will error out.

In BaseSessionStore, we have - store_args.setdefault("path", (self._project_path / "sessions").as_posix()) so the default path is ./sessions and if the user does not have the dir sessions at project path, kedro run will error out (Indeed a bug)

As I mentioned in this comment we did not had mkdir - session_file_path.parent.mkdir(parents=True, exist_ok=True). Lets say sessions/session_store.db is the path, if we did not had sessions dir, it will error out.

Lets connect if this is not clear. Thank you