Open noklam opened 11 months ago
Muhammed Afnas 12:03 PM: Hi everyone, can we initiate multiple sessions in Kedro? If yes, could anyone help me with it? Kedro version: 0.18. I am building a web application in which I have to trigger different pipelines of a Kedro project based on button clicks in the Dash UI. As of now, each one works individually, but when one session is running and I try to trigger another session, it gives a runtime error.
Experiment tracking is not a core feature of Kedro (it belongs to kedro-viz). Is there any other obvious reason why we need to prevent a session_id from being run twice?
I recall there's some issue about session_id that @datajoely identified in his research. Maybe it's related?
That's more related to orchestration: it requires a way to pass a unique identifier when the run is spread across multiple KedroSessions.
session_id is used for versioning too, which is why it needs to be alphabetically sortable.
Arguably if we kept a private session_id and exposed a parameterisable one that would be sufficient
Uh, we're sorting by session_id? Maybe we should store the datetime instead, but this might be a bit of a digression.
The session_id was the Versioning ID way back when. @merelcht @idanov can provide more context here.
Moving this to the Session milestone.
Quotes
Description
Much of my development work happens in IPython or Jupyter, and I often want to make small changes to test whether something works. %reload_kedro can be quite slow, and the development experience is frustrating because it has to be run again for every change. This is also potentially related to #1853, #2134, #2182.
kedro ipython takes > 20s to start and %reload_kedro takes
Context
After this PR, a session can only be run once. The easiest way to create a new session is %reload_kedro. While %reload_kedro works, it is considerably slow with a big project because it recreates session, context, pipelines and catalog.
What's the minimal effort to recreate session?
If we look into the code, there is a self._run_called attribute, and every time we call session.run it checks whether it is True.
https://github.com/kedro-org/kedro/blob/6913acdfd55898f956b6d91fc4602fbdb011a5d1/kedro/framework/session/session.py#L434-L438
https://github.com/kedro-org/kedro/blob/6913acdfd55898f956b6d91fc4602fbdb011a5d1/kedro/framework/session/session.py#L366-L371
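To make the check concrete, here is a minimal sketch of a run-once guard. RunOnceSession is a toy stand-in written for this illustration, not Kedro's actual class; only the attribute name _run_called comes from the text above, and the exception type and message are assumptions.

```python
class RunOnceSession:
    """Toy stand-in for a session object; not Kedro's actual implementation."""

    def __init__(self) -> None:
        # Same attribute name as in the issue text; starts cleared.
        self._run_called = False

    def run(self) -> str:
        # Refuse to run a second time on the same object.
        if self._run_called:
            raise RuntimeError("run() was already called on this session")
        self._run_called = True
        return "ran"


session = RunOnceSession()
first = session.run()

try:
    session.run()
    second_error = None
except RuntimeError as exc:
    # The second call on the same object is rejected by the guard.
    second_error = str(exc)
```

With a guard like this, the only supported way to run again is to build a new object, which is what makes the cost of recreating everything matter.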
Why do we need this check? Mainly because session_id needs to be a unique value; otherwise it can cause errors in experiment tracking (kedro-viz), which relies on it as a unique ID. If we simply override session._run_called = False and call session.run(), almost everything will work.
Experiment tracking is not a core feature of Kedro (it belongs to kedro-viz). Is there any other obvious reason why we need to prevent a session_id from being run twice?
It could be related to the timestamp used for saving versioned data. However, it's unclear to me, because catalog gets save_version from session_id, but there is another function that you can find in most dataset implementations: save_version = self.resolve_save_version()
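On the versioning concern, the sketch below shows why a timestamp-style ID can double as a sortable version label. The format string and the make_id helper are assumptions chosen for illustration, not taken from the Kedro source: the point is only that zero-padded fields ordered from most to least significant make plain string sorting agree with chronological order.

```python
from datetime import datetime, timedelta, timezone

# Assumed format for illustration: zero-padded fields, most significant first.
ID_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_id(moment: datetime) -> str:
    """Render a datetime as a string ID (hypothetical helper)."""
    return moment.strftime(ID_FORMAT)


start = datetime(2023, 1, 9, 23, 59, 59, tzinfo=timezone.utc)
# Deliberately out of chronological order, and crossing a day boundary.
moments = [start + timedelta(seconds=offset) for offset in (120, 0, 45)]
ids = [make_id(m) for m in moments]

# Sorting the strings gives the same order as sorting the datetimes.
sorted_ids = sorted(ids)
chronological_ids = [make_id(m) for m in sorted(moments)]
```

Reusing one such ID for two runs would give both runs the same version label, which is one plausible reading of why the ID is protected.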
Possible Implementation
Source: https://github.com/kedro-org/kedro/issues/1551#issue-1239180609
Maybe implement a session.clear() or session.reset() method.
Possible Alternatives
- Speed up reload_kedro so the overhead is insignificant.
- Remove the session._run_called checks.
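As a rough sketch of the session.reset() idea under Possible Implementation: ResettableSession and its reset method are hypothetical and exist only in this example, and the counter-based ID is a placeholder for whatever ID scheme is actually used. The design point is that reset issues a fresh session_id before clearing the guard, so every run keeps a unique identifier without rebuilding anything else.

```python
import itertools


class ResettableSession:
    """Toy stand-in; reset() is a hypothetical method, not an existing Kedro API."""

    _counter = itertools.count(1)

    def __init__(self) -> None:
        self.session_id = self._new_id()
        self._run_called = False

    @classmethod
    def _new_id(cls) -> str:
        # Placeholder ID source; a real one would likely be timestamp-based.
        return f"session-{next(cls._counter):04d}"

    def run(self) -> str:
        if self._run_called:
            raise RuntimeError("run() was already called; call reset() first")
        self._run_called = True
        return self.session_id

    def reset(self) -> None:
        # Issue a fresh ID so each run stays uniquely identifiable,
        # then clear the guard without rebuilding anything else.
        self.session_id = self._new_id()
        self._run_called = False


session = ResettableSession()
first_id = session.run()
session.reset()
second_id = session.run()
```

Whether a real implementation could safely keep the context, pipelines and catalog across a reset is exactly the open question in this issue.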