sematic-ai / sematic

An open-source ML pipeline development platform

Properties which were dynamically set may get reset for rerun-from-here/restarted runner #1032

Open augray opened 1 year ago

augray commented 1 year ago

The logic for rerun-from-here and runner reentrance relies on being able to reconstruct a future graph from the runs in the DB. Part of this is setting the properties on the Future from the properties of the Run. Unfortunately, not every property stored on a Future object is also stored on the Run. Most of the time things work anyway, because the reconstruction code calls the function itself to instantiate the future. In pseudo-code:

def future_from_run(run):
    function = import_function(run.function_path)
    kwargs = get_kwargs(get_artifacts(run), function)
    future = function(**kwargs)
    # ...
    return future

Thus, if the function itself sets a property in its decorator (e.g. retry), the recreated future will have it:

# When recreating a future from a run with this function, `retry`
# will be set to MY_RETRY_SETTINGS.
@func(retry=MY_RETRY_SETTINGS)
def foo(some_arg: ArgType) -> int:
    # ...
    ...

However, if the pipeline sets the property dynamically and that property is not stored on the run, we cannot recreate it:

@func
def pipeline(some_arg: ArgType) -> int:
    # if the pipeline's runner is restarted during the execution of
    # `foo`, or we do a 'rerun from here' for `foo`,
    # we will not see the override to its retry settings.
    intermediate = foo(some_arg).set(
        retry=RETRY_SETTINGS if some_arg.is_risky() else None
    )
    # ...
    return result
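
To make the failure mode concrete, here is a minimal, self-contained sketch that reproduces it outside of Sematic. Everything in it is a toy stand-in and an assumption for illustration: `Future`, `Run`, `func`, `REGISTRY`, `run_from_future`, and the integer `retry` value are hypothetical simplifications, not Sematic's real classes or storage schema. The only point is that a run record which omits a property cannot give that property back when the future is rebuilt by calling the function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Future:
    """Toy future: holds the wrapped function, its kwargs, and one property."""
    function: Callable[..., Any]
    kwargs: Dict[str, Any]
    retry: Optional[int] = None  # hypothetical: an int stands in for retry settings

    def set(self, **props: Any) -> "Future":
        # Dynamically override properties after the future is created.
        for name, value in props.items():
            setattr(self, name, value)
        return self


@dataclass
class Run:
    """Toy run record. Note that there is deliberately no `retry` field."""
    function_name: str
    kwargs: Dict[str, Any]


# Hypothetical lookup table standing in for importing by function path.
REGISTRY: Dict[str, Callable[..., Future]] = {}


def func(_fn=None, *, retry: Optional[int] = None):
    """Toy decorator: calling the decorated function returns a Future."""
    def decorate(fn):
        def make_future(**kwargs: Any) -> Future:
            return Future(function=fn, kwargs=kwargs, retry=retry)
        REGISTRY[fn.__name__] = make_future
        return make_future
    return decorate(_fn) if _fn is not None else decorate


def run_from_future(future: Future) -> Run:
    # Persisting drops `retry`, because Run has nowhere to store it.
    return Run(function_name=future.function.__name__, kwargs=dict(future.kwargs))


def future_from_run(run: Run) -> Future:
    # Mirrors the pseudo-code above: rebuild by calling the function again.
    function = REGISTRY[run.function_name]
    return function(**run.kwargs)


@func(retry=3)
def static_retry(x: int) -> int:
    return x + 1


@func
def dynamic_retry(x: int) -> int:
    return x + 1


static_future = static_retry(x=1)
dynamic_future = dynamic_retry(x=1).set(retry=5)  # dynamic override

restored_static = future_from_run(run_from_future(static_future))
restored_dynamic = future_from_run(run_from_future(dynamic_future))

print("static  :", static_future.retry, "->", restored_static.retry)
print("dynamic :", dynamic_future.retry, "->", restored_dynamic.retry)
```

In this sketch the decorator-level setting survives the round trip, while the value applied through `.set(...)` comes back as `None`. Under these assumptions, the fix would be either to persist the property on the run or to copy it onto the rebuilt future from some other stored source.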

Several properties are impacted by this issue: