Open almajo opened 1 year ago
end_time is a strictly read-only value set by the system. start_time is also set by the system with an option in create_run() to pass in a time though w/o being able to set start_time this doesn't make sense. What kwargs and what methods are you refering to?
You can still access the original start_time and end_time values if you use the import option --import-source-tags. It will save the values as tags in the destination run such as "mlflow_exim.run_info.start_time". See https://github.com/mlflow/mlflow-export-import/blob/master/README_governance.md.
def _import_run(self, dst_exp_name, input_dir, dst_notebook_dir):
exp_id = mlflow_utils.set_experiment(self.mlflow_client, self.dbx_client, dst_exp_name)
exp = self.mlflow_client.get_experiment(exp_id)
src_run_path = os.path.join(input_dir,"run.json")
src_run_dct = io_utils.read_file_mlflow(src_run_path)
run = self.mlflow_client.create_run(exp.experiment_id)
run_id = run.info.run_id
try:
self._import_run_data(src_run_dct, run_id, src_run_dct["info"]["user_id"])
path = os.path.join(input_dir, "artifacts")
if os.path.exists(_filesystem.mk_local_path(path)):
self.mlflow_client.log_artifacts(run_id, mk_local_path(path))
if self.mlmodel_fix:
self._update_mlmodel_run_id(run_id)
self.mlflow_client.set_terminated(run_id, RunStatus.to_string(RunStatus.FINISHED))
run = self.mlflow_client.get_run(run_id)
except Exception as e:
self.mlflow_client.set_terminated(run_id, RunStatus.to_string(RunStatus.FAILED))
import traceback
traceback.print_exc()
raise MlflowExportImportException from e
if utils.importing_into_databricks() and dst_notebook_dir:
ndir = os.path.join(dst_notebook_dir, run_id) if self.dst_notebook_dir_add_run_id else dst_notebook_dir
self._upload_databricks_notebook(input_dir, src_run_dct, ndir)
return (run, src_run_dct["tags"].get(MLFLOW_PARENT_RUN_ID,None))
I'm referring to passing src_run_dct["info"]["start_time"]
to the create_run
method for the start and passing src_run_dct["info"]["end_time"]
to the kwarg end_time
of .set_terminated(..., end_time=XX)
I'm aware that I can transfer those as extra tags with the CLI option, however, I prefer having an ordered list in my experiment overview (ordered by start_date).
I believe what @almajo requested would be the expected behavior in a lot of cases, such as transfering all my data from a filesystem to a SQL database.
Since the functions create_run
and set_terminated
already accept start_time
and end_time
as respective argument, why not add an option for using the imported times (that are already stored when exporting) instead of the system times?
I'm with @jrakotobe and @almajo. This bug corrupts your data bug which seems like a major issue for the official import/export tool, and it locks people into whatever backend they choose at installation time.
@amesar is there anything that would prevent a fix PR like @almajo's from being merged? I'm happy to send one if no one else wants to take on it.
Currently start_date and end_date of the exported runs is not used. Instead, the run is created with the current timestamp. This messes the whole imported library up.
A fix is simple as we only need to pass the kwargs to the client methods that are used anyway.