import-run fails when there are duplicate metric values with SQL backend store

garymm commented 1 year ago

Running import-run with the attached directory succeeds with MLFLOW_TRACKING_URI set to some local directory, but fails when it's set to a tracking server that is backed by SQL.

I get this error:

Options:
  input_dir: /tmp/mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a
  experiment_name: instinct
  mlmodel_fix: True
  use_src_user_id: False
  dst_notebook_dir: None
  dst_notebook_dir_add_run_id: None
in_databricks: False
importing_into_databricks: False
Importing run from '/tmp/mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a'
Traceback (most recent call last):
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 70, in _import_run
    self._import_run_data(src_run_dct, run_id, src_run_dct["info"]["user_id"])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 105, in _import_run_data
    run_data_importer.log_metrics(self.mlflow_client, run_dct, run_id, MAX_METRICS_PER_BATCH)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 38, in log_metrics
    _log_data(run_dct, run_id, batch_size, get_data, log_data)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 19, in _log_data
    log_data(run_id, batch)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 37, in log_data
    client.log_batch(run_id, metrics=metrics)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/client.py", line 1099, in log_batch
    self._tracking_client.log_batch(run_id, metrics, params, tags)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/_tracking_service/client.py", line 415, in log_batch
    self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 341, in log_batch
    self._call_endpoint(LogBatch, req_body)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 57, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 280, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 206, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: BAD_REQUEST: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(pymysql.err.IntegrityError) (1062, "Duplicate entry 'inside_3-16686312949540-0-557f04b0a45c4486b067a7e4205bb1b9-0-1' for key 'metrics.PRIMARY'")
[SQL: INSERT INTO metrics (`key`, value, timestamp, step, is_nan, run_uuid) VALUES (%(key)s, %(value)s, %(timestamp)s, %(step)s, %(is_nan)s, %(run_uuid)s)]
[parameters: ({'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994150, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994160, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994170, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994180, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994190, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0166667, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.016, 'timestamp': 16686312994210, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}  ... displaying 10 of 891 total bound parameter sets ...  {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949680, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949690, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'})]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
Traceback (most recent call last):
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 70, in _import_run
    self._import_run_data(src_run_dct, run_id, src_run_dct["info"]["user_id"])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/import_run.py", line 105, in _import_run_data
    run_data_importer.log_metrics(self.mlflow_client, run_dct, run_id, MAX_METRICS_PER_BATCH)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 38, in log_metrics
    _log_data(run_dct, run_id, batch_size, get_data, log_data)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 19, in _log_data
    log_data(run_id, batch)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow_export_import/site-packages/mlflow_export_import/run/run_data_importer.py", line 37, in log_data
    client.log_batch(run_id, metrics=metrics)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/client.py", line 1099, in log_batch
    self._tracking_client.log_batch(run_id, metrics, params, tags)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/tracking/_tracking_service/client.py", line 415, in log_batch
    self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 341, in log_batch
    self._call_endpoint(LogBatch, req_body)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/store/tracking/rest_store.py", line 57, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 280, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/Users/garymm/src/Astera-org/obelisk/bazel-bin/external/pip_mlflow_export_import/rules_python_wheel_entry_point_import-run.runfiles/pip_mlflow/site-packages/mlflow/utils/rest_utils.py", line 206, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: BAD_REQUEST: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(pymysql.err.IntegrityError) (1062, "Duplicate entry 'inside_3-16686312949540-0-557f04b0a45c4486b067a7e4205bb1b9-0-1' for key 'metrics.PRIMARY'")
[SQL: INSERT INTO metrics (`key`, value, timestamp, step, is_nan, run_uuid) VALUES (%(key)s, %(value)s, %(timestamp)s, %(step)s, %(is_nan)s, %(run_uuid)s)]
[parameters: ({'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994150, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994160, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994170, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994180, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994190, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.0166667, 'timestamp': 16686312994200, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_2', 'value': 0.016, 'timestamp': 16686312994210, 'step': 0, 'is_nan': 0, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}  ... displaying 10 of 891 total bound parameter sets ...  {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949680, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'}, {'key': 'inside_3', 'value': 0, 'timestamp': 16686312949690, 'step': 0, 'is_nan': 1, 'run_uuid': '557f04b0a45c4486b067a7e4205bb1b9'})]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

mlflow-export-9198135f4a4c40ccb76a8c2ae8c61d8a.tar.gz

garymm commented 1 year ago

I guess this is an issue with MLFlow behaving differently with different backend stores and not the import. In theory the import could try to work around this by de-duplicating metric values when exporting and / or importing. I'll try to do that on my own before exporting.