mlopsworks / charms

connect mlflow charm to mysql #1

Closed lukemarsden closed 3 years ago

lukemarsden commented 3 years ago

Currently the charm depends on mysql, but the mlflow server doesn't actually connect to the database.

Make it do this by passing the right flags and environment variables when the mysql instance gets connected.

There is already code to get the mysql connection details:

https://github.com/mlopsworks/charms/blob/4dd3f7a22a076254966a383dcb5019fcf3ed37d1/mlflow/src/charm.py#L40-L47

It should just be a matter of plumbing it through to the mlflow container: https://github.com/mlopsworks/charms/blob/4dd3f7a22a076254966a383dcb5019fcf3ed37d1/mlflow/src/charm.py#L69-L72

egranell commented 3 years ago

According to the MLflow documentation, runs can be logged remotely to a mysql server by setting the MLFLOW_TRACKING_URI environment variable to a database URI of the form mysql://<username>:<password>@<host>:<port>/<database>. I added the MLFLOW_TRACKING_URI environment variable, populated with the information received from the opslib.mysql.MySQLDatabaseChangedEvent event: https://github.com/mlopsworks/charms/blob/5f8ab801bfe8c9856c22555994cc0c75d6eaab34/mlflow/src/charm.py#L73-L83

If the connection information is incomplete, MLFLOW_TRACKING_URI is left empty and the MLflow pod starts in local tracking mode.
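A minimal sketch of the behaviour described above, not the code in the linked commit: the function name `build_tracking_uri` and its parameters are hypothetical, and percent-encoding the credentials is an extra precaution the comment does not mention.

```python
from typing import Optional, Union
from urllib.parse import quote


def build_tracking_uri(
    user: Optional[str],
    password: Optional[str],
    host: Optional[str],
    port: Optional[Union[int, str]],
    database: Optional[str],
) -> str:
    """Return a mysql:// URI, or "" when any connection field is missing.

    Hypothetical helper: the names and signature are illustrative only.
    """
    fields = (user, password, host, port, database)
    # An empty string means "no database configured", so the pod would
    # fall back to local tracking mode as described in the comment.
    if any(field is None or field == "" for field in fields):
        return ""
    # Percent-encode credentials so characters such as "@" or ":" in a
    # password cannot corrupt the URI (an assumption, not in the original).
    safe_user = quote(str(user), safe="")
    safe_password = quote(str(password), safe="")
    return f"mysql://{safe_user}:{safe_password}@{host}:{port}/{database}"
```

With this check, the incomplete relation data that shows up in the log as mysql://None:None@None:3306/None would yield an empty string rather than a malformed URI.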

egranell commented 3 years ago

I have observed a problem with the mysql server. Initially, an mlflow pod is started without a connection to the mysql server. When the mysql server becomes available, the opslib.mysql.MySQLDatabaseChangedEvent event fires, the previously started pod is stopped, and a new one is started with the connection to the mysql server. Shortly afterwards, however, another opslib.mysql.MySQLDatabaseChangedEvent event arrives with incomplete information, so the pod started with the mysql connection is stopped and another one without it is started.

2021-02-03 04:53:25 INFO juju-log Running legacy hooks/install.
2021-02-03 04:53:26 INFO juju-log ================================
2021-02-03 04:53:26 INFO juju-log __init__ is running
2021-02-03 04:53:26 INFO juju-log ================================
2021-02-03 04:53:26 INFO juju-log ================================
2021-02-03 04:53:26 INFO juju-log in set_pod_spec; <ops.charm.InstallEvent object at 0x7f4c0e1bd8e0>
2021-02-03 04:53:26 INFO juju-log ================================
2021-02-03 04:53:26 INFO juju-log ================================
2021-02-03 04:53:26 INFO juju-log MLFLOW_TRACKING_URI: mysql://None:None@None:None/None
2021-02-03 04:53:26 INFO juju-log ================================
2021-02-03 04:53:26 INFO juju-log Not a leader, skipping set_pod_spec
2021-02-03 04:53:26 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "install" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:26 INFO juju.worker.caasoperator.uniter.relation statetracker.go:158 joining relation "mlflow:db mysql:mysql"
2021-02-03 04:53:26 INFO juju.worker.caasoperator.uniter.relation statetracker.go:194 joined relation "mlflow:db mysql:mysql"
2021-02-03 04:53:26 INFO juju-log ================================
2021-02-03 04:53:27 INFO juju-log __init__ is running
2021-02-03 04:53:27 INFO juju-log ================================
2021-02-03 04:53:27 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "leader-elected" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:27 INFO juju-log db:28: ================================
2021-02-03 04:53:27 INFO juju-log db:28: __init__ is running
2021-02-03 04:53:27 INFO juju-log db:28: ================================
2021-02-03 04:53:28 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "db-relation-created" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:28 INFO juju-log ================================
2021-02-03 04:53:28 INFO juju-log __init__ is running
2021-02-03 04:53:28 INFO juju-log ================================
2021-02-03 04:53:28 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "leader-settings-changed" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:28 INFO juju.worker.caasoperator initializer.go:124 started pod init on "mlflow/4"
2021-02-03 04:53:30 INFO juju-log ================================
2021-02-03 04:53:30 INFO juju-log __init__ is running
2021-02-03 04:53:30 INFO juju-log ================================
2021-02-03 04:53:30 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "config-changed" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:30 INFO juju.worker.caasoperator.uniter resolver.go:147 found queued "start" hook
2021-02-03 04:53:30 INFO juju-log Running legacy hooks/start.
2021-02-03 04:53:30 INFO juju-log ================================
2021-02-03 04:53:30 INFO juju-log __init__ is running
2021-02-03 04:53:31 INFO juju-log ================================
2021-02-03 04:53:31 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "start" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:31 INFO juju-log db:28: ================================
2021-02-03 04:53:31 INFO juju-log db:28: __init__ is running
2021-02-03 04:53:31 INFO juju-log db:28: ================================
2021-02-03 04:53:31 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "db-relation-joined" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:32 INFO juju-log db:28: __init__ is running
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:32 INFO juju-log db:28: Database on relation 28 available at host=ip port=port dbname=database user=user.
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:32 INFO juju-log db:28: _on_database_changed is running; <opslib.mysql.MySQLDatabaseChangedEvent object at 0x7f1507958730>
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:32 INFO juju-log db:28: in set_pod_spec; <opslib.mysql.MySQLDatabaseChangedEvent object at 0x7f1507958730>
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:32 INFO juju-log db:28: MLFLOW_TRACKING_URI: mysql://user:pass@ip:port/database
2021-02-03 04:53:32 INFO juju-log db:28: ================================
2021-02-03 04:53:33 INFO juju-log db:28: Not a leader, skipping set_pod_spec
2021-02-03 04:53:33 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "db-relation-changed" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:34 WARNING juju.worker.caasoperator.uniter.operation leader.go:123 we should run a leader-deposed hook here, but we can't yet
2021-02-03 04:53:34 INFO juju-log db:28: ================================
2021-02-03 04:53:34 INFO juju-log db:28: __init__ is running
2021-02-03 04:53:34 INFO juju-log db:28: ================================
2021-02-03 04:53:34 INFO juju.worker.caasoperator.uniter.operation runhook.go:142 ran "db-relation-departed" hook (via hook dispatching script: dispatch)
2021-02-03 04:53:35 INFO juju-log db:28: ================================
2021-02-03 04:53:35 INFO juju-log db:28: __init__ is running
2021-02-03 04:53:35 INFO juju-log db:28: ================================
2021-02-03 04:53:35 INFO juju-log db:28: Database relation 28 is gone.
2021-02-03 04:53:35 INFO juju-log db:28: ================================
2021-02-03 04:53:35 INFO juju-log db:28: _on_database_changed is running; <opslib.mysql.MySQLDatabaseChangedEvent object at 0x7ff11395a610>
2021-02-03 04:53:35 INFO juju-log db:28: ================================
2021-02-03 04:53:35 INFO juju-log db:28: ================================
2021-02-03 04:53:35 INFO juju-log db:28: in set_pod_spec; <opslib.mysql.MySQLDatabaseChangedEvent object at 0x7ff11395a610>
2021-02-03 04:53:35 INFO juju-log db:28: ================================
2021-02-03 04:53:35 INFO juju-log db:28: ================================
2021-02-03 04:53:35 INFO juju-log db:28: MLFLOW_TRACKING_URI: mysql://None:None@None:3306/None
2021-02-03 04:53:35 INFO juju-log db:28: ================================

lukemarsden commented 3 years ago

We should refuse to start the pod (don't call set_pod_spec) until we have complete connection info.
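One way the suggested guard could look, as a sketch under stated assumptions: `connection_is_complete`, `apply_pod_spec_if_ready`, and the `REQUIRED_KEYS` tuple are hypothetical names, and `set_pod_spec` is passed in as a plain callable standing in for whatever the charm actually uses to set the pod spec.

```python
from typing import Any, Callable, Mapping

# Hypothetical list of fields that must all be present before the pod starts.
REQUIRED_KEYS = ("user", "password", "host", "port", "database")


def connection_is_complete(db: Mapping[str, Any]) -> bool:
    """True only when every required connection field has a value."""
    return all(db.get(key) not in (None, "") for key in REQUIRED_KEYS)


def apply_pod_spec_if_ready(
    db: Mapping[str, Any],
    is_leader: bool,
    set_pod_spec: Callable[[Mapping[str, Any]], None],
) -> bool:
    """Call set_pod_spec only for a leader with complete connection info.

    Returns True if the pod spec was applied, False if it was skipped.
    """
    if not is_leader:
        # Mirrors the "Not a leader, skipping set_pod_spec" log line.
        return False
    if not connection_is_complete(db):
        # Incomplete relation data: leave any running pod untouched.
        return False
    set_pod_spec(db)
    return True
```

Returning early instead of applying a spec with an empty URI means a later event carrying incomplete data cannot replace a pod that is already connected to the database.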

egranell commented 3 years ago

Fixed. The pod now starts only once complete connection information has arrived.