mlopsworks / charms

WIP charms
Apache License 2.0

connect mlflow to minio #2

Open lukemarsden opened 3 years ago

lukemarsden commented 3 years ago

Add a dependency on minio to the mlflow charm: https://github.com/mlopsworks/charms/blob/main/mlflow/metadata.yaml#L14-L17

Pass the secret and minio server address to the mlflow container: https://github.com/mlopsworks/charms/blob/4dd3f7a22a076254966a383dcb5019fcf3ed37d1/mlflow/src/charm.py#L69-L72

Then test that it works, both in the browser and by running a Python script that trains a test model and publishes it to MLflow. Finally, check that the model artifacts exist in MinIO, e.g. using the MinIO client CLI.

lukemarsden commented 3 years ago

BTW, it might be easiest to run the test python script from inside a k8s pod in the cluster. Or, you know, a Kubeflow Jupyter notebook 😁

That way, the k8s service names will be resolvable in DNS.
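
A quick way to confirm that point before running the real test is to check name resolution from wherever the script will run. This is only a minimal sketch: the function name `service_resolves` is made up for illustration, and the default port 9000 is taken from the MinIO address that shows up in the logs later in this thread.

```python
import socket


def service_resolves(hostname: str, port: int = 9000) -> bool:
    """Return True if `hostname` resolves to at least one address.

    `service_resolves` is a hypothetical helper, not part of the charm.
    """
    try:
        # getaddrinfo performs the same lookup a client library would do
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (e.g. outside the cluster)
        return False
```

Called as `service_resolves("minio")` (assuming the service is actually named `minio`), this should return `True` from a pod in the same namespace and most likely `False` from a laptop outside the cluster.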

egranell commented 3 years ago

After connecting MLflow to MinIO with what I believed were the correct credentials, I prepared a test script. When I run it, it fails while trying to upload the artifacts.

$ python3 mlflow_test.py 
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
  RMSE: 0.7931640229276851
  MAE: 0.6271946374319586
  R2: 0.10862644997792614
Traceback (most recent call last):
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/boto3/s3/transfer.py", line 279, in upload_file
    future.result()
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AuthorizationHeaderMalformed) when calling the PutObject operation: The authorization header is malformed; the authorization component "Credential=bWluaW8=/20210208/us-east-1/s3/aws4_request" is malformed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mlflow_test.py", line 89, in <module>
    mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 306, in log_model
    return Model.log(
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/models/model.py", line 173, in log
    mlflow.tracking.fluent.log_artifacts(local_path, artifact_path)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 571, in log_artifacts
    MlflowClient().log_artifacts(run_id, local_dir, artifact_path)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/tracking/client.py", line 919, in log_artifacts
    self._tracking_client.log_artifacts(run_id, local_dir, artifact_path)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 287, in log_artifacts
    self._get_artifact_repo(run_id).log_artifacts(local_dir, artifact_path)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 93, in log_artifacts
    self._upload_file(
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 69, in _upload_file
    s3_client.upload_file(Filename=local_file, Bucket=bucket, Key=key, ExtraArgs=extra_args)
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/boto3/s3/inject.py", line 129, in upload_file
    return transfer.upload_file(
  File "/home/egranell/miniconda3/lib/python3.8/site-packages/boto3/s3/transfer.py", line 285, in upload_file
    raise S3UploadFailedError(
boto3.exceptions.S3UploadFailedError: Failed to upload /tmp/tmpynyhawkp/model/conda.yaml to mlflow/0/73df6cec16bf4478956b014247e32fe1/artifacts/model/conda.yaml: An error occurred (AuthorizationHeaderMalformed) when calling the PutObject operation: The authorization header is malformed; the authorization component "Credential=bWluaW8=/20210208/us-east-1/s3/aws4_request" is malformed.

We can see that the run was created correctly in MLflow, but the UI reports an error loading the artifacts:

image

If we look at the pod logs, we see that it complains about missing fields in the request:

2021/02/08 05:16:12 ERROR mlflow.server: Exception on /ajax-api/2.0/preview/mlflow/artifacts/list [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.8/site-packages/mlflow/server/handlers.py", line 213, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/mlflow/server/handlers.py", line 484, in _list_artifacts
    artifact_entities = _get_artifact_repo(run).list_artifacts(path)
  File "/usr/local/lib/python3.8/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 110, in list_artifacts
    for result in results:
  File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 255, in __iter__
    response = self._make_request(current_kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 332, in _make_request
    return self._method(**current_kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (MissingFields) when calling the ListObjectsV2 operation: Missing fields in request.

I thought the error might be caused by the bucket not existing, but creating it also fails, with the following error:

2021-02-08 04:55:18 INFO juju-log minio:29: ================================
2021-02-08 04:55:18 INFO juju-log minio:29: _on_minio_relation_changed is running; <ops.charm.RelationChangedEvent object at 0x7fb18634b790>
2021-02-08 04:55:18 INFO juju-log minio:29: ================================
2021-02-08 04:55:18 ERROR juju-log minio:29: S3 operation failed; code: MissingFields, message: Missing fields in request., resource: /mlflow, request_id: 3L137, host_id: 3L137

Could it be due to incompatibilities between the mlflow and minio versions?

egranell commented 3 years ago

Simply checking whether the bucket exists (https://github.com/mlopsworks/charms/blob/main/mlflow/src/charm.py#L65) produces the following signature error:

Trace: 1: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/auth-handler.go:132:cmd.checkRequestAuthType()
       2: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/bucket-handlers.go:120:cmd.objectAPIHandlers.GetBucketLocationHandler()
       3: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/api-router.go:69:cmd.(objectAPIHandlers).GetBucketLocationHandler-fm()
       4: /opt/go/src/net/http/server.go:1918:http.HandlerFunc.ServeHTTP()
       5: /q/.q/sources/gopath/src/github.com/minio/minio/vendor/github.com/gorilla/mux/mux.go:107:mux.(*Router).ServeHTTP()
       6: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:605:cmd.rateLimit.ServeHTTP()
       7: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:563:cmd.pathValidityHandler.ServeHTTP()
       8: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:502:cmd.httpStatsHandler.ServeHTTP()
       9: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:65:cmd.requestSizeLimitHandler.ServeHTTP()
      10: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:90:cmd.requestHeaderSizeLimitHandler.ServeHTTP()
      11: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/crossdomain-xml-handler.go:51:cmd.crossDomainPolicy.ServeHTTP()
      12: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:219:cmd.redirectHandler.ServeHTTP()
      13: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:277:cmd.minioReservedBucketHandler.ServeHTTP()
      14: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:246:cmd.cacheControlHandler.ServeHTTP()
      15: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:347:cmd.timeValidityHandler.ServeHTTP()
      16: /q/.q/sources/gopath/src/github.com/minio/minio/vendor/github.com/rs/cors/cors.go:190:cors.(*Cors).Handler.func1()
      17: /opt/go/src/net/http/server.go:1918:http.HandlerFunc.ServeHTTP()
      18: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:453:cmd.resourceHandler.ServeHTTP()
      19: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/auth-handler.go:244:cmd.authHandler.ServeHTTP()
      20: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:133:cmd.reservedMetadataHandler.ServeHTTP()
      21: /q/.q/sources/gopath/src/github.com/minio/minio/pkg/http/server.go:111:http.(*Server).Start.func1()
      22: /opt/go/src/net/http/server.go:1918:http.HandlerFunc.ServeHTTP()
      23: /opt/go/src/net/http/server.go:2619:http.serverHandler.ServeHTTP()
      24: /opt/go/src/net/http/server.go:1801:http.(*conn).serve()
[2021-02-08T06:49:02.776892989Z] [ERROR] {"method":"GET","reqURI":"/mlflow?location=","header":{"Accept-Encoding":["identity"],"Authorization":["AWS4-HMAC-SHA256 Credential=bWluaW8=/20210208/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=07d026af0a2425e9c33669935d380662492315cb7edf31dc90005e70b94e9363"],"Host":["10.96.25.137:9000"],"User-Agent":["MinIO (Linux; x86_64) minio-py/7.0.1"],"X-Amz-Content-Sha256":["e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"],"X-Amz-Date":["20210208T064902Z"]}} (Signature does not match)

egranell commented 3 years ago

The problem was simply that the username and password used to connect to MinIO must not be base64-encoded, which I had wrongly assumed from the information I had seen. For the test script to work, we also have to set the following environment variables instead of putting the credentials in the file ~/.aws/credentials:

AWS_ACCESS_KEY_ID=user
AWS_SECRET_ACCESS_KEY=key
MLFLOW_S3_ENDPOINT_URL=http://ip:port # to minio 
MLFLOW_TRACKING_URI=http://ip:port # to mlflow
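
For reference, the access key in the failing requests above, `bWluaW8=`, is the base64 encoding of `minio`, which is consistent with this diagnosis. Below is a minimal sketch of decoding such a value and then setting the four variables from Python before MLflow is used. The helper names `decode_secret` and `configure_mlflow_env` are made up for illustration, and the secret key and URLs in the usage part are placeholders, not values from this deployment.

```python
import base64
import os


def decode_secret(value: str) -> str:
    """Decode a base64-encoded secret value into plain text (hypothetical helper)."""
    return base64.b64decode(value, validate=True).decode("utf-8")


def configure_mlflow_env(access_key, secret_key, s3_endpoint, tracking_uri, env=None):
    """Set the four variables listed above on `env` (defaults to os.environ).

    Hypothetical helper: the credentials must be the plain-text values.
    """
    target = os.environ if env is None else env
    target["AWS_ACCESS_KEY_ID"] = access_key
    target["AWS_SECRET_ACCESS_KEY"] = secret_key
    target["MLFLOW_S3_ENDPOINT_URL"] = s3_endpoint  # MinIO address
    target["MLFLOW_TRACKING_URI"] = tracking_uri    # MLflow address
    return target


if __name__ == "__main__":
    # "bWluaW8=" is the credential seen in the error messages in this thread
    access_key = decode_secret("bWluaW8=")
    print(access_key)  # prints: minio
    # Placeholder secret and addresses; written to a plain dict here so the
    # demo does not touch the real environment
    demo_env = configure_mlflow_env(
        access_key, "key", "http://ip:port", "http://ip:port", env={}
    )
    print(sorted(demo_env))
```

Passing `env=None` writes to `os.environ` instead, which has to happen before the first MLflow call that touches the artifact store.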

egranell commented 3 years ago

Now the output confirms that the experiment and its artifacts have been saved:

Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
RMSE: 0.7931640229276851
MAE: 0.6271946374319586
R2: 0.10862644997792614
Successfully registered model 'ElasticnetWineModel'.
2021/02/09 09:09:02 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: ElasticnetWineModel, version 1
Created version '1' of model 'ElasticnetWineModel'.

We can see the experiment in MLflow: image

And the artifacts in MinIO: image