Open lukemarsden opened 3 years ago
BTW, it might be easiest to run the test python script from inside a k8s pod in the cluster. Or, you know, a Kubeflow Jupyter notebook 😁
That way, the k8s service names will be resolvable in DNS.
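A quick way to confirm that service names resolve from wherever the script runs is a plain DNS lookup. The sketch below uses only the standard library; the service name in the usage comment is a hypothetical example, not one taken from this deployment.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved from this host.
        return False


# Hypothetical usage from inside a pod (the service name is an assumption):
#   can_resolve("minio.kubeflow.svc.cluster.local")
print(can_resolve("localhost"))
```

If this returns False for the service name on a laptop but True inside a pod, the script is probably being run from outside the cluster network.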
After connecting MLFlow to Minio with the credentials, I prepared a test script. Launching it fails with an error when it tries to upload the artifacts:
$ python3 mlflow_test.py
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
RMSE: 0.7931640229276851
MAE: 0.6271946374319586
R2: 0.10862644997792614
Traceback (most recent call last):
File "/home/egranell/miniconda3/lib/python3.8/site-packages/boto3/s3/transfer.py", line 279, in upload_file
future.result()
File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/tasks.py", line 126, in __call__
return self._execute_main(kwargs)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AuthorizationHeaderMalformed) when calling the PutObject operation: The authorization header is malformed; the authorization component "Credential=bWluaW8=/20210208/us-east-1/s3/aws4_request" is malformed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "mlflow_test.py", line 89, in <module>
mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/sklearn/__init__.py", line 306, in log_model
return Model.log(
File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/models/model.py", line 173, in log
mlflow.tracking.fluent.log_artifacts(local_path, artifact_path)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 571, in log_artifacts
MlflowClient().log_artifacts(run_id, local_dir, artifact_path)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/tracking/client.py", line 919, in log_artifacts
self._tracking_client.log_artifacts(run_id, local_dir, artifact_path)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py", line 287, in log_artifacts
self._get_artifact_repo(run_id).log_artifacts(local_dir, artifact_path)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 93, in log_artifacts
self._upload_file(
File "/home/egranell/miniconda3/lib/python3.8/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 69, in _upload_file
s3_client.upload_file(Filename=local_file, Bucket=bucket, Key=key, ExtraArgs=extra_args)
File "/home/egranell/miniconda3/lib/python3.8/site-packages/boto3/s3/inject.py", line 129, in upload_file
return transfer.upload_file(
File "/home/egranell/miniconda3/lib/python3.8/site-packages/boto3/s3/transfer.py", line 285, in upload_file
raise S3UploadFailedError(
boto3.exceptions.S3UploadFailedError: Failed to upload /tmp/tmpynyhawkp/model/conda.yaml to mlflow/0/73df6cec16bf4478956b014247e32fe1/artifacts/model/conda.yaml: An error occurred (AuthorizationHeaderMalformed) when calling the PutObject operation: The authorization header is malformed; the authorization component "Credential=bWluaW8=/20210208/us-east-1/s3/aws4_request" is malformed.
We can see that the run was created correctly in MLFlow, but it reports an error loading the artifacts.
If we look at the pod logs, we see that it complains about missing fields in the request:
2021/02/08 05:16:12 ERROR mlflow.server: Exception on /ajax-api/2.0/preview/mlflow/artifacts/list [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.8/site-packages/mlflow/server/handlers.py", line 213, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/mlflow/server/handlers.py", line 484, in _list_artifacts
artifact_entities = _get_artifact_repo(run).list_artifacts(path)
File "/usr/local/lib/python3.8/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 110, in list_artifacts
for result in results:
File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/paginate.py", line 332, in _make_request
return self._method(**current_kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (MissingFields) when calling the ListObjectsV2 operation: Missing fields in request.
I thought the error might be caused by the bucket not existing, but creating it also fails, with the following error:
2021-02-08 04:55:18 INFO juju-log minio:29: ================================
2021-02-08 04:55:18 INFO juju-log minio:29: _on_minio_relation_changed is running; <ops.charm.RelationChangedEvent object at 0x7fb18634b790>
2021-02-08 04:55:18 INFO juju-log minio:29: ================================
2021-02-08 04:55:18 ERROR juju-log minio:29: S3 operation failed; code: MissingFields, message: Missing fields in request., resource: /mlflow, request_id: 3L137, host_id: 3L137
Could it be due to incompatibilities between the mlflow and minio versions?
Even just checking whether the bucket exists (https://github.com/mlopsworks/charms/blob/main/mlflow/src/charm.py#L65) produces the following signature error:
Trace: 1: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/auth-handler.go:132:cmd.checkRequestAuthType()
2: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/bucket-handlers.go:120:cmd.objectAPIHandlers.GetBucketLocationHandler()
3: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/api-router.go:69:cmd.(objectAPIHandlers).GetBucketLocationHandler-fm()
4: /opt/go/src/net/http/server.go:1918:http.HandlerFunc.ServeHTTP()
5: /q/.q/sources/gopath/src/github.com/minio/minio/vendor/github.com/gorilla/mux/mux.go:107:mux.(*Router).ServeHTTP()
6: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:605:cmd.rateLimit.ServeHTTP()
7: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:563:cmd.pathValidityHandler.ServeHTTP()
8: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:502:cmd.httpStatsHandler.ServeHTTP()
9: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:65:cmd.requestSizeLimitHandler.ServeHTTP()
10: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:90:cmd.requestHeaderSizeLimitHandler.ServeHTTP()
11: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/crossdomain-xml-handler.go:51:cmd.crossDomainPolicy.ServeHTTP()
12: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:219:cmd.redirectHandler.ServeHTTP()
13: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:277:cmd.minioReservedBucketHandler.ServeHTTP()
14: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:246:cmd.cacheControlHandler.ServeHTTP()
15: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:347:cmd.timeValidityHandler.ServeHTTP()
16: /q/.q/sources/gopath/src/github.com/minio/minio/vendor/github.com/rs/cors/cors.go:190:cors.(*Cors).Handler.func1()
17: /opt/go/src/net/http/server.go:1918:http.HandlerFunc.ServeHTTP()
18: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:453:cmd.resourceHandler.ServeHTTP()
19: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/auth-handler.go:244:cmd.authHandler.ServeHTTP()
20: /q/.q/sources/gopath/src/github.com/minio/minio/cmd/generic-handlers.go:133:cmd.reservedMetadataHandler.ServeHTTP()
21: /q/.q/sources/gopath/src/github.com/minio/minio/pkg/http/server.go:111:http.(*Server).Start.func1()
22: /opt/go/src/net/http/server.go:1918:http.HandlerFunc.ServeHTTP()
23: /opt/go/src/net/http/server.go:2619:http.serverHandler.ServeHTTP()
24: /opt/go/src/net/http/server.go:1801:http.(*conn).serve()
[2021-02-08T06:49:02.776892989Z] [ERROR] {"method":"GET","reqURI":"/mlflow?location=","header":{"Accept-Encoding":["identity"],"Authorization":["AWS4-HMAC-SHA256 Credential=bWluaW8=/20210208/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=07d026af0a2425e9c33669935d380662492315cb7edf31dc90005e70b94e9363"],"Host":["10.96.25.137:9000"],"User-Agent":["MinIO (Linux; x86_64) minio-py/7.0.1"],"X-Amz-Content-Sha256":["e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"],"X-Amz-Date":["20210208T064902Z"]}} (Signature does not match)
The problem was simply that the username and password used to connect to minio should not be base64-encoded, contrary to what I had believed from the information I saw. For the test script to work, we also have to set the following environment variables instead of putting the credentials in the file ~/.aws/credentials:
AWS_ACCESS_KEY_ID=user
AWS_SECRET_ACCESS_KEY=key
MLFLOW_S3_ENDPOINT_URL=http://ip:port # to minio
MLFLOW_TRACKING_URI=http://ip:port # to mlflow
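The `Credential=bWluaW8=` value in the errors above is consistent with this diagnosis: decoding it as base64 yields a plain-text access key. The sketch below is only a diagnostic aid under that assumption. The helper names are made up for illustration, and the check is a heuristic, since some plain-text keys happen to be valid base64 as well.

```python
import base64
import binascii
import re

# Authorization component copied from the MinIO error earlier in this thread.
AUTH_COMPONENT = "Credential=bWluaW8=/20210208/us-east-1/s3/aws4_request"


def access_key_from_credential(component: str) -> str:
    """Return the access key part of a SigV4 'Credential=...' component."""
    match = re.match(r"Credential=(.+)/\d{8}/[^/]+/[^/]+/aws4_request$", component)
    if match is None:
        raise ValueError(f"unrecognised credential component: {component!r}")
    return match.group(1)


def decode_if_base64(value: str):
    """Return the decoded text if value is valid base64 of UTF-8 text, else None."""
    try:
        return base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


sent_key = access_key_from_credential(AUTH_COMPONENT)
print(sent_key)                    # bWluaW8=
print(decode_if_base64(sent_key))  # minio
```

A decoded value that looks like the real access key suggests the client was handed the encoded form (for example, copied straight from a Kubernetes secret) rather than the plain value.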
Now we see the confirmation that the experiment with the artifacts has been saved:
Elasticnet model (alpha=0.500000, l1_ratio=0.500000):
RMSE: 0.7931640229276851
MAE: 0.6271946374319586
R2: 0.10862644997792614
Successfully registered model 'ElasticnetWineModel'.
2021/02/09 09:09:02 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: ElasticnetWineModel, version 1
Created version '1' of model 'ElasticnetWineModel'.
We can see the experiment in MLFlow:
And the artifacts in Minio:
Add a dependency on minio to the mlflow charm: https://github.com/mlopsworks/charms/blob/main/mlflow/metadata.yaml#L14-L17
Pass the secret and minio server address to the mlflow container: https://github.com/mlopsworks/charms/blob/4dd3f7a22a076254966a383dcb5019fcf3ed37d1/mlflow/src/charm.py#L69-L72
Test that it works in the browser and by running a python script that trains a test model and publishes it to mlflow, then check that the model artifacts exist in minio, e.g. using the minio client CLI.
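For the second step, the environment handed to the mlflow container might be assembled roughly as follows. This is a hedged sketch, not the charm's actual code: the function name and parameters are hypothetical, and only the variable names come from the working setup described earlier in this thread. The values are passed through as plain text, not base64-encoded.

```python
def mlflow_s3_env(access_key: str, secret_key: str, host: str, port: int) -> dict:
    """Build the S3-related environment for an MLflow container backed by MinIO.

    Hypothetical helper: credentials are used as-is (plain text).
    """
    return {
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "MLFLOW_S3_ENDPOINT_URL": f"http://{host}:{port}",
    }


# Example with placeholder credentials and the MinIO address seen in the log above.
env = mlflow_s3_env("user", "key", "10.96.25.137", 9000)
for name, value in env.items():
    print(f"{name}={value}")
```

Keeping this in one small function makes it easy to unit-test that the endpoint URL and credentials reach the container in the form the S3 client expects.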