zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.

[BUG]: Failed to load model: File /mnt/models/model_metadata.yaml is outside of artifact store bounds #2763

Closed. Nikitala0014 closed this issue 1 month ago.

Nikitala0014 commented 4 months ago

Contact Details [Optional]

TG @welldoesnotmatter

System Information

zenml==0.57.1 kubernetes>=21.7,<26

What happened?

I'm trying to run a pipeline for model deployment, using Seldon as the model deployer and native Kubernetes for orchestration. Initially the two integrations required different versions of the Kubernetes client, which I reconciled for myself. The pipeline now has two steps: loading the model and deploying it with Seldon. The Seldon deployment is launched and the pod reaches "running" status, but inside the pod the following error occurs:

Failed to load model: File /mnt/models/model_metadata.yaml is outside of artifact store bounds /app/.zenconfig/local_stores/cf43b052-f85b-45ef-8c9a-325c24afc26

I tried to track it down, and as far as I understand, this happens when the artifact store tries to read the yaml, around line 845 of zenml/artifacts/utils.py.
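To illustrate what this bounds error means: inside the container the pipeline falls back to a local artifact store rooted under /app/.zenconfig/local_stores/<id>, and any read of a path outside that root is rejected. Below is a minimal, purely illustrative sketch of that kind of check. It is not ZenML's actual implementation, just a reproduction of the failing condition with the paths from the log:

import os

def assert_within_store(path: str, store_root: str) -> None:
    # Reject any path that does not live under the artifact store root.
    resolved = os.path.abspath(path)
    root = os.path.abspath(store_root)
    if not resolved.startswith(root + os.sep):
        raise RuntimeError(
            f"File {resolved} is outside of artifact store bounds {root}"
        )

# Seldon mounts the packaged model at /mnt/models, while the fallback local
# artifact store inside the container is rooted under /app/.zenconfig/...,
# so this raises the same "outside of artifact store bounds" error:
assert_within_store(
    "/mnt/models/model_metadata.yaml",
    "/app/.zenconfig/local_stores/cf43b052-f85b-45ef-8c9a-325c24afc26c",
)

Since /mnt/models can never sit under the local store root, the read is rejected even though the file exists on disk.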

Reproduction steps

This is what my simple pipeline looks like:

from zenml import pipeline, step
from zenml.config import DockerSettings
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from yc_cloud.integrations.seldon.steps import seldon_custom_model_deployer_step
from yc_cloud.integrations.seldon.services import SeldonDeploymentConfig
from yc_cloud.integrations.seldon.seldon_client import SeldonResourceRequirements
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Step to load the model 
@step
def load_model_step():
    iris = load_iris()
    X, y = iris.data, iris.target
    model = LogisticRegression(
        random_state=42, max_iter=200
    ).fit(X, y)

    return model

# Define the ZenML pipeline
@pipeline(
    settings={
        "docker": DockerSettings(
            requirements="requirements_simple.txt",
            apt_packages=["git"]
        ),
    }
)
def deploy_model_pipeline():
    """Pipeline to deploy a simple logistic regression model for inference."""
    model = load_model_step()
    seldon_custom_model_deployer_step(
        model=model,
        predict_function="custom_inference.custom_predict_simple.custom_predict",  # Path to custom predict function
        service_config=SeldonDeploymentConfig(
            model_name="logistic_regression_model_v2",
            replicas=1,
            implementation="SKLEARN_SERVER",
            resources=SeldonResourceRequirements(
                limits={},
            ),
            serviceAccountName="default",
        )
    )

if __name__ == '__main__':
    deploy_model_pipeline()
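For reference, custom_inference.custom_predict_simple.custom_predict is not included in this report. A minimal sketch of what such a function could look like follows; the (model, request) signature is an assumption based on ZenML's custom Seldon deployment examples, not the actual module used here:

# custom_inference/custom_predict_simple.py (hypothetical sketch)
from typing import Any, List

import numpy as np

def custom_predict(model: Any, request: List[Any]) -> List[Any]:
    """Run the trained LogisticRegression on the incoming request payload."""
    # The deployed model object is passed in by the serving wrapper; the
    # request is the raw payload received by the Seldon endpoint.
    inputs = np.asarray(request)
    predictions = model.predict(inputs)
    return predictions.tolist()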

Relevant log output

kubectl logs seldon-7956070e12644ced739fb0ba794ed162-54ff5f8d4f-89c76 -n zenml
Defaulted container "classifier" out of: classifier, seldon-container-engine, classifier-model-initializer (init)
2024-06-08 12:13:22,895 - seldon_core.microservice:main:578 - INFO:  Starting microservice.py:main
2024-06-08 12:13:22,895 - seldon_core.microservice:main:579 - INFO:  Seldon Core version: 1.18.2
2024-06-08 12:13:22,897 - seldon_core.microservice:main:602 - INFO:  Parse JAEGER_EXTRA_TAGS []
2024-06-08 12:13:22,897 - seldon_core.microservice:load_annotations:176 - INFO:  Found annotation kubernetes.io/config.seen:2024-06-08T12:13:12.012062339Z 
2024-06-08 12:13:22,897 - seldon_core.microservice:load_annotations:176 - INFO:  Found annotation kubernetes.io/config.source:api 
2024-06-08 12:13:22,897 - seldon_core.microservice:load_annotations:176 - INFO:  Found annotation prometheus.io/path:/prometheus 
2024-06-08 12:13:22,897 - seldon_core.microservice:load_annotations:176 - INFO:  Found annotation prometheus.io/scrape:true 
2024-06-08 12:13:22,897 - seldon_core.microservice:main:605 - INFO:  Annotations: {'kubernetes.io/config.seen': '2024-06-08T12:13:12.012062339Z', 'kubernetes.io/config.source': 'api', 'prometheus.io/path': '/prometheus', 'prometheus.io/scrape': 'true'}
2024-06-08 12:13:22,897 - seldon_core.microservice:main:613 - INFO:  Importing submodule ['yc_cloud.integrations.seldon.custom_deployer.zenml_custom_model', 'ZenMLCustomModel']
2024-06-08 12:13:25,049 - seldon_core.microservice:main:640 - INFO:  REST gunicorn microservice running on port 9000
REST gunicorn microservice running on port 9000
2024-06-08 12:13:25,051 - seldon_core.microservice:main:655 - INFO:  REST metrics microservice running on port 6000
REST metrics microservice running on port 6000
2024-06-08 12:13:25,051 - seldon_core.microservice:main:665 - INFO:  Starting servers
Starting servers
2024-06-08 12:13:25,052 - seldon_core.microservice:start_servers:80 - INFO:  Using standard multiprocessing library
Using standard multiprocessing library
2024-06-08 12:13:25,057 - seldon_core.microservice:server:432 - INFO:  Gunicorn Config:  {'bind': '0.0.0.0:9000', 'accesslog': None, 'loglevel': 'info', 'timeout': 5000, 'threads': 1, 'workers': 1, 'max_requests': 0, 'max_requests_jitter': 0, 'post_worker_init': <function post_worker_init at 0x7fc0563b6170>, 'worker_exit': functools.partial(<function worker_exit at 0x7fc0563b7910>, seldon_metrics=<seldon_core.metrics.SeldonMetrics object at 0x7fc056159360>), 'keepalive': 2}
2024-06-08 12:13:25,057 - seldon_core.microservice:server:504 - INFO:  GRPC Server Binding to 0.0.0.0:9500 with 1 processes.
2024-06-08 12:13:25,062 - seldon_core.wrapper:_set_flask_app_configs:225 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': None, 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': None, 'JSON_SORT_KEYS': None, 'JSONIFY_PRETTYPRINT_REGULAR': None, 'JSONIFY_MIMETYPE': None, 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
GRPC Server Binding to 0.0.0.0:9500 with 1 processes.
Gunicorn Config:  {'bind': '0.0.0.0:9000', 'accesslog': None, 'loglevel': 'info', 'timeout': 5000, 'threads': 1, 'workers': 1, 'max_requests': 0, 'max_requests_jitter': 0, 'post_worker_init': <function post_worker_init at 0x7fc0563b6170>, 'worker_exit': functools.partial(<function worker_exit at 0x7fc0563b7910>, seldon_metrics=<seldon_core.metrics.SeldonMetrics object at 0x7fc056159360>), 'keepalive': 2}
App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': None, 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': None, 'JSON_SORT_KEYS': None, 'JSONIFY_PRETTYPRINT_REGULAR': None, 'JSONIFY_MIMETYPE': None, 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
2024-06-08 12:13:25,068 - seldon_core.wrapper:_set_flask_app_configs:225 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': None, 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': None, 'JSON_SORT_KEYS': None, 'JSONIFY_PRETTYPRINT_REGULAR': None, 'JSONIFY_MIMETYPE': None, 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': None, 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': None, 'JSON_SORT_KEYS': None, 'JSONIFY_PRETTYPRINT_REGULAR': None, 'JSONIFY_MIMETYPE': None, 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
[2024-06-08 12:13:25 +0000] [22] [INFO] Starting gunicorn 20.1.0
[2024-06-08 12:13:25 +0000] [22] [INFO] Listening at: http://0.0.0.0:6000 (22)
[2024-06-08 12:13:25 +0000] [22] [INFO] Using worker: sync
2024-06-08 12:13:25,091 - seldon_core.microservice:_run_grpc_server:466 - INFO:  Starting new GRPC server with 1 threads.
Starting new GRPC server with 1 threads.
[2024-06-08 12:13:25 +0000] [29] [INFO] Booting worker with pid: 29
[2024-06-08 12:13:25 +0000] [9] [INFO] Starting gunicorn 20.1.0
[2024-06-08 12:13:25 +0000] [9] [INFO] Listening at: http://0.0.0.0:9000 (9)
[2024-06-08 12:13:25 +0000] [9] [INFO] Using worker: sync
[2024-06-08 12:13:25 +0000] [38] [INFO] Booting worker with pid: 38
2024-06-08 12:13:25,112 - seldon_core.gunicorn_utils:load:103 - INFO:  Tracing not active
Tracing not active
self.model_uri /mnt/models
start load model from metadata
Initializing the ZenML global configuration version to 0.57.1
self.model_uri /mnt/models
start load model from metadata
Creating database tables
Creating database tables
Failed to load model: Error initializing sql store with URL 'sqlite:////app/.zenconfig/local_stores/default_zen_store/zenml.db': (sqlite3.OperationalError) table user already exists
[SQL: 
CREATE TABLE user (
        description TEXT, 
        id CHAR(32) NOT NULL, 
        created DATETIME NOT NULL, 
        updated DATETIME NOT NULL, 
        name VARCHAR NOT NULL, 
        is_service_account BOOLEAN NOT NULL, 
        full_name VARCHAR NOT NULL, 
        email VARCHAR, 
        active BOOLEAN NOT NULL, 
        password VARCHAR, 
        activation_token VARCHAR, 
        hub_token VARCHAR, 
        email_opted_in BOOLEAN, 
        external_user_id CHAR(32), 
        is_admin BOOLEAN NOT NULL, 
        user_metadata VARCHAR, 
        PRIMARY KEY (id), 
        UNIQUE (name, is_service_account)
)

]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Creating default workspace 'default' ...
Creating default stack in workspace default...
Setting the global active workspace to 'default'.
Setting the global active stack to default.
The current repo active workspace is no longer available. Resetting the active workspace to 'default'.
The current repo active stack is no longer available. Resetting the active stack to default.
Reloading configuration file /app/.zen/config.yaml
artifact_versions_by_uri index=1 max_size=20 total_pages=1 total=0 items=[]
artifact_store LocalArtifactStore(type=artifact_store, flavor=local, path=)
Failed to load model: File /mnt/models/model_metadata.yaml is outside of artifact store bounds /app/.zenconfig/local_stores/cf43b052-f85b-45ef-8c9a-325c24afc26c


safoinme commented 4 months ago

Hey @Nikitala0014, can we get the full trace of the error, please?

Nikitala0014 commented 4 months ago

> Hey @Nikitala0014, can we get the full trace of the error, please?

Hey @safoinme, yes, sure. Updated above.

htahir1 commented 1 month ago

@Nikitala0014 this was fixed by https://github.com/zenml-io/zenml/pull/2928 via @bcdurak. Sorry for the late response.

Nikitala0014 commented 1 month ago

@htahir1 Thank you, man! I appreciate that!)