kubeflow / fairing

Python SDK for building, training, and deploying ML models
Apache License 2.0

predict() got an unexpected keyword argument 'meta' in Seldon Serving deployed via AppendBuilder #408

Open kierenj opened 4 years ago

kierenj commented 4 years ago

/kind bug

What steps did you take and what happened:

Relevant code:

# fairing:include-cell
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from urllib.request import urlopen
import joblib  # needed by predict() below
import numpy as np
import importlib
import os

class HousingServe(object):

    def __init__(self):
        self.model_file = "model.dat"
        self.model = None

    def train(self):
        # n/a - trained in the notebook, not deployed
        pass

    def predict(self, X, feature_names=None):
        if not self.model:
            self.model = joblib.load(self.model_file)

        prediction = self.model.predict(data=X)

        return prediction

# ---- different cell ----

def deploy_model(model):

    model.save('model.dat')

    from kubeflow import fairing
    from kubeflow.fairing import PredictionEndpoint

    BackendClass = getattr(
        importlib.import_module('kubeflow.fairing.backends'),
        'KubeflowGKEBackend')

    GCP_PROJECT = fairing.cloud.gcp.guess_project_name()
    DOCKER_REGISTRY = 'eu.gcr.io/{}'.format(GCP_PROJECT)
    BuildContext = None

    from kubeflow.fairing.deployers import serving
    from kubeflow.fairing.preprocessors.converted_notebook import ConvertNotebookPreprocessorWithFire
    from kubeflow.fairing.builders import cluster
    from kubeflow.fairing.builders import append

    preprocessor = ConvertNotebookPreprocessorWithFire(
        class_name='HousingServe',
        notebook_file='this_notebook.ipynb')

    input_files = ['model.dat', 'requirements.txt']
    preprocessor.input_files = {os.path.normpath(f) for f in input_files}
    preprocessor.preprocess()

    base_image = "gcr.io/kubeflow-images-public/tensorflow-1.13.1-notebook-cpu:v0.5.0"
    cluster_builder = cluster.cluster.ClusterBuilder(registry=DOCKER_REGISTRY,
                                                     base_image=base_image,
                                                     preprocessor=preprocessor,
                                                     pod_spec_mutators=[fairing.cloud.gcp.add_gcp_credentials_if_exists],
                                                     context_source=cluster.gcs_context.GCSContextSource())
    cluster_builder.build()
    print(cluster_builder.image_tag)

    builder = append.append.AppendBuilder(
        registry=DOCKER_REGISTRY,
        base_image=cluster_builder.image_tag,
        preprocessor=preprocessor)
    builder.build()

    pod_spec = builder.generate_pod_spec()

    module_name = os.path.splitext(preprocessor.executable.name)[0]
    deployer = serving.serving.Serving(module_name + ".HousingServe",
                                       service_type="LoadBalancer",
                                       labels={"app": "mockup"})
    url = deployer.deploy(pod_spec)
    return url

requirements.txt:

fire
gitpython
google-cloud-storage
joblib
kubeflow-metadata
numpy
pandas
retrying
seldon-core
scikit-learn
xgboost
tornado>=6.0.3

Then, I POST as a test:

{
    "data":{
        "names":null,
        "tensor":{
            "shape":[2,2,2],
            "values":[0,0,0,0,0,0,0,0]
        }
    }
}
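(For reference: the Seldon `tensor` payload is a flat `values` list plus a `shape`, so the 8 zeros above decode to a 2×2×2 array. A quick numpy sanity check, with the payload hard-coded here for illustration; this is roughly how the values are reshaped before reaching `predict()`:)

```python
import numpy as np

# Seldon's "tensor" encoding: a flat list of values plus a shape.
payload = {"shape": [2, 2, 2], "values": [0, 0, 0, 0, 0, 0, 0, 0]}

# Reshape the flat values into the declared shape.
X = np.array(payload["values"]).reshape(payload["shape"])
print(X.shape)  # (2, 2, 2)
```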

I get a 500 error, and the logs in stackdriver:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/seldon_core/user_model.py", line 156, in client_predict
    return user_model.predict(features, feature_names, **kwargs)
TypeError: predict() got an unexpected keyword argument 'meta'
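(A likely workaround, judging from the wrapper call in the traceback: `seldon_core` forwards extra keyword arguments such as `meta` to `predict()`, so a signature that accepts `**kwargs` avoids the `TypeError`. A minimal self-contained sketch, with model inference replaced by a stub:)

```python
class HousingServe(object):
    def predict(self, X, feature_names=None, **kwargs):
        # 'meta' (and any other extras the wrapper passes) land in kwargs
        # instead of raising TypeError. Stub inference for illustration:
        return [x * 2 for x in X]

serve = HousingServe()
# Mimic the wrapper's call: user_model.predict(features, feature_names, **kwargs)
print(serve.predict([1, 2], None, meta={"tags": {}}))  # [2, 4]
```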

What did you expect to happen: A result to be returned.

Anything else you would like to add: I'm sorry, I'm really struggling. I'm just trying to get a simple example of Fairing together and running on GCP, but I'm not even sure which APIs are known to be working. I'd be grateful if anyone could point me in the right direction. I'm sorry if I'm missing something obvious.

Environment:

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the label kind/bug to this issue, with a confidence of 0.99. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!


kierenj commented 4 years ago

I've given up on Fairing for now and looked for other examples, and found build_python_component in the simple notebook example. Searching turned up some docs for it which say it's deprecated and recommend build_image_from_working_dir and func_to_container_op instead.

The docs had me confused for a while (the default image is different from the one stated), but the dependencies also aren't captured for me. I've used modules_to_capture to specify the name of a notebook imported via import_ipynb, but still only the single function makes it into the workflow and the module is just referenced. Same with and without code pickling.
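(For what it's worth, the shape func_to_container_op expects is a self-contained function: typed parameters and all imports inside the body, so nothing closes over notebook state. A hedged sketch; the kfp wrapping itself is shown only in comments since it needs the SDK installed, and the base image and package list are assumptions:)

```python
# A function of the shape func_to_container_op can serialize: typed
# parameters, imports inside the body, no references to notebook globals.
def train_fn(n_estimators: int = 10) -> float:
    import random  # imports live inside the body
    random.seed(n_estimators)
    return random.random()

# With the kfp v1 SDK this would be wrapped roughly as:
#   from kfp.components import func_to_container_op
#   train_op = func_to_container_op(
#       train_fn,
#       base_image='python:3.7',               # assumed base image
#       packages_to_install=['scikit-learn'])  # dependencies declared here
# The function still runs as plain Python:
score = train_fn(10)
print(0.0 <= score <= 1.0)  # True
```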

Building a custom image with func_to_container_op, I get a Kaniko job; the default timeout is 1000 seconds, and I've been sitting at the log line Taking snapshot of full filesystem... for over an hour and a half. Is that correct? There are only a few dependencies.

Is there a recommended/documented way to wrap up a function and deps?

jtfogarty commented 4 years ago

/area engprod
/priority p2