ucbrise / clipper

A low-latency prediction-serving system
http://clipper.ai
Apache License 2.0

Prediction with dataframe as input #575

Open udaynaik opened 6 years ago

udaynaik commented 6 years ago

Hi, I am looking for code snippets that show how to handle multi-column input data with different data types in a predict function that is deployed using:

python_deployer.deploy_python_closure(self.cl, name=modelName, version=version, input_type=inputType, func=func)

where func is either the following or model.predict.

def predict_func(inp):
    preds = model.predict(inp)
    return [str(p) for p in preds]

My model in the same py file is: model = linear_model.LogisticRegression()

An example of the DataFrame I would like to submit for prediction:

LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE
50000      2    1          2         24
220000     1    1          2         34

Should I pass input_type as "bytes" and decode in the predict function before passing on to actual model predict function?

When I tried passing "model.predict", I get the following error: TypeError: Object of type 'method' is not JSON serializable

I am using sklearn linear_model.

Thank you!

withsmilo commented 6 years ago

@udaynaik : How about using a CSV-formatted string such as "value1,value2,value3"? Then, in your prediction function, you could parse it and convert it to a DataFrame object.
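As a rough sketch of that suggestion, assuming each input is a header-less CSV string whose fields follow a fixed column order and are all integers: the `COLUMNS` list and the `parse_csv_rows` helper below are hypothetical names for illustration, not part of Clipper. The sketch uses only the standard library; the list of dicts it returns can be handed to `pandas.DataFrame(...)` inside the prediction function.

```python
import csv
import io

# Assumed column order, taken from the example DataFrame in this thread.
COLUMNS = ["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]


def parse_csv_rows(text, columns=COLUMNS):
    """Parse a header-less CSV string into a list of {column: int} dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                "expected %d fields, got %d" % (len(columns), len(fields))
            )
        # Assumption: every feature is an integer, as in the sample data.
        rows.append({name: int(value) for name, value in zip(columns, fields)})
    return rows
```

Inside the deployed closure this would be called once per input string, for example `pd.DataFrame(parse_csv_rows(s))` for each `s` in the input list.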

udaynaik commented 6 years ago

@withsmilo I tried using a JSON string, but the container hangs at:

18-09-19:12:08:06 INFO [clipper_admin.py:458] Pushing model Docker image to loan-model:1
18-09-19:12:08:08 INFO [docker_container_manager.py:257] Found 0 replicas for loan-model:1. Adding 1

Here is my function, which works if I simply return "inp":

def test_func(inp):
    #return inp  # works
    df = pd.read_json(inp, orient='columns')

    preds = lr_model.predict(df)

    return [str(p) for p in preds]

My inp is: '[{\"LIMIT_BAL\":200000,\"SEX\":2,\"EDUCATION\":1,\"MARRIAGE\":2,\"AGE\":30},{\"LIMIT_BAL\":150000,\"SEX\":2,\"EDUCATION\":3,\"MARRIAGE\":1,\"AGE\":53}]'

sent via curl: curl -X POST --header "Content-Type:application/json" -d '{"input": "[{\"LIMIT_BAL\":200000,\"SEX\":2,\"EDUCATION\":1,\"MARRIAGE\":2,\"AGE\":30},{\"LIMIT_BAL\":150000,\"SEX\":2,\"EDUCATION\":3,\"MARRIAGE\":1,\"AGE\":53}]"}' 127.0.0.1:1337/hello-world/predict

I am using the following to register:

x_train, x_test, y_train, y_test = train_test_split(df, target, test_size=5)

lr_model = linear_model.LogisticRegression()

lr_model.fit(x_train, y_train)

cl = ClipperConnection(DockerContainerManager())
cl.register_application(name="example", input_type="strings", default_output="slow", slo_micros=100000)
python_deployer.deploy_python_closure(cl, name="loan", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas","sklearn","simplejson"])
cl.link_model_to_app(app_name="example", model_name="loan")

withsmilo commented 6 years ago

@udaynaik : This sample code is working for me. :)

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers import python as python_deployer

clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.start_clipper()

import pandas as pd
def test_func(inp):
    # inp is a list of string
    def pred(i):
        df = pd.read_json(i, orient='columns')
        # return simple value
        return df['LIMIT_BAL'].tolist()[0]
    return [str(pred(i)) for i in inp]

clipper_conn.register_application(name="udaynaik-test", input_type="strings", default_output="default", slo_micros=100000)
python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas"])
clipper_conn.link_model_to_app(app_name="udaynaik-test", model_name="udaynaik-model")

import requests, json
headers = {"Content-type": "application/json"}
input_data = "[{\"LIMIT_BAL\":200000,\"SEX\":2,\"EDUCATION\":1,\"MARRIAGE\":2,\"AGE\":30},{\"LIMIT_BAL\":150000,\"SEX\":2,\"EDUCATION\":3,\"MARRIAGE\":1,\"AGE\":53}]"
requests.post("http://localhost:1337/udaynaik-test/predict", headers=headers, data=json.dumps({"input": input_data})).json()

udaynaik commented 6 years ago

Thanks, this looks hopeful. Interestingly, @withsmilo, I copied and pasted the same code and I get the error below. I am using clipper-admin==0.3.0 with Docker engine 18.06.1-ce on Mac OS 10.12.6.

18-09-19:23:21:34 INFO     [docker_container_manager.py:257] Found 0 replicas for udaynaik-model:1. Adding 1
Traceback (most recent call last):
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x109c94fd0>: Failed to establish a new connection: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9090): Max retries exceeded with url: /-/reload (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x109c94fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "working.py", line 18, in <module>
    python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas"])
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/deployers/python.py", line 222, in deploy_python_closure
    registry, num_replicas, batch_size, pkgs_to_install)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/clipper_admin.py", line 338, in build_and_deploy_model
    num_replicas, batch_size)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/clipper_admin.py", line 544, in deploy_model
    num_replicas=num_replicas)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 192, in deploy_model
    self.set_num_replicas(name, version, input_type, image, num_replicas)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 262, in set_num_replicas
    image)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 242, in _add_replica
    CLIPPER_INTERNAL_METRIC_PORT)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_metric_utils.py", line 156, in add_to_metric_config
    requests.post('http://localhost:9090/-/reload')
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9090): Max retries exceeded with url: /-/reload (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x109c94fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))

withsmilo commented 6 years ago

@udaynaik : Port 9090 is for Prometheus. I think you need to clean up your environment. Please retry after removing all containers with $ docker rm -f $(docker ps -a -q).

udaynaik commented 6 years ago

@withsmilo Thanks! After a Docker restart and cleanup this worked! But when I add "preds = lr_model.predict(df)" to the function, so that it returns a prediction for each incoming row of data, it does not work. It gets stuck at:

18-09-20:01:02:01 INFO [clipper_admin.py:458] Pushing model Docker image to loan:1
18-09-20:01:02:03 INFO [docker_container_manager.py:257] Found 0 replicas for loan:1. Adding 1

My lr_model is: lr_model = linear_model.LogisticRegression()

Also, since the input is a list of JSON objects (a batch of 2 in our case), shouldn't I be able to construct 'df' and call predict without an inner function?
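One possible shape for this, as a minimal sketch rather than a confirmed Clipper recipe: it assumes the closure receives a list of strings, that each string is a JSON array of records, and that the closure must return exactly one string per input. `make_predict_func`, `FEATURES`, and `ThresholdModel` are hypothetical names; `ThresholdModel` is only a stand-in for a fitted estimator such as lr_model so the sketch runs without sklearn.

```python
import json

# Assumed feature order, matching the columns used for training in this thread.
FEATURES = ["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]


def make_predict_func(model, features=FEATURES):
    """Build a closure that maps a list of JSON strings to a list of strings."""

    def predict_func(inputs):
        outputs = []
        for raw in inputs:
            records = json.loads(raw)  # list of {column: value} dicts
            # Order the values explicitly so they match the training columns.
            rows = [[rec[name] for name in features] for rec in records]
            preds = model.predict(rows)
            # One output string per input string: join the per-row predictions.
            outputs.append(",".join(str(p) for p in preds))
        return outputs

    return predict_func


class ThresholdModel:
    """Stand-in for a fitted model; anything with .predict(rows) would do."""

    def predict(self, rows):
        return [1 if row[0] > 175000 else 0 for row in rows]
```

With a real estimator, `make_predict_func(lr_model)` would be the function passed as `func`. Joining the predictions into one string avoids returning the string form of a whole array for each input.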

Here is the full code (data: credit-default.csv.zip):

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers import python as python_deployer
# train_test_split lives in sklearn.model_selection (sklearn.cross_validation is deprecated)
from sklearn.model_selection import train_test_split
from sklearn import linear_model
import pandas as pd

clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.stop_all()
clipper_conn.start_clipper()

df = pd.read_csv('credit-default.csv', skiprows=[0])

target = df['default payment next month']
df = df[["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]]

x_train, x_test, y_train, y_test = train_test_split(df, target, test_size=5)

lr_model = linear_model.LogisticRegression()
lr_model.fit(x_train, y_train)

def test_func(inp):
    # inp is a list of string
    def pred(i):
        df1 = pd.read_json(i, orient='columns')
        # return simple value
        s = lr_model.predict(df1)
        return s
    return [str(pred(i)) for i in inp]

clipper_conn.register_application(name="udaynaik-test", input_type="strings", default_output="default", slo_micros=100000)
python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas","sklearn"])
clipper_conn.link_model_to_app(app_name="udaynaik-test", model_name="udaynaik-model")

YogeshSomawar commented 6 years ago

Hi @udaynaik ,

This may happen because required modules are not installed and the container failed to start. You can find the failed container with docker ps -a and get its logs as follows:

$ docker logs <container_id>
Starting Python Closure container
Connecting to Clipper with default port: 7000
Encountered an ImportError when running container. You can use the pkgs_to_install argument when calling clipper_admin.build_model() to supply any needed Python packages.

Since you are using sklearn here, you also need to install the scipy module.

Just update your python_deployer line to:

python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas","sklearn","scipy"])

And regarding sending 2 inputs in one API call, you can use input_batch instead of input.

Hope this will solve your issue.
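To illustrate the input_batch suggestion, here is a small sketch that only builds the request body and sends nothing. It assumes an application registered with input_type="strings", so every batch element is itself a JSON string; `build_batch_payload` is a hypothetical helper name, and the exact request format may differ between Clipper versions.

```python
import json


def build_batch_payload(records):
    """Return a JSON body with one string input per record under input_batch."""
    # Each record is wrapped in a one-element list so that every input keeps
    # the "list of records" shape used elsewhere in this thread.
    return json.dumps({"input_batch": [json.dumps([rec]) for rec in records]})


records = [
    {"LIMIT_BAL": 200000, "SEX": 2, "EDUCATION": 1, "MARRIAGE": 2, "AGE": 30},
    {"LIMIT_BAL": 150000, "SEX": 2, "EDUCATION": 3, "MARRIAGE": 1, "AGE": 53},
]
payload = build_batch_payload(records)
```

The resulting `payload` string would take the place of `json.dumps({"input": input_data})` in a `requests.post(...)` call, and the closure would then see two inputs instead of one.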

zoux86 commented 6 years ago

@withsmilo Hi, I ran your example and it works, but I am confused about why the answer is [image] rather than {'query_id': 75, 'output': [2000,150000], 'default': False}

withsmilo commented 6 years ago

@zoux86 : I sent just one prediction request to Clipper, and Clipper returned only the first 'LIMIT_BAL' value because of return df['LIMIT_BAL'].tolist()[0]. So your result is right.

withsmilo commented 6 years ago

@zoux86 : How about this?

def test_func(inp):
    def pre(i):
        d = eval(i)  # d's type is list[dict].
        for z in d:
            for k, v in z.items():
                z[k] = z[k] + 1
        return d
    return [str(pre(i)) for i in inp]

zoux86 commented 6 years ago

@withsmilo it works, thanks!!!
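A note on the eval-based snippet in this thread: eval executes whatever expression arrives in the request, so for JSON input the standard library's json.loads is usually the safer choice. Below is a sketch of the same increment logic under that assumption. `increment_values` is a hypothetical name, and unlike the original it returns JSON text (via json.dumps) rather than the Python repr produced by str().

```python
import json


def increment_values(inp):
    """For each JSON string in inp, add 1 to every value of every record."""

    def pre(raw):
        records = json.loads(raw)  # list of dicts; no code is executed
        return [{k: v + 1 for k, v in rec.items()} for rec in records]

    # One JSON string out per JSON string in.
    return [json.dumps(pre(raw)) for raw in inp]
```

Since the output is valid JSON, a client can decode each returned string with json.loads.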