tableau / TabPy

Execute Python code on the fly and display results in Tableau visualizations:
https://tableau.github.io/TabPy/
MIT License

Tabpy using old model after redeployment. #265

Closed Odin2 closed 4 years ago

Odin2 commented 5 years ago

Problem:

I'm training a simple scikit LogisticRegression model, and publishing a function that takes in multiple arguments (integers, strings) and returns a prediction.

The problem is after tuning the parameters, re-training the model, and re-deploying the method, tabpy is still using the previous model. Example:

First model: both the function and the deployed method return the same prediction: (image)

Improved model: the deployed method still returns the old value: (image)

Is there something I'm missing about Tabpy/pickle/models?

Example of re-deployment after the model is trained:

connection = tabpy_client.Client('http://localhost:9004/')
connection.deploy('predictInputAxisX', predictInputAxisX, 'XXXX', override=True)

nmannheimer commented 5 years ago

Is it possible the code you're running is still referring to the older function (given that it looks like they have the same name)? The surest way to rule this out would be to run the endpoint-removal command described in the documentation linked below, and then deploy your newly trained model as a fresh endpoint. https://github.com/tableau/TabPy/blob/master/docs/tabpy-tools.md

Odin2 commented 5 years ago

Yes, sorry I forgot to mention I've tried client.remove on the endpoint, and re-deploying, and the problem persists

0golovatyi commented 5 years ago

With new TabPy we introduced tabpy-tools. If you are running the latest TabPy release (https://github.com/tableau/TabPy/releases/tag/0.4) use tabpy-tools instead of tabpy-client as described at https://github.com/tableau/TabPy/blob/master/docs/tabpy-tools.md

Odin2 commented 5 years ago

Hi, same issue occurs using tabpy-tools instead of tabpy-client

nmannheimer commented 5 years ago

Hi @Odin2 , what happens if you deploy the function with a different name?

Odin2 commented 5 years ago

Still returns the first model's prediction.

Only thing that works to "refresh" the prediction is to completely delete the tabpy folder and download a fresh copy

0golovatyi commented 5 years ago

@Odin2 You deployed a model with a different name but when calling it another model with some other name is called? I would guess your Tableau is pointed to a wrong TabPy server.

Odin2 commented 5 years ago

Do you mean the name of the tabpy endpoint? or the python function it calls?

for example/clarification:

The scikit logistic regression object is called model

the python function is called predFunc which does something like:

return model.predict_proba([input parameters])

the endpoint I named predictInputAxisX deployed like:

connection.deploy('predictInputAxisX', predFunc, 'XXXX',override=True)

As per nmannheimer's comment, I deployed a second endpoint named predictInputAxisX2 that calls the same function, and it gives the same result as predictInputAxisX.

I'm sure Tableau is pointed to the right server as when I go through the trouble of deleting the tabpy folder and replacing it, it correctly saves the model the first time it's run, but only the first.

0golovatyi commented 5 years ago

I am not sure I am reading it correctly, just trying to clarify.

So you have a model, e.g.

def my_model_1(x, y):
    return x + y

and then you deploy it

client.deploy('my_model', my_model_1)

You can call it at this moment and can see expected results.

Now you are trying to deploy a different Python function under the same model name:

def my_model_2(x, y):
    return x * y

client.deploy('my_model', my_model_2, override=True)

And now when you call the model with

tabpy.query('my_model', [1], [2])

You still see my_model_1 being executed instead of my_model_2?
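One way to answer that question empirically, while debugging, is to tag each function with a fingerprint of the model it was built from and return the tag next to the prediction. The following is only a sketch under stated assumptions: `fingerprint` and `with_fingerprint` are hypothetical helpers, not part of TabPy, and the toy lambdas stand in for my_model_1 and my_model_2. With a scikit-learn model, the inputs could plausibly come from `model.get_params()` and `model.coef_.tolist()`.

```python
import hashlib
import json


def fingerprint(hyperparams, weights):
    """Return a short, stable hash identifying one trained model (hypothetical helper)."""
    payload = json.dumps(
        {"hyperparams": hyperparams, "weights": weights},
        sort_keys=True,
        default=str,  # fall back to str() for values JSON cannot encode
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def with_fingerprint(predict, tag):
    """Wrap a prediction function so every result carries the model tag."""

    def tagged(*args):
        return {"prediction": predict(*args), "model": tag}

    return tagged


# Toy stand-ins for my_model_1 (addition) and my_model_2 (multiplication).
tag_1 = fingerprint({"op": "add"}, [1.0, 1.0])
tag_2 = fingerprint({"op": "mul"}, [1.0])
model_1 = with_fingerprint(lambda x, y: x + y, tag_1)
model_2 = with_fingerprint(lambda x, y: x * y, tag_2)
```

If the tag returned by the endpoint does not change after a redeployment, the server is still executing the previously stored function rather than the new one.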

Odin2 commented 5 years ago

By model I mean a scikit Logistic Regression classifier. I train the model outside the function, so as to avoid retraining the model every time the method is called. (Maybe this approach is what's causing it?)

So the structure is like this:

X_train, X_test, y_train, y_test = train_test_split(X,y,train_size=0.3, test_size=0.7)

model = LogisticRegression(solver='liblinear', penalty='l1')

model.fit(X_train, y_train)

def predFunc([Input Parameters]):
        #prepares data for prediction
        return model.predict_proba([InputData])[0][1]

connection.deploy('predictInputAxisX', predFunc, 'XXXX',override=True)

The idea is that I could go back and tweak the parameters in this line (or change the algorithm altogether) and have updated predictions in my Tableau dashboard:

model = LogisticRegression(solver='liblinear', penalty='l1')
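One way to rule out a stale module-level `model` is to bind the trained model explicitly when building the function, instead of relying on a global lookup. This is a minimal sketch and not a confirmed fix for this issue: `make_predictor` and `FakeModel` are hypothetical names, and `FakeModel` stands in for a fitted scikit-learn classifier so the snippet runs without extra dependencies.

```python
class FakeModel:
    """Hypothetical stand-in for a fitted classifier exposing predict_proba."""

    def __init__(self, positive_probability):
        self.positive_probability = positive_probability

    def predict_proba(self, rows):
        # Mimic scikit-learn's shape: one [P(class 0), P(class 1)] pair per row.
        p = self.positive_probability
        return [[1.0 - p, p] for _ in rows]


def make_predictor(model):
    """Return a prediction function bound to this specific model object."""

    def predict(*features):
        # `model` is a closure variable here, not a global, so calling
        # make_predictor again after re-training yields a function that is
        # tied to the newly trained model.
        return model.predict_proba([list(features)])[0][1]

    return predict


# Two separately "trained" models produce two independent functions.
old_predict = make_predictor(FakeModel(0.25))
new_predict = make_predictor(FakeModel(0.75))
```

Under this approach, the function returned by `make_predictor(model)` would be passed to `connection.deploy(...)` in place of `predFunc`, once per re-training.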

nmannheimer commented 5 years ago

@Odin2 would it be possible for you to send your code and data to nmannheimer@tableau.com so we can take a closer look?

Odin2 commented 5 years ago

By data you mean the data the model is being trained on?

nmannheimer commented 5 years ago

@Odin2 yes, or at least a mocked up sample. I want to see if I can reproduce the scenario you're seeing on our end.

0golovatyi commented 5 years ago

@Odin2 do you still have this issue, or was it resolved?