rstudio / vetiver-python

Version, share, deploy, and monitor models.
https://rstudio.github.io/vetiver-python/stable/
MIT License

API breaks when defining custom elements in model #215

Open brooklynbagel opened 1 month ago

brooklynbagel commented 1 month ago

Describe the bug

When defining a custom element to use in a model, the API for serving the model breaks. For example, using this simple no-op transformer breaks the API with an error message like AttributeError: Can't get attribute 'MyTransformer' on <module '__main__' from '...'>, even when it is loaded into an API like https://github.com/brooklynbagel/vetiver-reprex-custom-elements/blob/3ec27d180be0c6ec115af1554fbd2b8f830fa73b/attempt-2/api/app.py.

from sklearn.base import BaseEstimator, TransformerMixin


class MyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

To Reproduce

Steps to reproduce the behavior:

  1. Define a custom element to use in a model
  2. Deploy said model
  3. Either deploy the API for the model or run it locally with uvicorn app:api
  4. See error AttributeError: Can't get attribute 'MyTransformer' on <module '__main__' from '...'>

Expected behavior

API should start up normally with no error.

Additional context

See reprex

It does work when forcing the transformer into the __main__ module:

import sys

from sklearn.base import BaseEstimator, TransformerMixin


class MyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

# this fixes the `AttributeError`
setattr(sys.modules["__main__"], "MyTransformer", MyTransformer)
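The failure and the workaround can be reproduced without vetiver or scikit-learn. The following is only a sketch under assumptions: `training_main` is a hypothetical stand-in for the `__main__` module of the script that built the model, and the plain `MyTransformer` class here omits the scikit-learn base classes so the example stays self-contained.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the training script's __main__ module.
training = types.ModuleType("training_main")
sys.modules["training_main"] = training


class MyTransformer:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


# Make pickle record the class as training_main.MyTransformer.
MyTransformer.__module__ = "training_main"
training.MyTransformer = MyTransformer
payload = pickle.dumps(MyTransformer())

# Simulate the serving process: the module exists, but the class is not on it.
del training.MyTransformer
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc
print(type(load_error).__name__)

# The workaround from above: attach the class to the module by hand.
setattr(sys.modules["training_main"], "MyTransformer", MyTransformer)
restored = pickle.loads(payload)
print(type(restored).__name__)
```

Loading succeeds only once the attribute exists on the module named in the pickle, which is what the `setattr` call on `__main__` achieves in the real app.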

API deployed on dogfood: https://connect.posit.it/content/aaac8d80-fb22-48d0-98df-bf1683f91170

isabelizimm commented 4 weeks ago

Thank you so much for this report! I believe the error you are running into comes from how you are pinning the model, combined with how you are deploying it.

pickle is very flexible (perhaps to a fault 😅): rather than storing the source code of a class, it only records the location the class can be imported from. In your scenario, you create and pin the model in the same file (say model.py), so the path recorded in the pickle is __main__.MyTransformer. When you import the class into app.py to run the API, it lives at model.MyTransformer instead, and __main__ in that process has no MyTransformer attribute, so pickle doesn't know where to find it. A fix would be to have one file, model.py, that builds the model, and a second file, deploy.py, that imports from it to load, pin, and deploy the model. That way, the class is always found at the same import path. Let me know if that helps!
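To see what pickle actually records, one can inspect the pickle stream. This is a minimal sketch, not vetiver's own code: `custom_steps_demo` is a hypothetical module name standing in for model.py, the module is written to a temporary directory just to keep the example self-contained, and the class again skips the scikit-learn base classes.

```python
import importlib
import pickle
import pickletools
import sys
import tempfile
from pathlib import Path

# Hypothetical module name; in the thread this role is played by model.py.
MODULE_NAME = "custom_steps_demo"
SOURCE = '''
class MyTransformer:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X
'''

# Write the module to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, f"{MODULE_NAME}.py").write_text(SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
custom_steps = importlib.import_module(MODULE_NAME)

payload = pickle.dumps(custom_steps.MyTransformer(), protocol=4)

# Collect the strings stored in the pickle: the module name and the class
# name are there, the class source code is not.
recorded = [
    arg for _op, arg, _pos in pickletools.genops(payload) if isinstance(arg, str)
]
print(recorded)

# Any process that can import the module under the same name can load it.
restored = pickle.loads(payload)
print(type(restored).__module__)
```

Because the class lives in a real importable module, the recorded path stays the same whether the pickle is written by a training script or read by app.py, as long as both can import that module under the same name.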

There's probably room for a better error message here, or maybe some docs on why this is important. I'm open to hearing what you think is important to help clarify this to others!

brooklynbagel commented 3 weeks ago

That makes sense w.r.t. pickle just remembering the location of the module. I think some better documentation and error reporting would be helpful at a minimum.

It would be nice to have a better developer experience than having to define your model in a model.py that is separate from the .py, .ipynb, or .qmd you're working from. I'm wondering whether it would be possible to 'trick' pickle (when deploying the model) into thinking custom modules are where the FastAPI app.py would expect them, or whether this creates even more problems.
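One way to explore the 'trick' idea on the loading side is to subclass `pickle.Unpickler` and override `find_class`, which the standard library supports. The sketch below is an illustration only: `RemappingUnpickler`, `pinning_script`, and `serving_model` are hypothetical names, and nothing here is an existing vetiver or pins feature, so whatever loader reads the pin would have to be able to use such an unpickler.

```python
import io
import pickle
import sys
import types


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects class lookups from one module name to another."""

    def __init__(self, file, module_map):
        super().__init__(file)
        self.module_map = module_map

    def find_class(self, module, name):
        # Swap the recorded module name before the normal lookup runs.
        return super().find_class(self.module_map.get(module, module), name)


class MyTransformer:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


# Hypothetical stand-in for the script that pinned the model.
pinned_from = types.ModuleType("pinning_script")
sys.modules["pinning_script"] = pinned_from
MyTransformer.__module__ = "pinning_script"
pinned_from.MyTransformer = MyTransformer
payload = pickle.dumps(MyTransformer())

# In the serving process the class is only available from another module.
del pinned_from.MyTransformer
serving = types.ModuleType("serving_model")
sys.modules["serving_model"] = serving
serving.MyTransformer = MyTransformer

unpickler = RemappingUnpickler(
    io.BytesIO(payload), {"pinning_script": "serving_model"}
)
restored = unpickler.load()
print(type(restored).__name__)
```

The remapping has to be kept in sync with wherever the classes really live, so this moves the problem rather than removing it; a mismatch would surface as the same kind of lookup error at load time.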