snowflakedb / snowflake-ml-python

Apache License 2.0

Register `custom_model` with custom python package `pygam` #130

Open benleit opened 6 days ago

benleit commented 6 days ago

The goal is to store and deploy a pygam model with Snowflake's Model Registry, using Snowpark and Snowflake's ML features. However, registering the model fails, apparently because of compatibility constraints with the external package pygam.

Here is a minimal example that I tried to run in a Snowflake Notebook (I was able to use pygam via stage packages within the notebook):

# Import necessary Snowpark and Snowflake ML modules
from snowflake.snowpark.context import get_active_session
session = get_active_session()

import numpy as np
import pandas as pd
from pygam import LinearGAM, s
from snowflake.ml.model import custom_model
from snowflake.ml.model import model_signature
from snowflake.ml.registry import Registry

# Step 1: Generate synthetic data for model training
np.random.seed(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 samples, single feature
y = np.sin(X).ravel() + np.random.normal(scale=0.1, size=X.shape[0])  # Noisy sine wave data

# Step 2: Train a simple `pygam` model with smoothing on the synthetic data
pygam_model = LinearGAM(s(0)).fit(X, y)

# Step 3: Test the model by making predictions on new data
X_test = np.linspace(0, 10, 10).reshape(-1, 1)  # New test data
predictions = pygam_model.predict(X_test)
print("Predictions on new data:", predictions)  # Expected output for verification

# Step 4: Define a Custom Model class to wrap `pygam` in Snowflake
class PyGAMModel(custom_model.CustomModel):
    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        model_output = self.context["models"].predict(X)
        return model_output

# Step 5: Create the Model Context and pass in the trained `pygam` model
mc = custom_model.ModelContext(
    models=pygam_model
)

# Instantiate the Custom Model
pygam_model = PyGAMModel(mc)

# Test prediction in the custom model context
output_pd = pygam_model.predict(X_test)
print("Custom Model Prediction Output:", output_pd)

# Step 6: Register the Model in the Snowflake Model Registry
registry = Registry(
    session=session,
    database_name="YOUR_DATABASE",  # Replace with your database
    schema_name="YOUR_SCHEMA",      # Replace with your schema
)

# Attempt to log the model
registry.log_model(
    model=pygam_model,
    model_name="pygam_model",
    version_name="v1",
    sample_input_data=X_test,
    comment="Test deployment of a pygam model as a Custom Model"
)

This is the error message I get:

AssertionError
Traceback:
File "Cell [cell25]", line 52, in <module>
    registry.log_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/_internal/telemetry.py", line 527, in wrap
    return ctx.run(execute_func_with_statement_params)
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/_internal/telemetry.py", line 503, in execute_func_with_statement_params
    result = func(*args, **kwargs)
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/registry/registry.py", line 288, in log_model
    return self._model_manager.log_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/registry/_manager/model_manager.py", line 82, in log_model
    return self._log_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/registry/_manager/model_manager.py", line 164, in _log_model
    model_metadata: model_meta.ModelMetadata = mc.save(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/model/_model_composer/model_composer.py", line 111, in save
    model_metadata: model_meta.ModelMetadata = self.packager.save(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/model/_packager/model_packager.py", line 87, in save
    handler.save_model(
File "/usr/lib/python_udf/27681972c608bd613d63a62e452b11ddd9e3e37327ceaca2b20f9d23e3491499/lib/python3.9/site-packages/snowflake/ml/model/_packager/model_handlers/custom.py", line 101, in save_model
    assert handler is not None

I tried conda_dependencies, pip_requirements, ext_modules and code_paths, all without success. Support for installing arbitrary pip packages (beyond the Snowflake Anaconda Channel) in the Snowflake model registry would make it much easier to deploy custom models that depend on niche or specialized packages like pygam. Is there a solution to this problem that is available today? This is a showstopper for our migration to Snowflake's ML features.

sfc-gh-pramachandran commented 2 days ago

@benleit, thank you for bringing this issue to our attention. The current design of CustomModel requires every model passed in through the ModelContext to be of a model type that Snowflake supports.

This is because the model packager relies on a handler for each model type to serialize the model (code and parameters) into a directory. That serialized form is what enables us to deploy models to warehouses or container runtimes.
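
To illustrate why the traceback ends in `assert handler is not None`, here is a simplified sketch of per-type handler dispatch. It is not the actual snowflake-ml-python code: `HANDLERS`, `find_handler`, `save_custom_model` and both stand-in model classes are hypothetical names chosen for illustration. It only assumes that the packager looks up a handler for each model in the context and cannot continue when none matches.

```python
from typing import Any, Callable, Dict, Optional


class SupportedModel:
    """Hypothetical stand-in for a model type the packager knows how to save."""


class UnsupportedModel:
    """Hypothetical stand-in for a type with no handler (pygam in this thread)."""


def save_supported(model: Any) -> str:
    # A real handler would write code and parameters to a directory.
    return f"serialized {type(model).__name__}"


# Hypothetical registry: one save function per supported model type.
HANDLERS: Dict[type, Callable[[Any], str]] = {SupportedModel: save_supported}


def find_handler(model: Any) -> Optional[Callable[[Any], str]]:
    """Return the handler for this model's type, or None if unsupported."""
    for model_type, handler in HANDLERS.items():
        if isinstance(model, model_type):
            return handler
    return None


def save_custom_model(sub_models: Dict[str, Any]) -> Dict[str, str]:
    """Serialize every model held by the custom model's context."""
    saved: Dict[str, str] = {}
    for name, sub_model in sub_models.items():
        handler = find_handler(sub_model)
        if handler is None:
            # Mirrors the bare `assert handler is not None` in the traceback.
            raise AssertionError(f"no handler for model '{name}'")
        saved[name] = handler(sub_model)
    return saved
```

In this sketch, `save_custom_model({"gam": UnsupportedModel()})` raises `AssertionError`, while a context that holds only `SupportedModel` instances saves cleanly. Under that assumption, dependency options such as pip_requirements would not change the outcome, because the lookup is based on the model's type rather than on which packages are installed.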

Regrettably, pygam is not currently supported. We will add this to our roadmap and prioritize it accordingly. We will notify you once support for pygam is available.

Thank you again for reporting this issue, and we apologize for any inconvenience this may have caused.