ibis-project / ibis-ml

IbisML is a library for building scalable ML pipelines using Ibis.
https://ibis-project.github.io/ibis-ml/
Apache License 2.0

bug: cannot convert `y` to numpy on kaggle notebook in sklearn pipeline #149

Open · jitingxu1 opened this issue 2 months ago


In this Kaggle competition, the `y` column cannot be converted to a NumPy array.

The same code runs on my local machine but fails in a Kaggle notebook.

~~I could reproduce this on my local.~~

Local environment

Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 10:07:17) [Clang 14.0.6 ]
scikit-learn version: 1.5.1
skorch version: 1.0.0
torch version: 2.4.0
ibis-framework version: 9.3.0

Kaggle environment

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
scikit-learn version: 1.2.2
skorch version: 1.0.0
torch version: 2.4.0+cpu
ibis-framework version: 9.3.0
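The two listings differ mainly in Python (3.12.4 locally vs 3.10.14 on Kaggle) and scikit-learn (1.5.1 vs 1.2.2); skorch and ibis-framework match, and torch differs only by the `+cpu` build tag. A minimal sketch for printing the same report in any environment, assuming the PyPI distribution names listed in `PACKAGES`; `importlib.metadata` reads installed package metadata without importing the packages themselves.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names for the packages listed in this issue
PACKAGES = ["scikit-learn", "skorch", "torch", "ibis-framework"]


def version_report(packages=PACKAGES):
    """Return report lines in the same shape as the environment listings."""
    lines = [f"Python version: {sys.version}"]
    for name in packages:
        try:
            v = version(name)
        except PackageNotFoundError:
            v = "not installed"
        lines.append(f"{name} version: {v}")
    return lines


if __name__ == "__main__":
    print("\n".join(version_report()))
```

Running this in both the local and the Kaggle kernel makes it easy to diff the two environments line by line.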
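import torch.nn as nn
import torch.optim as optim
from sklearn.pipeline import Pipeline
from skorch import NeuralNetClassifier
from skorch.callbacks import EarlyStopping, LRScheduler

# MyModel (a torch.nn.Module), recipe (an IbisML recipe), X_train and y_train
# are defined earlier in the notebook and are not shown in this issue.

# Wrap the PyTorch model with skorch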
net = NeuralNetClassifier(
    MyModel,
    module__input_dim=635,  # Specify the input dimension
    max_epochs=1,
    lr=0.001,
    batch_size=32,
    optimizer=optim.Adam,
    criterion=nn.BCELoss,
    iterator_train__shuffle=True,
    callbacks=[
        EarlyStopping(monitor='valid_loss', patience=25, load_best=True),  # Early stopping
        LRScheduler(policy='ReduceLROnPlateau', monitor='valid_loss', factor=0.1, patience=25, min_lr=1e-6)
    ],
    verbose=1
)

# Define the sklearn pipeline with preprocessing and PyTorch model
pipeline = Pipeline([
    ('ibisml-prep', recipe),  # Preprocessing step in IbisML
    ('model', net)  # The PyTorch model wrapped as NeuralNetClassifier via skorch
])

pipeline.fit(X_train, y_train)
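In the log below, the `pipeline.py:405` frame shows that sklearn's `Pipeline` passes `y` to the final step unchanged (`self._final_estimator.fit(Xt, y, ...)`), so `y_train` reaches skorch in whatever type it was created with. One possible workaround, not confirmed in this issue, is to coerce the target to a NumPy array before fitting, since skorch's `to_numpy` returns NumPy arrays as-is. `as_numpy_target` below is a hypothetical helper, not part of IbisML, skorch, or scikit-learn.

```python
import numpy as np


def as_numpy_target(y, dtype=None):
    """Coerce a label container to a 1-D NumPy array (hypothetical helper).

    Objects exposing a callable ``to_numpy()`` method (for example pandas
    or polars Series) are converted with it; anything else goes through
    ``np.asarray``.
    """
    to_numpy = getattr(y, "to_numpy", None)
    arr = to_numpy() if callable(to_numpy) else np.asarray(y)
    arr = np.asarray(arr, dtype=dtype)
    # Flatten a single-column 2-D result, e.g. an (n, 1) DataFrame
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr.ravel()
    return arr
```

Usage would be `pipeline.fit(X_train, as_numpy_target(y_train))`. Because `criterion=nn.BCELoss` expects floating-point targets, passing `dtype=np.float32` may also be needed.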

log

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 pipeline.fit(X_train, y_train)

File /opt/conda/lib/python3.10/site-packages/sklearn/pipeline.py:405, in Pipeline.fit(self, X, y, **fit_params)
    403     if self._final_estimator != "passthrough":
    404         fit_params_last_step = fit_params_steps[self.steps[-1][0]]
--> 405         self._final_estimator.fit(Xt, y, **fit_params_last_step)
    407 return self

File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:165, in NeuralNetClassifier.fit(self, X, y, **fit_params)
    154 """See ``NeuralNet.fit``.
    155 
    156 In contrast to ``NeuralNet.fit``, ``y`` is non-optional to
   (...)
    160 
    161 """
    162 # pylint: disable=useless-super-delegation
    163 # this is actually a pylint bug:
    164 # https://github.com/PyCQA/pylint/issues/1085
--> 165 return super(NeuralNetClassifier, self).fit(X, y, **fit_params)

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1319, in NeuralNet.fit(self, X, y, **fit_params)
   1316 if not self.warm_start or not self.initialized_:
   1317     self.initialize()
-> 1319 self.partial_fit(X, y, **fit_params)
   1320 return self

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1278, in NeuralNet.partial_fit(self, X, y, classes, **fit_params)
   1276 self.notify('on_train_begin', X=X, y=y)
   1277 try:
-> 1278     self.fit_loop(X, y, **fit_params)
   1279 except KeyboardInterrupt:
   1280     pass

File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1172, in NeuralNet.fit_loop(self, X, y, epochs, **fit_params)
   1136 def fit_loop(self, X, y=None, epochs=None, **fit_params):
   1137     """The proper fit loop.
   1138 
   1139     Contains the logic of what actually happens during the fit
   (...)
   1170 
   1171     """
-> 1172     self.check_data(X, y)
   1173     self.check_training_readiness()
   1174     epochs = epochs if epochs is not None else self.max_epochs

File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:141, in NeuralNetClassifier.check_data(self, X, y)
    137         pass
    139 if y is not None:
    140     # pylint: disable=attribute-defined-outside-init
--> 141     self.classes_inferred_ = np.unique(to_numpy(y))

File /opt/conda/lib/python3.10/site-packages/skorch/utils.py:152, in to_numpy(X)
    149     return np.asarray(X)
    151 if not is_torch_data_type(X):
--> 152     raise TypeError("Cannot convert this data type to a numpy array.")
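The last two frames show where the error comes from: `NeuralNetClassifier.check_data` calls `skorch.utils.to_numpy(y)`, which raises this `TypeError` for any type it does not recognize. The sketch below is a simplified approximation of that dispatch, assuming skorch 1.0 checks NumPy arrays, mappings, pandas objects (detected via an `.iloc` attribute), tuples/lists and scalars in roughly this order. It omits the real function's PyTorch-tensor branch, and `to_numpy_sketch` and `would_fail` are hypothetical names used only for checking whether a given `y` would fall through to the error.

```python
from collections.abc import Mapping

import numpy as np


def to_numpy_sketch(X):
    """Simplified stand-in for skorch.utils.to_numpy (torch branch omitted)."""
    if isinstance(X, np.ndarray):
        return X
    if isinstance(X, Mapping):
        return {key: to_numpy_sketch(val) for key, val in X.items()}
    if hasattr(X, "iloc"):  # treated as a pandas Series/DataFrame
        return X.values
    if isinstance(X, (tuple, list)):
        return type(X)(to_numpy_sketch(x) for x in X)
    if np.isscalar(X):
        return np.asarray(X)
    # The real function converts torch tensors here; everything else raises.
    raise TypeError("Cannot convert this data type to a numpy array.")


def would_fail(y):
    """Return True if `y` reaches the TypeError branch in this sketch."""
    try:
        to_numpy_sketch(y)
    except TypeError:
        return True
    return False
```

Calling `type(y_train)` and `would_fail(y_train)` in both kernels is a quick way to see whether the Kaggle notebook produces a target type (one without `.iloc`, for instance) that the local run does not.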