skops-dev / skops

skops is a Python library helping you share your scikit-learn based models and put them in production
https://skops.readthedocs.io/en/stable/
MIT License
446 stars 53 forks source link

`TypeError` when loading model using methodcaller #283

Closed merveenoyan closed 1 year ago

merveenoyan commented 1 year ago

Came across this error for only this model generated using the test case generation script in api-inference-community: *** TypeError: methodcaller needs at least one argument, the method name

To reproduce:

from skops import hub_utils
hub_utils.download(repo_id = "skops-tests/textclassification-sklearn-latest-hist_gradient_boosting-with-config-skops", dst="model_dir")

import skops.io as sio
sio.load("/content/model_dir/skops-v588xnq6.skops", trusted=True)

At first I thought something might've been wrong with how I serialize it but I think it's weird how every test passes and this model fails for inference.

BenjaminBossan commented 1 year ago

@E-Aho @adrinjalali This is a veritable bug, here is a minimal reproduction:

from sklearn.base import BaseEstimator
from operator import methodcaller

class MyEstimator(BaseEstimator):
   def __init__(self):
       pass
   def fit(self, X, y):
       self.meth_ = methodcaller("foo")
       return self

est = MyEstimator().fit(None, None)
sio.loads(sio.dumps(est), trusted=True)

Resulting in:

--> 397 instance = cls.__new__(cls)  # type: ignore
    399 if not self.children["attrs"]:
    400     # nothing more to do
    401     return instance

TypeError: methodcaller needs at least one argument, the method name

The offending line:

https://github.com/skops-dev/skops/blob/9bf557e190230c4ad5bc70f650437c1c30054d51/skops/io/_general.py#L397

This could be an interesting one to resolve. Presumably, methodcaller is not the only type that fails like this, we should ensure that other similar types work too.

@merveenoyan methodcaller is used here:

https://github.com/huggingface/api-inference-community/blob/763efb856835098e7c9b91219b241ea7e524e695/docker_images/sklearn/tests/generators/generate.py#L96

The script could be rewritten to not use methodcaller and should then pass.

adrinjalali commented 1 year ago

I was confused for a while, since methodcaller is defined here (https://github.com/python/cpython/blob/498598e8c2d64232d26c075de87c513415176bbf/Lib/operator.py#L302) as:

class methodcaller:
    """
    Return a callable object that calls the given method on its operand.
    After f = methodcaller('name'), the call f(r) returns r.name().
    After g = methodcaller('name', 'date', foo=1), the call g(r) returns
    r.name('date', foo=1).
    """
    __slots__ = ('_name', '_args', '_kwargs')

    def __init__(self, name, /, *args, **kwargs):
        self._name = name
        if not isinstance(self._name, str):
            raise TypeError('method name must be a string')
        self._args = args
        self._kwargs = kwargs

    def __call__(self, obj):
        return getattr(obj, self._name)(*self._args, **self._kwargs)

    def __repr__(self):
        args = [repr(self._name)]
        args.extend(map(repr, self._args))
        args.extend('%s=%r' % (k, v) for k, v in self._kwargs.items())
        return '%s.%s(%s)' % (self.__class__.__module__,
                              self.__class__.__name__,
                              ', '.join(args))

    def __reduce__(self):
        if not self._kwargs:
            return self.__class__, (self._name,) + self._args
        else:
            from functools import partial
            return partial(self.__class__, self._name, **self._kwargs), self._args

which doesn't implement __new__. So it was puzzling why the error. Then I realized, like many other things in python, there's also a C implementation of it, and it implements __new__ here (https://github.com/python/cpython/blob/498598e8c2d64232d26c075de87c513415176bbf/Modules/_operator.c#L1563), which starts as:

static PyObject *
methodcaller_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    methodcallerobject *mc;
    PyObject *name;

    if (PyTuple_GET_SIZE(args) < 1) {
        PyErr_SetString(PyExc_TypeError, "methodcaller needs at least "
                        "one argument, the method name");
        return NULL;
    }

Anyhow, now we know it's happening because methodcaller has a custom __new__ method which requires an extra arg. Luckily not many objects implement __new__, and I always thought for these objects we'd have a dedicated loader, like we do for set, dict etc. Cause I really don't want to rely on their __reduce__ lol

As for the issue on the hub, @merveenoyan you can change the code to avoid using methodcaller, while we fix it here for this object.