koaning / embetter

just a bunch of useful embeddings
https://koaning.github.io/embetter/
MIT License
469 stars 15 forks

Dedup model: might make for a nice util #80

Closed koaning closed 1 year ago

koaning commented 1 year ago

Something like this:

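import numpy as np
from sklearn.linear_model import LogisticRegression


class MultiClassifier:
    """Classify pairs of inputs (e.g. duplicate or not) from their embeddings."""

    def __init__(self, enc, mod=None, setting: str = "absdiff"):
        # enc: callable that maps a batch of inputs to a 2D array of embeddings
        self.enc = enc
        self.setting = setting
        self.clf_head = LogisticRegression(class_weight="balanced") if mod is None else mod

    def _calc_feats(self, X1, X2):
        if self.setting == "absdiff":
            return np.abs(self.enc(X1) - self.enc(X2))
        # Fail loudly instead of silently returning None for an unknown setting.
        raise ValueError(f"Unknown setting: {self.setting!r}")

    def fit(self, X1, X2, y):
        self.clf_head.fit(self._calc_feats(X1, X2), y)
        return self

    def partial_fit(self, X1, X2, y, classes=None):
        # Needs a head that implements partial_fit (e.g. SGDClassifier);
        # the default LogisticRegression does not.
        self.clf_head.partial_fit(self._calc_feats(X1, X2), y, classes=classes)
        return self

    def predict(self, X1, X2):
        return self.clf_head.predict(self._calc_feats(X1, X2))

    def predict_proba(self, X1, X2):
        return self.clf_head.predict_proba(self._calc_feats(X1, X2))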
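
To make the "absdiff" feature step concrete, here is a minimal sketch of what the classifier head would see. `toy_encoder` and `absdiff_features` are hypothetical names made up for this illustration; the encoder just counts letters and stands in for a real embedding model.

```python
import numpy as np


def toy_encoder(texts):
    """Hypothetical stand-in for a real encoder: counts of letters a-z per text."""
    out = np.zeros((len(texts), 26))
    for i, text in enumerate(texts):
        for ch in text.lower():
            if "a" <= ch <= "z":
                out[i, ord(ch) - ord("a")] += 1
    return out


def absdiff_features(enc, X1, X2):
    """Element-wise absolute difference between the embeddings of each pair."""
    return np.abs(enc(X1) - enc(X2))


left = ["Main Street 1", "Main Street 1"]
right = ["main street 1", "Station Road 9"]

feats = absdiff_features(toy_encoder, left, right)
swapped = absdiff_features(toy_encoder, right, left)

# Pairs that the encoder treats as identical give an all-zero feature row.
print(feats[0].sum(), feats[1].sum() > 0)
# Swapping the two sides leaves the features unchanged.
print(np.array_equal(feats, swapped))
```

Because the absolute difference is symmetric, the head gets the same features for (a, b) and (b, a), which seems like a reasonable property for a dedup model where pair order carries no meaning.
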
koaning commented 1 year ago

Now part of the library.

https://koaning.github.io/embetter/applications/#difference-models