Closed michelole closed 2 years ago
`skift` seems to expect integer labels and fails when given string labels.
For instance, when running

```python
from skift import FirstColFtClassifier
import pandas as pd

df = pd.DataFrame(
    data=[
        ['woof', 'a'],
        ['meow', 'b'],
        ['squick', 'c'],
    ],
    columns=['txt', 'lbl'],
)
sk_clf = FirstColFtClassifier(lr=0.3, epoch=10)
sk_clf.fit(df[['txt']], df['lbl'])
sk_clf.predict([['squick']])
```
I get

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-52a73258e761> in <module>
----> 1 sk_clf.predict([['squick']])

/usr/local/Caskroom/miniconda/base/envs/base/lib/python3.7/site-packages/skift/core.py in predict(self, X)
    165         return np.array([
    166             self._clean_label(res[0][0])
--> 167             for res in self._predict(X)
    168         ], dtype=np.float_)
    169

/usr/local/Caskroom/miniconda/base/envs/base/lib/python3.7/site-packages/skift/core.py in <listcomp>(.0)
    165         return np.array([
    166             self._clean_label(res[0][0])
--> 167             for res in self._predict(X)
    168         ], dtype=np.float_)
    169

/usr/local/Caskroom/miniconda/base/envs/base/lib/python3.7/site-packages/skift/core.py in _clean_label(ft_label)
    135     @staticmethod
    136     def _clean_label(ft_label):
--> 137         return int(ft_label[9:])
    138
    139     def _predict_on_str_arr(self, str_arr, k=1):

ValueError: invalid literal for int() with base 10: 'c'
```
This is a bit unexpected, since neither `sklearn` nor `fasttext` requires integer labels.
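The failure can be reproduced without `skift` or `fasttext` installed. The sketch below is not the library's code; it only mirrors the `_clean_label` line shown in the traceback, assuming the 9 characters being sliced off are fastText's `__label__` prefix. The names `FT_LABEL_PREFIX` and `clean_label` are hypothetical.

```python
# Assumption: fastText returns labels with a "__label__" prefix (9 characters),
# which is what the `ft_label[9:]` slice in the traceback strips off.
FT_LABEL_PREFIX = "__label__"


def clean_label(ft_label):
    """Mirror of the `_clean_label` line from the traceback (illustrative only)."""
    return int(ft_label[len(FT_LABEL_PREFIX):])


# An integer label survives the round trip.
print(clean_label("__label__2"))

# A string label reaches int('c') and raises the ValueError from the traceback.
try:
    clean_label("__label__c")
except ValueError as err:
    print(err)
```

Note that the traceback also shows `predict` building its result with `dtype=np.float_`, so removing the `int()` call alone would presumably not be enough for string labels.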
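I guess `skift` could handle that either by:

- passing the labels through to `fasttext` as-is (caveat: might require some cleaning)
- encoding them with a `LabelEncoder` (e.g. as in `sklearn`'s code for LR)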
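A minimal sketch of the label-encoding option, under stated assumptions: `SimpleLabelEncoder` is a hypothetical, dependency-free stand-in that imitates the sorted-classes mapping of `sklearn.preprocessing.LabelEncoder`, and `to_ft_label` / `clean_label` are illustrative helpers, not `skift` functions. The idea is to map string labels to integers before training and map predictions back afterwards.

```python
FT_LABEL_PREFIX = "__label__"  # assumed fastText label prefix


class SimpleLabelEncoder:
    """Hypothetical stand-in for sklearn.preprocessing.LabelEncoder."""

    def fit(self, labels):
        # Like LabelEncoder, keep the sorted unique labels; a label's
        # position in this list is its integer code.
        self.classes_ = sorted(set(labels))
        self._codes = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        return [self._codes[label] for label in labels]

    def inverse_transform(self, codes):
        return [self.classes_[code] for code in codes]


def to_ft_label(code):
    """Build the label string that would be written to the training data."""
    return f"{FT_LABEL_PREFIX}{code}"


def clean_label(ft_label):
    """Integer-only parsing, as in the traceback; safe once labels are encoded."""
    return int(ft_label[len(FT_LABEL_PREFIX):])


encoder = SimpleLabelEncoder().fit(["a", "b", "c", "a"])
codes = encoder.transform(["c", "a"])            # [2, 0]
ft_labels = [to_ft_label(c) for c in codes]      # ['__label__2', '__label__0']
decoded = encoder.inverse_transform([clean_label(l) for l in ft_labels])
print(decoded)  # ['c', 'a']
```

One argument for this route is that the integer-based parsing can stay as it is, and arbitrary label strings never need cleaning because they are never written into the fastText input.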
Hmmm. Good point!
I would accept a PR solving this either way. Would you consider writing one? :)
Sure, let me just find some spare cycles...
:)