Closed: Practcdi closed this issue 3 years ago
Hey @Practcdi,
TL;DR: this happens because fit/train expects a list of strings rather than a DataFrame. (See function documentation here)
Fix: pass x_train.values.tolist(), y_train to clf.train()
A bit more insight into why it does not work. The relevant code lines (here) are:
    x_train, y_train = list(x_train), list(y_train)
    if len(x_train) != len(y_train):
        raise ValueError("`x_train` and `y_train` must have the same length")
If you pass a DataFrame of shape (535544, 1) as x_train, casting it to a list returns only the column names, so the resulting list has length 1.
The check therefore effectively compares the following:
    if 1 != 535544:
        raise ValueError("`x_train` and `y_train` must have the same length")
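To see the mismatch concretely, here is a minimal sketch that uses only pandas. The column names Text and Label and the three sample rows are made up for illustration, and no classifier call is involved.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 535544-row frame.
df = pd.DataFrame({
    "Text": ["good movie", "bad movie", "great plot"],
    "Label": ["pos", "neg", "pos"],
})

x_train = df[["Text"]]  # single-column DataFrame, shape (3, 1)
y_train = df["Label"]   # Series, shape (3,)

# Iterating over a DataFrame yields its column labels, not its rows.
as_list = list(x_train)
print(as_list, len(as_list))

# Selecting the column gives a Series, which casts to a flat list of strings.
texts = df["Text"].tolist()
labels = y_train.tolist()
print(len(texts) == len(labels))

# For a single-column DataFrame, .values.tolist() gives one-element lists.
nested = x_train.values.tolist()
print(nested[0])
```

Note that when x_train is a one-column DataFrame rather than a Series, .values.tolist() produces a list of one-element lists, so selecting the column first may be needed to end up with plain strings. This sketch does not check whether the classifier accepts the nested form.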
Thanks a lot 😊
@Practcdi Thanks for sharing this issue with us!
@angrymeir Thanks for taking care of it :muscle: By the way, what do you think of adding an extra check at the beginning of fit/train that raises a ValueError saying something like "the x_train argument is expected to be a list of strings" when the provided x_train isn't a list of strings? :thinking:
@sergioburdisso Hm, unsure about that one because...
1) Is it only fit/train that needs this kind of validation, or also other methods (potentially all methods with user input, for consistency)?
2) What about inputs where x_train can be cast to a list of strings without information loss? E.g. while a pandas.DataFrame can't be cast, a pandas.Series can be cast without issues, so it should stay a valid option?
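One possible shape for such a check, purely as a sketch: the helper name as_text_list is hypothetical and not part of the library. It assumes that two-dimensional inputs can be detected through an ndim attribute (which NumPy arrays and pandas objects expose), rejects those, and accepts anything one-dimensional that casts to a list of strings, so a pandas.Series would stay valid.

```python
def as_text_list(x_train):
    """Cast x_train to a list of strings or raise ValueError (hypothetical helper)."""
    message = "the x_train argument is expected to be a list of strings"

    # A bare string would otherwise be split into characters by list().
    if isinstance(x_train, str):
        raise ValueError(message)

    # Objects without an ndim attribute (lists, tuples) are treated as 1-D.
    # A DataFrame reports ndim == 2 and is rejected here.
    if getattr(x_train, "ndim", 1) != 1:
        raise ValueError(message)

    docs = list(x_train)
    if not all(isinstance(doc, str) for doc in docs):
        raise ValueError(message)
    return docs


print(as_text_list(("first document", "second document")))
```

Whether the same helper should guard every method that takes user input is the open question raised above; keeping the check in one function would at least make it easy to reuse.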
Hey,
[Note]: I have a pandas DataFrame containing 2 columns:
1) Text 2) Label
The train() and fit() methods are not working. Here is a reference code.
How to fix it?
Thanks