Open kotoroshinoto opened 9 months ago
Here is the error stack trace: https://pastebin.com/3ch44t75
The problem starts in skorch/net.py at line 1182, when it runs: dataset_train, dataset_valid = sknet.get_split_datasets(X_train, y_train_ints)
This is where the data turns into a mapping.
Later on, it merges this mapping with the fit params and feeds the result as kwargs to the forward function, starting with the conditional on line 1518:
1518 if isinstance(x, Mapping):
1519     x_dict = self._merge_x_and_fit_params(x, fit_params)
1520     return self.module_(**x_dict)
1521 return self.module_(x, **fit_params)
Since X is a mapping because of the earlier transformation, skorch merges it with the fit params and calls the module with **x_dict instead of taking the branch that passes x positionally.
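To make the two branches concrete, here is a minimal sketch in plain Python. It only imitates the conditional quoted above and is not skorch's actual code: `ToyModule` and `call_module` are hypothetical names, and the merge step is simplified to a plain dict update.

```python
from collections.abc import Mapping


class ToyModule:
    """Hypothetical stand-in for a module whose forward takes a single input."""

    def forward(self, x):
        return x

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


def call_module(module, x, **fit_params):
    """Simplified imitation of the branch quoted above (not skorch's code)."""
    if isinstance(x, Mapping):
        # Mapping input: every key becomes a keyword argument of forward().
        x_dict = dict(x, **fit_params)
        return module(**x_dict)
    # Anything else is passed positionally as x.
    return module(x, **fit_params)


module = ToyModule()

# A list (or array) reaches forward() as the positional argument x.
positional_result = call_module(module, [[17.99, 10.38]])

# A dict keyed by column name is unpacked, so forward() is called with
# radius_mean=... and texture_mean=..., which it does not accept.
try:
    call_module(module, {"radius_mean": [17.99], "texture_mean": [10.38]})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(positional_result)
print(error_message)
```

The second call fails with the same kind of TypeError as in the report, because forward() only declares x.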
Using to_numpy on the pandas objects and a different scoring method name doesn't trigger this, which is very strange.
Indeed, when you pass a pandas DataFrame as input to skorch, it will convert it to a dict, with each column corresponding to one value in the dict. This is because PyTorch cannot deal with DataFrames, so we need to convert them to something more suitable.
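As a rough illustration of that conversion, the sketch below turns a small DataFrame into a dict with one array per column. The values are made up for the example, and the exact code skorch uses internally may differ; this only shows the shape of the result.

```python
import pandas as pd

# Made-up values; the column names follow the dataset linked in this thread.
df = pd.DataFrame({
    "radius_mean": [12.0, 14.5, 20.0],
    "texture_mean": [15.0, 18.0, 22.5],
})

# One dict entry per column, each reshaped to (n_samples, 1).
as_dict = {name: df[name].to_numpy().reshape(-1, 1) for name in df}

print(list(as_dict))
print(as_dict["radius_mean"].shape)
```

A dict like this is a Mapping, which is why its keys end up as keyword arguments to forward().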
> Using to_numpy on the pandas objects and a different scoring method name doesn't trigger this, which is very strange.
When you pass a numpy array instead of a df, we don't encounter the aforementioned problem, which is why it works. Note, however, that this may not be what you want. For instance, if the df contains categorical data, you surely don't want to treat it like just numerical data.
We have a helper class that takes care of some of this: DataFrameTransformer. Maybe this is something that would suit your needs. Otherwise, there is no easy solution to your issue: you need to do some feature engineering/transformation/scaling to make the data suitable for use with a neural net, then package the data either as a numpy array (if it's homogeneous) or as a dict of arrays/tensors.
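For the homogeneous case, a minimal sketch of the manual route could look like the following. It assumes all feature columns are numeric and uses made-up values; the choice of columns, the standardization step, and the "M" to 1 label encoding are illustrative assumptions, not something prescribed by skorch.

```python
import numpy as np
import pandas as pd

# Made-up values; column names follow the dataset linked in this thread.
df = pd.DataFrame({
    "radius_mean": [12.0, 14.5, 20.0, 11.0],
    "texture_mean": [15.0, 18.0, 22.5, 19.0],
    "diagnosis": ["M", "M", "B", "B"],
})

feature_cols = ["radius_mean", "texture_mean"]  # assumed numeric features

# Pack the numeric features into a single float32 array.
X = df[feature_cols].to_numpy(dtype=np.float32)

# Standardize each column (zero mean, unit variance); stays float32.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Encode the labels as int64 class indices (assumption: "M" -> 1, "B" -> 0).
y = (df["diagnosis"] == "M").to_numpy(dtype=np.int64)

print(X.shape, X.dtype)
print(y, y.dtype)
```

Arrays like X and y are not Mappings, so they would reach forward() as a single positional input rather than as one keyword argument per column. Any categorical columns would still need their own encoding before being packed this way.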
Working with the data in this link: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
I am able to run this:
When I try to run this I get an error:
TypeError: MyNeuralNetwork.forward() got an unexpected keyword argument 'radius_mean'
It should be forwarding this in as x, not passing each column by name. (X and Y are pandas DataFrames or Series.)