Closed LinXin04 closed 1 year ago
from pytorch_widedeep.preprocessing import WidePreprocessor
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
wd_X_wide_tr = wide_preprocessor.fit_transform(train)
wd_X_wide_te = wide_preprocessor.transform(test)
wd_X_wide_va = wide_preprocessor.transform(val)
Hey @LinXin04 sure, let me answer. If you read the docs, you will see: "_input_dim (int) – size of the Embedding layer. input_dim is the summation of all the individual values for all the features that go through the wide model. For example, if the wide model receives 2 features with 5 individual values each, input_dim = 10_"
so for example:
>>> import pandas as pd
>>> from pytorch_widedeep.models import Wide
>>> df = pd.DataFrame({"col1": ["a", "b", "c"], "col2": ["red", "blue", "yellow"]})
>>> df
col1 col2
0 a red
1 b blue
2 c yellow
>>> model = Wide(input_dim=6, pred_dim=1)
>>> model
Wide(
(wide_linear): Embedding(7, 1, padding_idx=0)
)
Note that the dimension of the Embedding layer is 7, i.e. input_dim + 1. This is because index 0 is reserved for 'unseen' classes.
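The arithmetic above can be reproduced without the library. Here is a minimal pandas sketch (reusing the `col1`/`col2` example DataFrame from above) that computes `input_dim` as the sum of the number of unique values per wide column, plus the extra row the Embedding layer reserves for unseen categories:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "c"], "col2": ["red", "blue", "yellow"]})

# input_dim is the summed cardinality of the wide columns
input_dim = int(sum(df[col].nunique() for col in ["col1", "col2"]))
print(input_dim)  # 3 unique values in col1 + 3 in col2 = 6

# the Embedding layer inside Wide then has input_dim + 1 rows,
# because index 0 is reserved for unseen categories
embedding_rows = input_dim + 1
print(embedding_rows)  # 7
```

This matches the `Embedding(7, 1, padding_idx=0)` printed above for `Wide(input_dim=6, pred_dim=1)`.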
Hope this helps
Thanks. Another question: if I only want to predict one sample, not a batch, what should I do? @jrzaurin
hi @LinXin04 in the Trainer's predict() method you can set batch_size down to a single sample, see the docs here:
batch_size (int, default: 256) –
If a trainer is used to predict after having trained a model, the batch_size needs to be defined as it will not be defined as the Trainer is instantiated
If you are NOT using the Trainer from the library then you may just pass the sample as model(sample), described also here (and in the short discussion below)
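To illustrate the model(sample) route: a trained model expects batched input, so a single sample just needs a leading batch dimension of size one. A minimal numpy sketch (the `model` function here is a hypothetical stand-in for a trained network, not the library's API):

```python
import numpy as np

# hypothetical stand-in for a trained model: expects input of shape (batch, features)
weights = np.array([0.5, -1.0, 2.0])

def model(batch: np.ndarray) -> np.ndarray:
    assert batch.ndim == 2, "model expects a batch dimension"
    return batch @ weights

# a single sample has shape (features,) -- add a leading batch dimension
sample = np.array([1.0, 2.0, 3.0])
pred = model(sample[None, :])  # a "batch" of one, so pred has shape (1,)
print(pred.shape)
```

With PyTorch the same idea is `model(sample.unsqueeze(0))`; with the Trainer you instead pass `batch_size=1` to predict().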
Hello, @jrzaurin @5uperpalo
Thanks for this discussion. It is very informative and helpful.
But I am kind of confused by the encoding step in the Wide model. Is there any alternative in which we can use the continuous columns as input instead of one-hot encoding? Assigning 0 to unseen values is kind of strange for continuous columns. Or is there any available resource for this implementation?
Anyway, this work is great. Looking forward to your reply.
Thanks, Guowei
Excuse me, how should Wide's input_dim be set? Is the following the right way to compute it?

from pytorch_widedeep.models import Wide, TabMlp, WideDeep
wide = Wide(input_dim=np.unique(wd_X_wide_tr).shape[0], pred_dim=num_class)
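Assuming the wide preprocessor encodes every (column, value) pair to its own positive integer (as the Embedding example above suggests, with 0 kept for unseen categories), the number of unique codes in the transformed matrix equals the summed column cardinalities, so `np.unique(...).shape[0]` coincides with `input_dim`. A toy sketch of such an encoding (this re-implementation is illustrative, not the library's actual code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "c"], "col2": ["red", "blue", "yellow"]})

# toy wide encoding: give every (column, value) pair its own positive
# integer code, starting at 1 so that 0 stays free for unseen categories
code = 1
X = np.zeros((len(df), df.shape[1]), dtype=int)
for j, col in enumerate(df.columns):
    mapping = {}
    for i, val in enumerate(df[col]):
        if val not in mapping:
            mapping[val] = code
            code += 1
        X[i, j] = mapping[val]

input_dim = np.unique(X).shape[0]
print(input_dim)  # 6 distinct codes: 3 from col1 + 3 from col2
```

Note this only holds on the training matrix produced by fit_transform; the test/validation matrices may contain the 0 code for unseen values, which would inflate the count by one.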