Closed mikowals closed 8 years ago
You are right, nan
is considered as another class.
The correct way to do it would be actual to use cat_processor
to get the full count - len(cat_processor.vocabularies_[0])
in the vocabulary.
I'll correct the post, thanks for commenting here about this!
Thanks for the tutorial. It was very easy to follow. Like many might do, I tested my new knowledge by playing with the other categorical variables but this led to a difficult to understand stack trace.
To reproduce, just replace "Embarked" with "Pclass" in the lines where
X
and `embarked_classes" get assigned. When trying to fit the model this will throw index out of range errors.I eventually fixed it by increasing
n_classes
to unique classes plus 1. I think the error stems from the embedding wanting a row fornan
(the Embarked column has missing values but Pclass does not) but I could not get my head around the code to confirm the actual source.I am only mentioning this as it is a beginners tutorial and the most basic fix is a comment in the code to explain that
n_classes
needs to account fornan
(assuming I am understanding the error correctly). I also wonder ifn_classes
should just get handled insidecategorical_variable()
.