tensorflow / skflow

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
Apache License 2.0
3.18k stars 439 forks source link

Feedback on tutorial 3 - titanic embarked embedding #90

Closed mikowals closed 8 years ago

mikowals commented 8 years ago

Thanks for the tutorial. It was very easy to follow. Like many might do, I tested my new knowledge by playing with the other categorical variables but this led to a difficult to understand stack trace.

To reproduce, just replace "Embarked" with "Pclass" in the lines where X and `embarked_classes" get assigned. When trying to fit the model this will throw index out of range errors.

I eventually fixed it by increasing n_classes to unique classes plus 1. I think the error stems from the embedding wanting a row for nan (the Embarked column has missing values but Pclass does not) but I could not get my head around the code to confirm the actual source.

I am only mentioning this as it is a beginners tutorial and the most basic fix is a comment in the code to explain that n_classes needs to account for nan (assuming I am understanding the error correctly). I also wonder if n_classes should just get handled inside categorical_variable().

ilblackdragon commented 8 years ago

You are right, nan is considered as another class. The correct way to do it would be actual to use cat_processor to get the full count - len(cat_processor.vocabularies_[0]) in the vocabulary.

I'll correct the post, thanks for commenting here about this!