keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Keras Docs Examples silently assume categorical tasks #1454

Closed pasky closed 8 years ago

pasky commented 8 years ago

My first few hours with Keras were a lot more painful than they needed to be, because I didn't realize most of the examples at http://keras.io/examples/ and elsewhere were geared towards categorical, not binary classification (which is what I personally assumed by default) - and when I did realize, I didn't grasp how important that was. My models weren't learning anything...

So, right at the top of http://keras.io/examples/, I'd propose having two MLP variants, one for categorical classification and another for binary classification - sigmoid instead of softmax activation for the final layer, and passing class_mode='binary' to .compile() (the former is tricky for machine learning newbies to realize, and the latter is almost undiscoverable).
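To illustrate, the two setups really only differ in the final activation and the loss. A quick numpy sketch of the two loss computations on one made-up sample (this is just the math, not Keras code):

```python
import numpy as np

# Binary setup: one sigmoid output, target is a scalar 0/1.
p_binary = 0.9          # model's predicted probability of class 1 (made up)
y_binary = 1

binary_crossentropy = -(y_binary * np.log(p_binary)
                        + (1 - y_binary) * np.log(1 - p_binary))

# Categorical setup: one softmax output per class, target is one-hot.
p_categorical = np.array([0.1, 0.9])   # probabilities for classes 0 and 1
y_onehot = np.array([0, 1])

categorical_crossentropy = -np.sum(y_onehot * np.log(p_categorical))

# For two classes, the two formulations compute the same number;
# the difference is purely in how the model and targets are shaped.
print(binary_crossentropy, categorical_crossentropy)
```

The point is that a newbie who picks the categorical example but feeds it scalar 0/1 targets gets neither an error nor learning, which is exactly the trap I fell into.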

(For the benefit of Google users, I also documented my experience at http://log.or.cz/?p=386 .)

farizrahman4u commented 8 years ago

The examples cover all the basic loss functions: mse, binary and categorical. Binary classification being the default is, as you said, your personal assumption. And the examples do not "silently" assume anything - the loss function is explicitly provided to the model's compile function. Moreover, the Sequence classification with LSTM example is an example of binary classification. So your statement from the blog, "All the examples silently assume that you want to classify to categories", is not true. Criticism is welcome, but get your facts straight. There are a lot of examples in the examples directory which work out of the box. They download the data and train the models automatically - check them out to understand what your training data should look like.

pasky commented 8 years ago

Sorry if the blog post came off as an attack on Keras - I toned it down a bit (and fixed that example mention); it wasn't meant that way.

You are right that it's possible to figure it out with a bit of digging and looking at many examples - I totally agree with that.

I just wanted to point out that it's not obvious, imho, and I don't think it's unlikely that it's confusing other newbies too. I guess it also has to do with personal learning approach: some people will try out various existing examples to learn the framework, while others (like me) will approach the framework with a task they need to solve right away and play with the framework on that task. I think it's worth making Keras easily discoverable for both kinds of people, if it's not too much effort.

Just showing the simple MLP model at the top on both a categorical and a binary task would convey a lot of information and clarify this.

gammaguy commented 8 years ago

I agree with pasky in that Keras is this great project that creates an abstraction layer to make models more accessible - which, according to the docs: "It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research."

But there is no simple binary classification example, say logical AND. You could have the data and the code together; people could run it and get excited.

While PhDs and ML specialists who are familiar with all the syntax, semantics and quirks of the models should have no problem figuring it out, that is of little relevance to people who are just learning and want to try some simple toy models. What ends up happening is they try for a couple of hours, can't get it to work and then move on, damning Keras to obscurity as another expert system.

From what I can glean from the docs, Keras has great potential to help bring complex ML structures to the mainstream, but it needs a couple of better toy examples with some comments as to why softmax doesn't make sense for binary classification.
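Such a comment could show the failure mode directly: a softmax over a single output unit is degenerate - it always outputs 1, so the network can never express "class 0". A quick numpy sketch (my own illustration, not from the docs):

```python
import numpy as np

def softmax(z):
    # standard numerically-stabilized softmax
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# softmax over a SINGLE unit is always exactly 1.0, whatever the input:
print(softmax(np.array([-5.0])))   # [1.]
print(softmax(np.array([42.0])))   # [1.]

# sigmoid on the same single unit gives a usable probability:
print(sigmoid(-5.0))   # close to 0
print(sigmoid(42.0))   # close to 1
```

So with a one-unit softmax output the model predicts probability 1 for every sample, and training appears to do nothing - which looks exactly like the "my model isn't learning" symptom described above.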

farizrahman4u commented 8 years ago

needs a couple of better toy examples with some comments as to why softmax doesn't make sense for binary classification.

Not to be rude, but don't expect the Keras docs to teach you machine learning or basic arithmetic and algebra. If you do not understand softmax, sigmoid or tanh, the Keras docs will definitely make less sense to you (and so will the docs of other libraries). Keras is not your Deep Learning 101... you should learn the math from other sources, and then come here to get your hands dirty. You don't learn aeronautics in the cockpit. Read a couple of books, watch a couple of videos on deep learning on youtube and come back... you will be surprised to see that the Keras docs make absolute sense then. Also, we are past the era when the XOR problem was the "Hello World" of neural networks, so don't expect those. MNIST is the new XOR. The docs of all deep learning libraries will agree with this (see tensorflow). You can also see it this way: the parameters of SGD in Keras are tuned for real-world problems, so adding an AND / XOR example would require passing extra parameters to SGD, and this would complicate the example.

Again, I mean no offense and I appreciate you guys sharing your issues. cheers!

gammaguy commented 8 years ago

Farizrahman4u, you are obviously a really smart guy who is passionate about ML and Keras.

What I don't understand is that in the time and effort it took you to respond to me and lightly flame me, you could have done as I suggested.

I understand softmax; I used it as an example of something someone might use without first thinking. I've done Ng's course. I've programmed my own NN with backprop and dropout from scratch. I don't want to be argumentative, but 'XOR' will always be the "Hello World" of neural networks, just as "Hello World" is where all programming languages start. An XOR example is simple, easy to understand, and gives someone who is starting with Keras the satisfaction that they got something they understand to work, whether or not they are an ML expert.

I myself have been struggling with the implementation of the binary AND. Please find the code below that I can’t get to work. I think the not so expert community would benefit from your insight on how to debug it. You may even consider adding it to your examples.

One of the other things that I have been struggling with is the bias nodes. While I see them sporadically mentioned, it is unclear to me whether they are auto-included behind the scenes or, if not, how I specify them.
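To make the question concrete, here is my mental model of what a single Dense sigmoid unit computes - a numpy sketch with made-up weights, where I'm assuming the bias b is created automatically by the layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the four AND inputs
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# A Dense layer's parameters: a weight matrix W and a bias vector b.
# My assumption is that the bias is created behind the scenes and
# you never add an explicit "bias node" to the input yourself.
W = np.array([[1.0], [1.0]])   # shape (input_dim, units), hand-picked
b = np.array([-1.5])           # one bias per unit, hand-picked

out = sigmoid(X @ W + b)       # what Dense(1, activation='sigmoid') would compute
print(out)                     # only the (1,1) row ends up above 0.5
```

With these hand-picked weights the unit already implements AND when thresholded at 0.5, which is why I expected the model to be able to learn it.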

If you are ever in Barbados, give me a shout and I’ll thank you with some beers or whatever your poison.

Thank you Paul

import theano
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD

data = np.array([
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
])

X_train = data[:, :-1]
y_train = data[:, -1:]

X_test = data[:, :-1]
y_test = data[:, -1:]

model = Sequential()
model.add(Dense(5, input_dim=2, init='uniform', activation='sigmoid'))
model.add(Dense(5, init='uniform', activation='sigmoid'))
model.add(Dense(1, init='uniform', activation='sigmoid'))

sgd = SGD(lr=01, nesterov=False)

model.compile(loss='mean_squared_error', optimizer='sgd', class_mode='binary')

model.fit(X_train, y_train, nb_epoch=200, batch_size=1, verbose=1, show_accuracy=True)
score = model.evaluate(X_test, y_test, batch_size=4)
print("score = {:10.4}".format(score))

classes = model.predict_classes(X_test, batch_size=1)
print(classes)

fchollet commented 8 years ago

But there no simple binary classification example, say logical AND.

There are several binary classification examples in the examples folder (all imdb_* scripts). I just added one in the docs as well: http://keras.io/examples/

I don't know who gave you the idea that logical AND or XOR were good examples for neural networks, but they probably weren't active in the machine learning field past 1980.

pasky commented 8 years ago

Note that I personally think @gammaguy's suggestion of adding more very basic examples neither entails nor is entailed by fixing this bug (I guess I should PR soon...). My wish has been only a very minor tweak to the initial examples section, not necessarily building full new examples with new tasks.

I like the idea of a very simple example model (probably not binary AND) that would also include running prediction on unseen data, looking at internal layer predictions and checking the weights. But it's a bit different direction.

(And I also think that Keras doesn't need to cater to complete ML beginners; sklearn is pretty great at that.)

@gammaguy, regarding your code, I think the main problem is that you do not have enough variety in the training set - actually, all your training examples have y=0! Try a more reasonable task than binary AND, which is actually very hard for this reason - for example, the Iris dataset is popular for introduction to various ML models.

gammaguy commented 8 years ago

Thank you @fchollet for adding the example, and thank you for Keras. I have another comment for you a little lower down about an error with your example.

Sorry @pasky if I put words in your mouth, I was just trying to support your point. As for your comment on all my training examples having y=0, the last one has y=1.

I was modelling an AND; the inputs and output were:

0,0 -> 0
0,1 -> 0
1,0 -> 0
1,1 -> 1

I changed to model.compile(loss='binary_crossentropy', optimizer='rmsprop', class_mode='binary') using the sigmoid in my code and it converged, <.01 loss after 2627 epochs.
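To convince myself that this combination should converge, I also checked it outside Keras: AND is linearly separable, so even a single sigmoid unit trained on binary crossentropy learns it. A from-scratch numpy sketch (my own gradient-descent loop, not what Keras does internally):

```python
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])          # logical AND targets

w = rng.normal(scale=0.1, size=2)       # weights for one sigmoid unit
b = 0.0                                 # bias
lr = 1.0                                # learning rate, chosen by hand

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    p = sigmoid(X @ w + b)
    # gradient of mean binary crossentropy w.r.t. the pre-activation is (p - y)
    grad = p - y
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

p = sigmoid(X @ w + b)
print((p > 0.5).astype(int))   # expect [0 0 0 1] once training has converged
```

So the sigmoid + binary_crossentropy recipe is sound even for this toy task; with my original mean_squared_error setup it just converged far more slowly.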

@fchollet I tried your full new example code but got the following error:

File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 463, in relu
    x = T.nnet.relu(x, alpha)
AttributeError: 'module' object has no attribute 'relu'

where line 50 was the class_mode='binary' argument of the model.compile line in your example.

The full traceback was:

Using gpu device 0: GeForce GTX TITAN X
Using Theano backend.
Traceback (most recent call last):
  File "/home/paul/.PyCharm50/config/scratches/scratch_2", line 50, in <module>
    class_mode='binary')
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/models.py", line 408, in compile
    self.y_train = self.get_output(train=True)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/containers.py", line 128, in get_output
    return self.layers[-1].get_output(train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 949, in get_output
    X = self.get_input(train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
    previous_output = self.previous.get_output(train=train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 624, in get_output
    X = self.get_input(train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
    previous_output = self.previous.get_output(train=train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 658, in get_output
    X = self.get_input(train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
    previous_output = self.previous.get_output(train=train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 949, in get_output
    X = self.get_input(train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
    previous_output = self.previous.get_output(train=train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 624, in get_output
    X = self.get_input(train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
    previous_output = self.previous.get_output(train=train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 658, in get_output
    X = self.get_input(train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 159, in get_input
    previous_output = self.previous.get_output(train=train)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 950, in get_output
    output = self.activation(K.dot(X, self.W) + self.b)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/activations.py", line 25, in relu
    return K.relu(x, alpha=alpha, max_value=max_value)
  File "/home/paul/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 463, in relu
    x = T.nnet.relu(x, alpha)
AttributeError: 'module' object has no attribute 'relu'

pasky commented 8 years ago

@gammaguy Maybe you have an old Theano? I believe a very recent (git-based, not the last release from August) Theano is required for Keras.

gammaguy commented 8 years ago

@pasky You were right, worked like a dream.

Thank you. If either you or @fchollet ever get to Barbados give a shout.