Closed lqj1990 closed 4 years ago
One of the doc pages says the accuracy is the only thing implemented right now. There really should be a tab for metrics that says that and can be expanded later.
I found that function names like 'mae' or 'mean_absolute_error' from keras.metrics can be used as metrics, just like the loss parameter. It seems the metrics are only used for logging and are not involved in training. By the way, the documentation really should state which metrics are supported.
Precision, Recall and F1-score were added by someone:
https://github.com/fchollet/keras/blob/master/keras/metrics.py
Example usage:
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['binary_accuracy', 'fmeasure', 'precision', 'recall'])
After updating I still get this error: Exception: Invalid metric: precision
Hey Greg,
As of now, the latest Keras package doesn't contain this yet.
You can download the metrics code from GitHub, then copy it over your current one:
wget https://raw.githubusercontent.com/fchollet/keras/master/keras/metrics.py
sudo cp metrics.py /usr/local/lib/python2.7/dist-packages/keras/
Thanks! That worked great
I think the document is already updated? https://keras.io/metrics/
What is the difference between loss (objectives) and metrics?
@wqp89324 A metric is a function that is used to judge the performance of your model. A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. You can find more here: https://keras.io/metrics/
@wqp89324 Another way to put it, expanding on @jhli973's answer, is that the evaluation metric is what you as the researcher will use to judge the model's performance (on training, test, and/or evaluation data); it's the bottom line number that you would publish. The loss function is what the network will use to try to improve itself, hopefully in a way that leads to improved evaluation for the researcher's sake. For example, in a binary classification problem, the network might train using a binary crossentropy loss function with gradient descent, whereas the modeler's goal is to design a network to improve binary category accuracy on hold-out data.
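To make that distinction concrete with numbers, here is a self-contained numpy sketch (these are my own minimal re-implementations, not the actual Keras functions): the loss is a smooth quantity the optimizer can follow downhill, while the metric is the hard count you report, and the two can move independently.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # The quantity gradient descent minimizes: smooth and differentiable.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

def binary_accuracy(y_true, y_pred):
    # The quantity a researcher reports: a hard 0/1 count, not differentiable.
    return float(np.mean((y_pred > 0.5) == (y_true == 1)))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.4, 0.6, 0.3])  # last sample misclassified

print(binary_accuracy(y_true, y_pred))   # 0.75
print(binary_crossentropy(y_true, y_pred))  # ~0.583
```

Note that nudging the last prediction from 0.3 to 0.4 would lower the loss without changing the accuracy at all, which is exactly why the two play different roles.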
It looks like many of the helpful metrics that used to be supported have been removed with Keras 2.0. I'm working on a classification problem where f-score would be much more valuable to me than accuracy. Is there a way that I can use that as a metric, or am I encouraged to use metrics.categorical_accuracy instead? If so, why? And how does that differ from metrics.sparse_categorical_accuracy? Cheers!
I resolved my problem by getting the old code from https://github.com/fchollet/keras/blob/53e541f7bf55de036f4f5641bd2947b96dd8c4c3/keras/metrics.py
Maybe someone would put together a keras-contrib package.
I agree with @brannondorsey. According to @fchollet, who explained it in #5794, these metrics were intentionally removed in version 2.0 because they compute only an approximation via batchwise evaluation. Unfortunately, there seems to be no evidence (#6002, #5705) that anyone is working on a global measurement.
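To see why the batchwise approximation was considered misleading, here is a standalone numpy sketch (a minimal hand-rolled F1, not the removed Keras code): averaging per-batch F1 scores generally does not equal the F1 computed over the whole dataset.

```python
import numpy as np

def f1(y_true, y_pred):
    # Plain binary F1 from true/false positive counts.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 0])

# F1 over the full set: what you actually want to report.
global_f1 = f1(y_true, y_pred)

# Batchwise approximation (batches of 4, then averaged): what a
# per-batch Keras metric effectively computed.
batch_f1 = np.mean([f1(y_true[i:i + 4], y_pred[i:i + 4])
                    for i in range(0, len(y_true), 4)])

print(global_f1)  # ~0.571
print(batch_f1)   # 0.4 -- a noticeably different number
```

The gap comes from the second batch having zero true positives, which drags the average down in a way the global count does not.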
Probably the best thing to do currently is to store the predictions and then use Scikit for calculating global measurements. For me the following worked out quite well on a classification task:
1. Make predictions on the test data
import numpy
from keras.preprocessing.image import ImageDataGenerator
from sklearn import metrics

test_generator = ImageDataGenerator()
test_data_generator = test_generator.flow_from_directory(
    "test_directory",
    batch_size=32,
    shuffle=False)
test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size)
predictions = model.predict_generator(test_data_generator, steps=test_steps_per_epoch)
predicted_classes = numpy.argmax(predictions, axis=1)
2. Get ground-truth classes and class labels
true_classes = test_data_generator.classes
class_labels = list(test_data_generator.class_indices.keys())
3. Use scikit-learn to get statistics
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
print(report)
@apacha Thanks for the detailed explanation. This is very helpful. I have a follow up question.
While using "predict_generator", How to ensure that the prediction is done on all test samples once.
For example:
predictions = model.predict_generator(
    test_generator,
    steps=int(test_generator.samples / float(batch_size)),  # all samples once
    verbose=1,
    workers=2,
    max_q_size=10,
    pickle_safe=True
)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes
So the dimensions of predicted_classes and true_classes are different, since the total number of samples is not divisible by the batch size.
The size of my test set is not fixed, so the number of steps in predict_generator would change each time depending on the batch size. I am using flow_from_directory and cannot use predict_on_batch since my data is organized in a directory structure.
One solution is running with a batch size of 1, but that makes it very slow.
I hope my question is clear. Thanks in advance.
@sxs4337 I am happy to tell you that you don't have to worry about that when using the ImageDataGenerator, as it automatically takes care of the last batch if your samples are not divisible by the batch size. For example, if you have 10 samples and a minibatch size of 4, test_generator will create batches of the following sizes: 4, 4, 2. Consecutive next() calls will repeat the sequence from the beginning.
By using test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size) you automatically get 3 batches for the example above, which results in a total of 10 predictions.
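To make the step arithmetic concrete, here is a standalone sketch (plain Python, not Keras code). One caveat worth flagging: the ceil only does its job if the division produces a float; under Python 2's integer division for ints, the trailing partial batch is silently dropped.

```python
import math

samples, batch_size = 10, 4

# True division: ceil(10 / 4) == ceil(2.5) == 3 steps,
# i.e. batches of 4, 4 and 2 -> all 10 samples are covered.
steps = math.ceil(samples / float(batch_size))
print(steps)  # 3

# Pitfall: integer division (the Python 2 default for ints)
# yields 10 // 4 == 2, and ceil(2) == 2 -- the final partial
# batch of 2 samples would never be requested.
truncated_steps = math.ceil(samples // batch_size)
print(truncated_steps)  # 2
```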
@apacha Thank you for the reply. I did try that and it seems to miss out on the last few test samples. I may be missing something very obvious here.
I have 505 test samples and tried running with a batch size of 4.
Below is my code snippet-
test_datagen = ImageDataGenerator(preprocessing_function=vgg_preprocess)
test_generator = test_datagen.flow_from_directory(
'dataset_toy/test_toy',
target_size=(img_rows, img_cols),
batch_size = 4,
shuffle=False,
class_mode='categorical')
predictions = model.predict_generator(
test_generator,
steps = np.math.ceil(test_generator.samples / test_generator.batch_size),
verbose = 1,
workers = 2,
max_q_size=10,
pickle_safe=True
)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes
class_labels = list(test_generator.class_indices.keys())
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
accuracy = metrics.accuracy_score(true_classes, predicted_classes)
Here is the error-
Found 505 images belonging to 10 classes.
124/126 [============================>.] - ETA: 0s
Traceback (most recent call last):
File "keras_finetune_vgg16_landmarks10k.py", line 201, in
So the prediction has 504 values where as ground truth is 505 values.
Thanks again and I appreciate the help.
Maybe this is a bug when having more than one worker? Try it with workers=1 to see if the problem remains. You can also check len(predicted_classes) or run test_generator.next() a couple of times to see what it reports.
If all that fails, I'm afraid I can't help you. If you think this is a Keras bug, create an issue with detailed steps to reproduce it.
@apacha It has the same issue with workers=1. I put a debugger after model.predict_generator to check the shapes. prediction is getting just 504 samples out of 505 with batch size of 4.
Found 505 images belonging to 10 classes.
126/126 [==============================] - 34s
> /home/shagan/maya/landmark/keras_finetune_vgg16_landmarks10k.py(170)test_mode()
-> predicted_classes = np.argmax(predictions, axis=1)
(Pdb) predictions.shape
(504, 10)
(Pdb) test_generator.classes.shape
(505,)
BTW, my keras version is 2.0.5 Thanks.
Well, it looks obvious to me now. See the number of steps? 126! And 126 × 4 = 504. For some reason the calculation of the number of steps has an issue: it should be 127, not 126.
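The off-by-one is consistent with Python 2's integer division; a standalone sketch of the arithmetic (the // below mimics what Python 2's / does for two ints):

```python
import math

samples, batch_size = 505, 4

# Under Python 2, samples / batch_size is integer division:
# 505 / 4 == 126, and ceil(126) == 126 -> only 126 * 4 = 504
# predictions, one short of the 505 test images.
buggy_steps = math.ceil(samples // batch_size)
print(buggy_steps)  # 126

# Forcing float division restores the expected 127 steps.
correct_steps = math.ceil(samples / float(batch_size))
print(correct_steps)  # 127
```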
Yes. That was the issue. Thanks a lot!
What are the available metrics if I'm doing time series prediction (regression) in Keras?
@NourozR am I correct in assuming that you are using a mean squared error loss function? If so, popular metrics include mean absolute error (mae) and accuracy (acc). From the metrics documentation page:
model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=['mae', 'acc'])
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Why are mean absolute error (mae) and accuracy (acc) not listed in the available metrics section? Are there any other hidden metrics?
@damienrj, nothing is hidden if you look at the code: https://github.com/fchollet/keras/blob/master/keras/metrics.py
If you look deep enough, you'll see that many loss functions are also available as metrics. Then look at the losses page: https://keras.io/losses/
Is there any way to calculate precision@k and recall@k using the above-mentioned code? @mimoralea
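Not with the snippet above directly, but a small helper gets you there once you have the raw prediction scores. A hypothetical numpy sketch (the name precision_at_k is my own) that treats precision@k in the usual multiclass sense, i.e. the fraction of samples whose true class lands among the top-k scored classes:

```python
import numpy as np

def precision_at_k(y_true, y_score, k):
    # Indices of the k highest-scoring classes per sample.
    top_k = np.argsort(y_score, axis=1)[:, -k:]
    # Hit if the true class appears among them.
    hits = [y in row for y, row in zip(y_true, top_k)]
    return float(np.mean(hits))

y_true = np.array([0, 2, 1])
y_score = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.5, 0.4],
                    [0.3, 0.4, 0.3]])

print(precision_at_k(y_true, y_score, k=1))  # 2/3: only samples 0 and 2 correct at top-1
print(precision_at_k(y_true, y_score, k=2))  # 1.0: every true class is in the top two
```

Recall@k in a ranking/retrieval setting needs the set of all relevant items per query rather than a single label, so it doesn't reduce to quite such a one-liner.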
@apacha How can I extend your code to work for multiclass classification? The predictions I get are all 1, but I need a list like [0,0,0,1,0,0,0,0,0,0,0,0] since I have 12 classes. How can I get that?
As far as I know, scikit's classification_report does support multiclass cases, but I am not sure if we are talking about the same thing. What exactly do you mean by multiclass classification: One object potentially belonging to multiple classes? Or just having 12 different classes in total? Maybe you need some one-hot encoding for the ground truth before computing the metrics. Apart from that, I'm afraid I can't help you unless you give more details, but I don't think this is the right place to answer such questions. Preferably, you should ask such questions on Stackoverflow.
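On the one-hot point: if the ground truth comes as integer class indices (as test_data_generator.classes does), it can be expanded to the [0,0,0,1,...] form and collapsed back with argmax. A minimal numpy sketch (keras.utils.to_categorical does the same job; this stand-in is just for illustration):

```python
import numpy as np

def to_one_hot(class_indices, num_classes):
    # Minimal stand-in for keras.utils.to_categorical:
    # turns integer labels into one-hot rows.
    one_hot = np.zeros((len(class_indices), num_classes))
    one_hot[np.arange(len(class_indices)), class_indices] = 1
    return one_hot

labels = [0, 3, 11]
encoded = to_one_hot(labels, num_classes=12)
print(encoded[1])                   # row for label 3: a 1 in position 3, zeros elsewhere
print(np.argmax(encoded, axis=1))   # recovers the original integer labels
```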
@NourozR Common regression metrics for Keras are: R² (r_square), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). See here: https://github.com/keras-team/keras/issues/7947
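Note that not all of these ship with stock Keras (the linked issue is about adding them); since a metric can be any function of (y_true, y_pred), the missing ones are easy to supply yourself. A numpy sketch of RMSE and R² (a real Keras metric would use backend ops such as K.sqrt and K.mean instead of numpy):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large misses more than MAE.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_square(y_true, y_pred):
    # Coefficient of determination: 1 minus residual over total variance.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

print(rmse(y_true, y_pred))      # ~0.354
print(r_square(y_true, y_pred))  # 0.975 -- close to 1 for a good fit
```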
Closing, as the metrics docs have been updated on both keras.io and tensorflow.org. 🙂
Most examples use metrics=['accuracy'], but accuracy is not always suitable for every task.