Closed lqj1990 closed 4 years ago
One of the doc pages says the accuracy is the only thing implemented right now. There really should be a tab for metrics that says that and can be expanded later.
I found that function names like 'mae' or 'mean_absolute_error' from keras.metrics can be used as metrics, just like the loss parameter. It seems the metrics are only used for logging and are not involved in training. By the way, the documentation really should state which metrics are supported.
Precision, Recall and F1-score were added by someone:
https://github.com/fchollet/keras/blob/master/keras/metrics.py
Example usage:
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['binary_accuracy', 'fmeasure', 'precision', 'recall'])
After updating I still get this error: Exception: Invalid metric: precision
Hey Greg,
As of now, the latest Keras package doesn't contain this yet.
You can download the metrics code from GitHub, then copy it over your current one:
wget https://raw.githubusercontent.com/fchollet/keras/master/keras/metrics.py
sudo cp metrics.py /usr/local/lib/python2.7/dist-packages/keras/
Thanks! That worked great
I think the document is already updated? https://keras.io/metrics/
What is the difference between loss (objectives) and metrics?
@wqp89324 A metric is a function that is used to judge the performance of your model. A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. You can find more here: https://keras.io/metrics/
@wqp89324 Another way to put it, expanding on @jhli973's answer, is that the evaluation metric is what you as the researcher will use to judge the model's performance (on training, test, and/or evaluation data); it's the bottom line number that you would publish. The loss function is what the network will use to try to improve itself, hopefully in a way that leads to improved evaluation for the researcher's sake. For example, in a binary classification problem, the network might train using a binary crossentropy loss function with gradient descent, whereas the modeler's goal is to design a network to improve binary category accuracy on hold-out data.
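To make that distinction concrete with numbers, here is a self-contained numpy sketch (these are my own minimal re-implementations, not the actual Keras functions): the loss is a smooth quantity the optimizer can follow downhill, while the metric is the hard count you report, and the two can move independently.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # The quantity gradient descent minimizes: smooth and differentiable.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

def binary_accuracy(y_true, y_pred):
    # The quantity a researcher reports: a hard 0/1 count, not differentiable.
    return float(np.mean((y_pred > 0.5) == (y_true == 1)))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.4, 0.6, 0.3])  # last sample misclassified

print(binary_accuracy(y_true, y_pred))   # 0.75
print(binary_crossentropy(y_true, y_pred))  # ~0.583
```

Note that nudging the last prediction from 0.3 to 0.4 would lower the loss without changing the accuracy at all, which is exactly why the two play different roles.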
It looks like many of the helpful metrics that used to be supported have been removed with Keras 2.0. I'm working on a classification problem where f-score would be much more valuable to me than accuracy. Is there a way that I can use that as a metric, or am I encouraged to use metrics.categorical_accuracy instead? If so, why? And how does that differ from metrics.sparse_categorical_accuracy? Cheers!
I resolved my problem by getting the old code from https://github.com/fchollet/keras/blob/53e541f7bf55de036f4f5641bd2947b96dd8c4c3/keras/metrics.py
Maybe someone would put together a keras-contrib package.
I agree with @brannondorsey. According to @fchollet, who explained it in #5794, these metrics were intentionally removed in version 2.0 because they compute only an approximation via batchwise evaluation. Unfortunately, there seems to be no evidence (#6002, #5705) that anyone is working on a global measurement.
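To see why the batchwise approximation was considered misleading, here is a standalone numpy sketch (a minimal hand-rolled F1, not the removed Keras code): averaging per-batch F1 scores generally does not equal the F1 computed over the whole dataset.

```python
import numpy as np

def f1(y_true, y_pred):
    # Plain binary F1 from true/false positive counts.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 0])

# F1 over the full set: what you actually want to report.
global_f1 = f1(y_true, y_pred)

# Batchwise approximation (batches of 4, then averaged): what a
# per-batch Keras metric effectively computed.
batch_f1 = np.mean([f1(y_true[i:i + 4], y_pred[i:i + 4])
                    for i in range(0, len(y_true), 4)])

print(global_f1)  # ~0.571
print(batch_f1)   # 0.4 -- a noticeably different number
```

The gap comes from the second batch having zero true positives, which drags the average down in a way the global count does not.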
Probably the best thing to do currently is to store the predictions and then use Scikit for calculating global measurements. For me the following worked out quite well on a classification task:
1. Make predictions on the test data
import numpy
from keras.preprocessing.image import ImageDataGenerator
from sklearn import metrics

test_generator = ImageDataGenerator()
test_data_generator = test_generator.flow_from_directory(
    "test_directory",
    batch_size=32,
    shuffle=False)
test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size)
predictions = model.predict_generator(test_data_generator, steps=test_steps_per_epoch)
predicted_classes = numpy.argmax(predictions, axis=1)
2. Get ground-truth classes and class labels
true_classes = test_data_generator.classes
class_labels = list(test_data_generator.class_indices.keys())
3. Use scikit-learn to get statistics
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
print(report)
@apacha Thanks for the detailed explanation. This is very helpful. I have a follow up question.
While using "predict_generator", How to ensure that the prediction is done on all test samples once.
For example:
predictions = model.predict_generator(
    test_generator,
    steps=int(test_generator.samples / float(batch_size)),  # all samples once
    verbose=1,
    workers=2,
    max_q_size=10,
    pickle_safe=True
)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes
So the dimensions of predicted_classes and true_classes are different, since the total number of samples is not divisible by the batch size.
The size of my test set is not fixed, so the number of steps in predict_generator would change each time depending on the batch size. I am using flow_from_directory and cannot use predict_on_batch since my data is organized in a directory structure.
One solution is running with a batch size of 1, but that makes it very slow.
I hope my question is clear. Thanks in advance.
@sxs4337 I am happy to tell you that you don't have to worry about that when using the ImageDataGenerator, as it automatically takes care of the last batch if your samples are not divisible by the batch size. For example, if you have 10 samples and a minibatch size of 4, test_generator will create batches of the following sizes: 4, 4, 2. Consecutive next() calls will repeat the sequence from the beginning.
By using test_steps_per_epoch = numpy.math.ceil(test_data_generator.samples / test_data_generator.batch_size) you automatically get 3 batches for the example above, which results in a total of 10 predictions.
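To make the step arithmetic concrete, here is a standalone sketch (plain Python, not Keras code). One caveat worth flagging: the ceil only does its job if the division produces a float; under Python 2's integer division for ints, the trailing partial batch is silently dropped.

```python
import math

samples, batch_size = 10, 4

# True division: ceil(10 / 4) == ceil(2.5) == 3 steps,
# i.e. batches of 4, 4 and 2 -> all 10 samples are covered.
steps = math.ceil(samples / float(batch_size))
print(steps)  # 3

# Pitfall: integer division (the Python 2 default for ints)
# yields 10 // 4 == 2, and ceil(2) == 2 -- the final partial
# batch of 2 samples would never be requested.
truncated_steps = math.ceil(samples // batch_size)
print(truncated_steps)  # 2
```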
@apacha Thank you for the reply. I did try that and it seems to miss out on the last few test samples. I may be missing something very obvious here.
I have 505 test samples and tried running with a batch size of 4.
Below is my code snippet-
test_datagen = ImageDataGenerator(preprocessing_function=vgg_preprocess)
test_generator = test_datagen.flow_from_directory(
'dataset_toy/test_toy',
target_size=(img_rows, img_cols),
batch_size = 4,
shuffle=False,
class_mode='categorical')
predictions = model.predict_generator(
test_generator,
steps = np.math.ceil(test_generator.samples / test_generator.batch_size),
verbose = 1,
workers = 2,
max_q_size=10,
pickle_safe=True
)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = test_generator.classes
class_labels = list(test_generator.class_indices.keys())
report = metrics.classification_report(true_classes, predicted_classes, target_names=class_labels)
accuracy = metrics.accuracy_score(true_classes, predicted_classes)
Here is the error-
Found 505 images belonging to 10 classes.
124/126 [============================>.] - ETA: 0s
Traceback (most recent call last):
File "keras_finetune_vgg16_landmarks10k.py", line 201, in
So the prediction has 504 values where as ground truth is 505 values.
Thanks again and I appreciate the help.
Maybe this is a bug when having more than one worker? Try it with workers=1 to see if the problem remains. You can also check len(predicted_classes) or run test_generator.next() a couple of times to see what it reports.
If all that fails, I'm afraid I can't help you. If you think this is a Keras bug, create an issue with detailed steps to reproduce it.
@apacha It has the same issue with workers=1. I put a debugger after model.predict_generator to check the shapes. prediction is getting just 504 samples out of 505 with batch size of 4.
Found 505 images belonging to 10 classes.
126/126 [==============================] - 34s
> /home/shagan/maya/landmark/keras_finetune_vgg16_landmarks10k.py(170)test_mode()
-> predicted_classes = np.argmax(predictions, axis=1)
(Pdb) predictions.shape
(504, 10)
(Pdb) test_generator.classes.shape
(505,)
BTW, my keras version is 2.0.5 Thanks.
Well, it looks obvious to me now. See the number of steps? 126! And 126 × 4 = 504. For some reason the calculation of the number of steps has an issue: it should be 127, not 126.
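The off-by-one is consistent with Python 2's integer division; a standalone sketch of the arithmetic (the // below mimics what Python 2's / does for two ints):

```python
import math

samples, batch_size = 505, 4

# Under Python 2, samples / batch_size is integer division:
# 505 / 4 == 126, and ceil(126) == 126 -> only 126 * 4 = 504
# predictions, one short of the 505 test images.
buggy_steps = math.ceil(samples // batch_size)
print(buggy_steps)  # 126

# Forcing float division restores the expected 127 steps.
correct_steps = math.ceil(samples / float(batch_size))
print(correct_steps)  # 127
```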
Yes. That was the issue. Thanks a lot!
What are the available metrics if I'm doing time series prediction (regression) in Keras?
@NourozR am I correct in assuming that you are using a mean squared error loss function? If so, popular metrics include mean absolute error (mae) and accuracy (acc). From the metrics documentation page:
model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=['mae', 'acc'])
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Why are mean absolute error (mae) and accuracy (acc) not listed in the available metrics section? Are there any other hidden metrics?
@damienrj, nothing is hidden if you look at the code: https://github.com/fchollet/keras/blob/master/keras/metrics.py
If you look deep enough, you'll see that many loss functions are also available as metrics. Then look at the losses page: https://keras.io/losses/
Is there any way to calculate precision@k and recall@k using the above-mentioned code? @mimoralea
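Not with the snippet above directly, but a small helper gets you there once you have the raw prediction scores. A hypothetical numpy sketch (the name precision_at_k is my own) that treats precision@k in the usual multiclass sense, i.e. the fraction of samples whose true class lands among the top-k scored classes:

```python
import numpy as np

def precision_at_k(y_true, y_score, k):
    # Indices of the k highest-scoring classes per sample.
    top_k = np.argsort(y_score, axis=1)[:, -k:]
    # Hit if the true class appears among them.
    hits = [y in row for y, row in zip(y_true, top_k)]
    return float(np.mean(hits))

y_true = np.array([0, 2, 1])
y_score = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.5, 0.4],
                    [0.3, 0.4, 0.3]])

print(precision_at_k(y_true, y_score, k=1))  # 2/3: only samples 0 and 2 correct at top-1
print(precision_at_k(y_true, y_score, k=2))  # 1.0: every true class is in the top two
```

Recall@k in a ranking/retrieval setting needs the set of all relevant items per query rather than a single label, so it doesn't reduce to quite such a one-liner.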
@apacha How can I extend your code to work for multiclass classification? The predictions I get are all 1, but I need a list like [0,0,0,1,0,0,0,0,0,0,0,0] since I have 12 classes. How can I get that?
As far as I know, scikit's classification_report does support multiclass cases, but I am not sure if we are talking about the same thing. What exactly do you mean by multiclass classification: One object potentially belonging to multiple classes? Or just having 12 different classes in total? Maybe you need some one-hot encoding for the ground truth before computing the metrics. Apart from that, I'm afraid I can't help you unless you give more details, but I don't think this is the right place to answer such questions. Preferably, you should ask such questions on Stackoverflow.
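On the one-hot point: if the ground truth comes as integer class indices (as test_data_generator.classes does), it can be expanded to the [0,0,0,1,...] form and collapsed back with argmax. A minimal numpy sketch (keras.utils.to_categorical does the same job; this stand-in is just for illustration):

```python
import numpy as np

def to_one_hot(class_indices, num_classes):
    # Minimal stand-in for keras.utils.to_categorical:
    # turns integer labels into one-hot rows.
    one_hot = np.zeros((len(class_indices), num_classes))
    one_hot[np.arange(len(class_indices)), class_indices] = 1
    return one_hot

labels = [0, 3, 11]
encoded = to_one_hot(labels, num_classes=12)
print(encoded[1])                   # row for label 3: a 1 in position 3, zeros elsewhere
print(np.argmax(encoded, axis=1))   # recovers the original integer labels
```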
@NourozR Common regression metrics for Keras are: R² (r_square), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). See here: https://github.com/keras-team/keras/issues/7947
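Note that not all of these ship with stock Keras (the linked issue is about adding them); since a metric can be any function of (y_true, y_pred), the missing ones are easy to supply yourself. A numpy sketch of RMSE and R² (a real Keras metric would use backend ops such as K.sqrt and K.mean instead of numpy):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large misses more than MAE.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_square(y_true, y_pred):
    # Coefficient of determination: 1 minus residual over total variance.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

print(rmse(y_true, y_pred))      # ~0.354
print(r_square(y_true, y_pred))  # 0.975 -- close to 1 for a good fit
```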
Closing, as the metrics docs have been updated on both keras.io and tensorflow.org. 🙂
Most examples use metrics=['accuracy'], but accuracy is not always suitable for every task.