Metrics does not work with tf.keras.estimator.model_to_estimator

NKUCodingCat commented 5 years ago

I am trying to use tf.keras.estimator.model_to_estimator to convert a tf.keras model so that it can be trained in a distributed setup. However, I found that keras-metrics does not work as expected. Is there a fix or a workaround? Thanks.

Traceback:

Traceback (most recent call last):
  File "1.py", line 204, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "1.py", line 190, in main
    config=Rcfg
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/estimator/__init__.py", line 73, in model_to_estimator
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 486, in model_to_estimator
    config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 354, in _save_first_checkpoint
    custom_objects)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py", line 201, in _clone_and_build_model
    optimizer_iterations=global_step)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/models.py", line 511, in clone_and_build_model
    target_tensors=target_tensors)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpointable/base.py", line 442, in _method_wrapper
    method(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 499, in compile
    sample_weights=self.sample_weights)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 1844, in _handle_metrics
    return_stateful_result=return_stateful_result))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 1800, in _handle_per_output_metrics
    stateful_metric_result = _call_stateful_fn(stateful_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 1773, in _call_stateful_fn
    fn, y_true, y_pred, weights=weights, mask=mask)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training_utils.py", line 852, in call_metric_function
    return metric_fn(y_true, y_pred, sample_weight=weights)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/metrics.py", line 438, in __call__
    update_op = self.update_state(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/metrics.py", line 160, in inner
    return func.__get__(instance_ref(), cls)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/metrics.py", line 98, in decorated
    update_op = update_state_fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/metrics.py", line 649, in update_state
    matches = self._fn(y_true, y_pred, **self._fn_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras_metrics/metrics.py", line 192, in __call__
    tp = self.tp(y_true, y_pred)
  File "/usr/local/lib/python2.7/dist-packages/keras_metrics/metrics.py", line 50, in __call__
    tp_update = K.update_add(self.tp, tp)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 986, in update_add
    return tf.assign_add(x, increment)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 190, in assign_add
    ref, value, use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 107, in assign_add
    "AssignAdd", ref=ref, value=value, use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 350, in _apply_op_helper
    g = ops._get_graph_from_inputs(_Flatten(keywords.values()))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 5713, in _get_graph_from_inputs
    _assert_same_graph(original_graph_element, graph_element)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 5649, in _assert_same_graph
    original_item))
ValueError: Tensor("metrics/precision/Sum:0", shape=(), dtype=int32) must be from the same graph as Tensor("Variable:0", shape=(), dtype=int32_ref).

Buggy code (a little bit messy...)

model.py.zip

If I don't add keras_metrics.sparse_categorical_precision() to the Accuracy part, it DOES work, but it fails as soon as I add sparse_categorical_precision.

Tested with Python 2.7 and 3.7, TF 1.13.1

NKUCodingCat commented 5 years ago

I think I found the cause, but I have no idea how to fix it.

estimator.model_to_estimator replaces the graph built by Keras with a graph of its own, and the metric class is stateful. An object built by true_positive, for example, keeps a variable as self.tp, which is created in the Keras graph when I call it, whereas the graph of the y_true/y_pred tensors has been replaced.

I am not sure why a stateful layer is needed: I checked all the losses and metrics in Keras and did not find any that relies on object state (i.e. they are stateless). But I am not sure how keras_metrics works. Could you please give some advice, @ybubnov?
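The mechanism described above can be imitated without TensorFlow. The sketch below is a toy model, not TensorFlow or keras_metrics code: `ToyGraph`, `ToyTensor`, `assign_add`, and `StatefulTruePositives` are hypothetical names, and the same-graph check only mimics the one that raised the ValueError in the traceback.

```python
class ToyGraph(object):
    """Stand-in for a computation graph; only its identity matters here."""

    def __init__(self, name):
        self.name = name


class ToyTensor(object):
    """Stand-in for a tensor or variable that remembers its graph."""

    def __init__(self, graph, value):
        self.graph = graph
        self.value = value


def assign_add(variable, increment):
    """Mimic the same-graph check that failed in the traceback."""
    if variable.graph is not increment.graph:
        raise ValueError(
            "increment from graph %r must be from the same graph as "
            "variable from graph %r"
            % (increment.graph.name, variable.graph.name))
    variable.value += increment.value
    return variable


class StatefulTruePositives(object):
    """Hypothetical stateful metric: the counter is created once, up front."""

    def __init__(self, graph):
        self.tp = ToyTensor(graph, 0)

    def __call__(self, y_true, y_pred):
        hits = sum(1 for t, p in zip(y_true.value, y_pred.value)
                   if t == 1 and p == 1)
        return assign_add(self.tp, ToyTensor(y_true.graph, hits))


keras_graph = ToyGraph("keras")
estimator_graph = ToyGraph("estimator")

# The metric's state is bound to the graph that existed at construction time.
metric = StatefulTruePositives(keras_graph)

# Inputs from the same graph: accumulation works.
metric(ToyTensor(keras_graph, [1, 0, 1]), ToyTensor(keras_graph, [1, 1, 1]))
count_same_graph = metric.tp.value

# Inputs rebuilt in another graph (as after the model is cloned): rejected.
try:
    metric(ToyTensor(estimator_graph, [1, 0, 1]),
           ToyTensor(estimator_graph, [1, 1, 1]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In this toy version the failure disappears if the metric object is constructed inside the same graph as its inputs; whether that can be arranged through model_to_estimator is not established in this thread.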

NKUCodingCat commented 5 years ago

It seems to be caused by how model_to_estimator processes layers and metrics: it may replace the graph of any layer (I guess), but it does not replace the graph in metrics, since metrics are assumed to be STATELESS.

As far as I can tell, true_positive is implemented as a layer to reduce the cost of computation when the user requests multiple metrics provided by keras_metrics (I am not sure; the call chain confused me). If that is right, I think the optimization is not worthwhile, because it introduces potential incompatibilities.

Sorry for my broken English; I hope this helps.
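For context on the stateless versus stateful trade-off, the sketch below compares a stateless per-batch precision with a precision accumulated across batches. It is plain Python with made-up data, not keras_metrics code, and `batch_counts` and `stateless_precision` are hypothetical helpers. It only shows that the mean of per-batch values can differ from the value computed from accumulated counts, which is one possible reason to keep state; the library author's actual motivation is not stated in this thread.

```python
def batch_counts(y_true, y_pred):
    """Return (true positives, predicted positives) for one batch of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pp = sum(1 for p in y_pred if p == 1)
    return tp, pp


def stateless_precision(y_true, y_pred):
    """Precision of a single batch; nothing is remembered between calls."""
    tp, pp = batch_counts(y_true, y_pred)
    return float(tp) / pp if pp else 0.0


# Made-up batches of (y_true, y_pred).
batches = [
    ([1, 0, 0, 0], [1, 0, 0, 0]),  # 1 TP out of 1 predicted positive
    ([1, 0, 1, 0], [1, 1, 1, 1]),  # 2 TP out of 4 predicted positives
]

# Stateless route: average the per-batch precisions.
mean_of_batches = sum(stateless_precision(t, p) for t, p in batches) / len(batches)

# Stateful route: accumulate the counts, then divide once.
total_tp = 0
total_pp = 0
for y_true, y_pred in batches:
    tp, pp = batch_counts(y_true, y_pred)
    total_tp += tp
    total_pp += pp
accumulated = float(total_tp) / total_pp
```

With these batches the two routes give 0.75 and 0.6 respectively, so a stateless replacement would not necessarily report the same number as an accumulating metric.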

sangyongjia commented 4 years ago

I also faced a similar issue: metrics do not work with tf.keras.estimator.model_to_estimator. If I remove tf.keras.metrics.AUC, the code works; otherwise it fails with an error like this:

    return fn(*args, **kwargs)
  File "/Users/sangyongjia/anaconda3/envs/tf1.x/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_utils.py", line 873, in call_metric_function
    return metric_fn(y_true, y_pred, sample_weight=weights)
  File "/Users/sangyongjia/anaconda3/envs/tf1.x/lib/python3.7/site-packages/tensorflow/python/keras/metrics.py", line 170, in __call__
    update_op = self.update_state(*args, **kwargs)  # pylint: disable=not-callable
  File "/Users/sangyongjia/anaconda3/envs/tf1.x/lib/python3.7/site-packages/tensorflow/python/keras/utils/metrics_utils.py", line 73, in decorated
    update_op = update_state_fn(*args, **kwargs)
  File "/Users/sangyongjia/anaconda3/envs/tf1.x/lib/python3.7/site-packages/tensorflow/python/keras/metrics.py", line 1715, in update_state
    }, y_true, y_pred, self.thresholds, sample_weight=sample_weight)
  File "/Users/sangyongjia/anaconda3/envs/tf1.x/lib/python3.7/site-packages/tensorflow/python/keras/utils/metrics_utils.py", line 268, in update_confusion_matrix_variables
    y_pred.shape.assert_is_compatible_with(y_true.shape)
  File "/Users/sangyongjia/anaconda3/envs/tf1.x/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 1103, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (?, 1) and (?,) are incompatible

The corresponding code is: model.compile("adam", "binary_crossentropy", metrics=[tf.keras.metrics.BinaryCrossentropy(), tf.keras.metrics.AUC()])

If I remove tf.keras.metrics.AUC(), i.e. model.compile("adam", "binary_crossentropy", metrics=[tf.keras.metrics.BinaryCrossentropy()]), it works.

Does anyone have an idea? Thanks in advance.
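The last line of that traceback comes from a static shape check. The sketch below re-implements an approximation of that rule in plain Python to show why `(?, 1)` and `(?,)` are rejected. `shapes_compatible` is a hypothetical helper, not TensorFlow's own code, and it assumes the rule is: ranks must match, and each pair of dimensions must be equal or unknown. `None` stands for the unknown dimension printed as `?`.

```python
def shapes_compatible(a, b):
    """Approximate a static shape compatibility check on tuples of dimensions."""
    # Shapes of different rank are never compatible.
    if len(a) != len(b):
        return False
    # An unknown dimension (None) is compatible with anything.
    return all(x is None or y is None or x == y for x, y in zip(a, b))


y_pred_shape = (None, 1)  # model output with a trailing dimension of 1
y_true_shape = (None,)    # labels as a flat vector

mismatch = shapes_compatible(y_pred_shape, y_true_shape)
after_reshape = shapes_compatible(y_pred_shape, (None, 1))
```

Under that rule, giving the labels a trailing dimension of 1 (or dropping it from the predictions) would make the shapes line up. Whether that is enough to make AUC work through model_to_estimator is not confirmed in this thread.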

ybubnov commented 4 years ago

@sangyongjia, it looks like your problem is related to the TensorFlow project, not to keras-metrics.

sangyongjia commented 4 years ago

@sangyongjia, it looks like your problem is related to the TensorFlow project, not to keras-metrics.

Do you mean I should uninstall TensorFlow and install it again, or update to a newer version of TF? The version of TF I am using now is 1.14.

sangyongjia commented 4 years ago

@ybubnov

ybubnov commented 4 years ago

@sangyongjia, it looks like the problem is in TensorFlow itself, specifically in the model_to_estimator call. JosPolfliet opened an issue in that project a long time ago: https://github.com/tensorflow/tensorflow/issues/34040 and it is still unresolved.

I think one possible way to deal with the problem is to push for a resolution of the issue linked above. Alternatively, you could avoid model_to_estimator, if that is possible, and simply use the model directly.

On my side, I'll try to relax the TensorFlow version restriction so that newer versions of TensorFlow can be used with the keras-metrics library.

sangyongjia commented 4 years ago

thx so much @ybubnov

netrack / keras-metrics

Metrics does not work with tf.keras.estimator.model_to_estimator #39