hhk7734 / tensorflow-yolov4

YOLOv4 Implemented in Tensorflow 2.
MIT License

Adding metrics seems to make training fail to start #55

Closed guysoft closed 3 years ago

guysoft commented 3 years ago

Hey, I have the following code for training:

from yolov4.tf import YOLOv4

yolo = YOLOv4()
# yolo = YOLOv4(tiny=True)

yolo.classes = "coco.names"
yolo.input_size = (640, 480)

yolo.make_model()
yolo.load_weights("yolov4.weights", weights_type="yolo")

[...]

yolo.model.summary()
yolo.model.compile(
    optimizer=optimizer,
    loss=train.YOLOv4Loss(
        batch_size=yolo.batch_size,
        iou_type="ciou",
        verbose=0,
    ))

However, if I add a metric, for example:

metric = train.YOLOv4Loss(
    batch_size=16,
    iou_type="ciou",
    verbose=0,
)

or:

metric=tf.keras.metrics.Accuracy()

And then add it to the model compile:

yolo.model.summary()
yolo.model.compile(
    metrics=[metric],
    optimizer=optimizer,
    loss=train.YOLOv4Loss(
        batch_size=yolo.batch_size,
        iou_type="ciou",
        verbose=0,
    ))

I get the following error.

Why does setting the loss function as a metric make this fail? AFAIK a loss function should work as a metric too.

train_yolo.py", line 169, in <module>
    yolo.fit(
  File "/usr/local/lib/python3.8/dist-packages/yolov4/tf/__init__.py", line 271, in fit
    self.model.fit(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 545, in call
    outputs = execute.execute(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Can not squeeze dim[4], expected a dimension of 1, got 7
     [[node remove_squeezable_dimensions/Squeeze (defined at usr/local/lib/python3.8/dist-packages/yolov4/tf/__init__.py:271) ]] [Op:__inference_train_function_19210]

Function call stack:
train_function

2021-01-19 17:48:55.712426: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
     [[{{node PyFunc}}]]

I'm using the latest release.

If I use just a blank loss function, I can get it to print the shape of the tensor, which is (None, None, None, None):

def custom_test_loss(y_true, y_pred):
    print(y_true.shape)
    return 0

Thanks,

hhk7734 commented 3 years ago

I don't know anything about metrics.

hhk7734 commented 3 years ago

I want to help after checking how metrics are passed around and used inside TensorFlow, but it is difficult right now because I have a lot of work.

guysoft commented 3 years ago

Here is a tl;dr: metrics are like a loss function. They compute a number from y_true and y_pred that tells you how well you are doing. Unlike a loss, a metric does not have to be differentiable; the derivative is only needed for gradient descent, and you only need one loss for that. So why add a metric? It lets you see how well your network is learning. You might find that epoch 100 of your network is great but epoch 200 is not as good. It also lets you score how well one network does compared to another in a real scenario (how many successful detections, for example, rather than some abstract loss definition of success).

You could set your loss as a metric too.
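To make this concrete, here is a tiny sketch of a toy Keras model compiled with both a loss and a metric (the model and data are made up, just to show where metrics plug in; not related to the YOLO code above):

import numpy as np
import tensorflow as tf

# Toy model only, to show where `loss` and `metrics` plug into compile().
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.MeanSquaredError(),         # drives gradient descent
    metrics=[tf.keras.metrics.MeanAbsoluteError()],  # only reported, never differentiated
)

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1)  # logs both the loss and the metric each epoch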

The loss in this repo is set up as a class that generates a function. It's defined here: https://github.com/hhk7734/tensorflow-yolov4/blob/master/py_src/yolov4/tf/train.py#L32

Keras has several popular metrics. Here is a list: https://keras.io/api/metrics/

The thing is, I think the tensor shape here being (None, None, None, None) is what causes issues with the metrics.

guysoft commented 3 years ago

Ok, not sure why, but it seems that adding this makes my metrics work:

tf.config.experimental_run_functions_eagerly(True)

Update:

This is the non-deprecated way:

tf.config.run_functions_eagerly(True) 
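For anyone else hitting this, here is roughly where the call goes in the training script above. This is only a sketch: the Adam optimizer is a placeholder for whatever optimizer was used originally, and eager execution is noticeably slower than graph mode.

import tensorflow as tf
from yolov4.tf import YOLOv4, train

# Run Keras train/test functions eagerly instead of as compiled graphs.
# Call this before compile()/fit().
tf.config.run_functions_eagerly(True)

yolo = YOLOv4()
yolo.classes = "coco.names"
yolo.input_size = (640, 480)
yolo.make_model()
yolo.load_weights("yolov4.weights", weights_type="yolo")

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # placeholder optimizer
yolo.model.compile(
    optimizer=optimizer,
    loss=train.YOLOv4Loss(batch_size=yolo.batch_size, iou_type="ciou", verbose=0),
    metrics=[train.YOLOv4Loss(batch_size=yolo.batch_size, iou_type="ciou", verbose=0)],
)
# ... dataset setup and yolo.fit(...) as before
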
hhk7734 commented 3 years ago

hi~

TF2's default training mode is AutoGraph (graph mode); the other is eager. In AutoGraph mode, it is normal for print(y_true.shape) to show (None, None, None, None).
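If you want to see the runtime shape without switching to eager mode, a small debugging sketch (the function name is made up, not from the repo): tf.print together with tf.shape reports concrete values even inside the compiled train function.

import tensorflow as tf

def shape_probe_loss(y_true, y_pred):
    # Static shape is unknown while the graph is traced, but tf.print + tf.shape
    # print the concrete runtime shape when the train step actually runs.
    tf.print("y_true runtime shape:", tf.shape(y_true))
    # Multiply by zero so the loss stays connected to y_pred and gradients exist.
    return 0.0 * tf.reduce_sum(y_pred)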


When compile is called, TF2 builds compiled_loss and automatically registers the loss as a metric.

https://github.com/tensorflow/tensorflow/blob/a0ef001fa88c27d32df4c3fcd03474552d0837c2/tensorflow/python/keras/engine/compile_utils.py#L108-L133

class LossesContainer(Container):
  """A container class for losses passed to `Model.compile`."""

  @property
  def metrics(self):
    """Per-output loss metrics."""
    if not self._built:
      return []
    per_output_metrics = [
        metric_obj for metric_obj in nest.flatten(self._per_output_metrics)
        if metric_obj is not None
    ]
    return [self._loss_metric] + per_output_metrics

And according to the TF docs:

  def compile(self,
              optimizer='rmsprop',
              loss=None,
              metrics=None,
              loss_weights=None,
              weighted_metrics=None,
              run_eagerly=None,
              steps_per_execution=None,
              **kwargs):
    """Configures the model for training.

    Arguments:
...
        loss: String (name of objective function), objective function or
          `tf.keras.losses.Loss` instance. 
...
        metrics: List of metrics to be evaluated by the model during training
          and testing. Each of this can be a string (name of a built-in
          function), function or a `tf.keras.metrics.Metric` instance. See
          `tf.keras.metrics`. 
    """

So the loss type and the metric type seem to be handled differently: the loss is called as-is, while metric inputs go through extra processing (like the remove_squeezable_dimensions squeeze that the traceback points to).
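One generic way to bridge the two is to wrap a loss-style function in a stateful tf.keras.metrics.Metric subclass, so update_state receives y_true/y_pred directly instead of going through Keras's function-wrapping path. This is only a sketch and is untested against this repo's multi-anchor loss; shape handling may still need adjustment.

import tensorflow as tf

class LossAsMetric(tf.keras.metrics.Metric):
    """Averages a loss-style fn(y_true, y_pred) over batches and reports it as a metric."""

    def __init__(self, loss_fn, name="loss_as_metric", **kwargs):
        super().__init__(name=name, **kwargs)
        self.loss_fn = loss_fn
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_true/y_pred are passed straight to the loss function,
        # with no automatic squeezing of dimensions.
        value = tf.reduce_mean(self.loss_fn(y_true, y_pred))
        self.total.assign_add(value)
        self.count.assign_add(1.0)

    def result(self):
        return self.total / tf.maximum(self.count, 1.0)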


I made a callback and refactored the loss function to evaluate the training.

https://github.com/hhk7734/tensorflow-yolov4/blob/master/py_src/yolov4/tf/training/callbacks/yolo_each_step.py
https://github.com/hhk7734/tensorflow-yolov4/blob/master/py_src/yolov4/tf/training/yolo_loss.py

So, training can be evaluated in real time. Example: https://wiki.loliot.net/docs/lang/python/libraries/yolov4/python-yolov4-training#tensorboard
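For readers who just want the general shape of such a callback, here is a generic sketch (not the repo's actual yolo_each_step.py) that writes the per-batch training loss to TensorBoard:

import tensorflow as tf

class StepLossLogger(tf.keras.callbacks.Callback):
    """Generic sketch: log the training loss to TensorBoard after every batch."""

    def __init__(self, log_dir="logs/train_steps"):
        super().__init__()
        self.writer = tf.summary.create_file_writer(log_dir)
        self.step = 0

    def on_train_batch_end(self, batch, logs=None):
        logs = logs or {}
        with self.writer.as_default():
            tf.summary.scalar("step_loss", logs.get("loss", 0.0), step=self.step)
        self.step += 1

# usage: model.fit(..., callbacks=[StepLossLogger()]), then run `tensorboard --logdir logs`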