tensorflow / models

Models and examples built with TensorFlow

Object detection API - error when finetuning centernet and faster_rcnn models: RuntimeError: Groundtruth tensor weights has not been provided #10295

Closed fgraffitti-cyberhawk closed 1 year ago

fgraffitti-cyberhawk commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb

2. Describe the bug

When running the tutorial notebook with a different model (e.g. centernet or faster_rcnn), I get the error RuntimeError: Groundtruth tensor weights has not been provided when I try to train (fine-tune) the model. I don't get this error with the efficientdet architecture, which, however, is SSD-based like the ssd_resnet50 model in the original tutorial notebook.

When trying faster_rcnn, I can't even get to the training step: I get the error RuntimeError: Groundtruth tensor boxes has not been provided when I run the model on the dummy image with the following code: prediction_dict = detection_model.predict(image, shapes)

3. Steps to reproduce

In the linked notebook, replace the code in the cell "Create model and restore weights for all but last layer" with the following:

tf.keras.backend.clear_session()

print('Building model and restoring weights for fine-tuning...', flush=True)
num_classes = 1

# Raw strings keep the Windows backslashes literal ('\t' would otherwise be a tab).
pipeline_config = r'path\to\pipeline.config'
checkpoint_path = r'path\to\checkpoint\ckpt-0'
# checkpoint_path = r'C:\Users\Francesco.Graffitti\OneDrive - Cyberhawk

# Load pipeline config and build a detection model.
#
# Since we are working off of a COCO architecture which predicts 90
# class slots by default, we override the `num_classes` field here to be just
# one (for our new rubber ducky class).
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
# model_config.ssd.num_classes = num_classes
# model_config.ssd.freeze_batchnorm = True
model_config.center_net.num_classes = num_classes
detection_model = model_builder.build(
      model_config=model_config, is_training=True)

# Set up object-based checkpoint restore 
fake_box_predictor = tf.compat.v2.train.Checkpoint(
    _prediction_head_dict=detection_model._prediction_head_dict,
    )
fake_model = tf.compat.v2.train.Checkpoint(
          _feature_extractor=detection_model._feature_extractor,
          _box_predictor=fake_box_predictor)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()

# Run model through a dummy image so that variables are created
image, shapes = detection_model.preprocess(tf.zeros([1, 640, 640, 3]))
prediction_dict = detection_model.predict(image, shapes)
_ = detection_model.postprocess(prediction_dict, shapes)
print('Weights restored!')

and the following code:

# Select variables in top layers to fine-tune.

trainable_variables = detection_model.trainable_variables
to_fine_tune = []
prefixes_to_train = [
  'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead',
  'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead']
for var in trainable_variables:
  if any([var.name.startswith(prefix) for prefix in prefixes_to_train]):
    to_fine_tune.append(var)

with:

# Select variables in top layers to fine-tune.
trainable_variables = detection_model.trainable_variables
to_fine_tune = []
for v in detection_model._prediction_head_dict.trainable_variables:
    to_fine_tune.append(v)
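
A side note on why the tutorial's selection had to be replaced: the prefix filter only keeps variables whose names start with one of the SSD box-predictor prefixes, so with a different meta-architecture it can silently select nothing. Below is a minimal sketch of that behaviour in plain Python. `FakeVar` is a hypothetical stand-in for a TensorFlow variable, and every variable name other than the two SSD prefixes is made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a tf.Variable; only the `name` attribute matters here.
FakeVar = namedtuple("FakeVar", ["name"])

# The two prefixes used by the original tutorial cell.
SSD_PREFIXES = [
    "WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead",
    "WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead",
]


def select_by_prefix(variables, prefixes):
    """Return the variables whose name starts with any of the given prefixes."""
    return [v for v in variables if v.name.startswith(tuple(prefixes))]


# Made-up names: one that matches an SSD prefix and two that do not.
ssd_like = [
    FakeVar(SSD_PREFIXES[0] + "/kernel:0"),
    FakeVar("backbone/conv1/kernel:0"),
]
other_arch = [FakeVar("some_other_head/kernel:0")]

selected_ssd = select_by_prefix(ssd_like, SSD_PREFIXES)
selected_other = select_by_prefix(other_arch, SSD_PREFIXES)

# An empty selection means the optimizer would have nothing to update.
print(len(selected_ssd), len(selected_other))
```

Checking that the selected list is non-empty before training is a cheap way to catch a prefix mismatch early.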

4. Expected behavior

I would expect the model to train as in the tutorial case.

5. Additional context

However, I get the following error:

RuntimeError: in user code:

    <ipython-input-26-5e32e30edb27>:36 train_step_fn  *
        losses_dict = model.loss(prediction_dict, shapes)
    C:\Users\Francesco.Graffitti\.venv\tf-env\lib\site-packages\object_detection\meta_architectures\center_net_meta_arch.py:3581 loss  *
        object_center_loss = self._compute_object_center_loss(
    C:\Users\Francesco.Graffitti\.venv\tf-env\lib\site-packages\object_detection\meta_architectures\center_net_meta_arch.py:2691 _compute_object_center_loss  *
        gt_weights_list = self.groundtruth_lists(fields.BoxListFields.weights)
    C:\Users\Francesco.Graffitti\.venv\tf-env\lib\site-packages\object_detection\core\model.py:117 groundtruth_lists  *
        raise RuntimeError('Groundtruth tensor {} has not been provided'.format(

    RuntimeError: Groundtruth tensor weights has not been provided
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-28-5e32e30edb27> in <module>
     61 
     62   # Training step (forward pass + backwards pass)
---> 63   total_loss = train_step_fn(image_tensors, gt_boxes_list, gt_classes_list)
     64 
     65   if idx % 10 == 0:

~\.venv\tf-env\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    887 
    888       with OptionalXlaContext(self._jit_compile):
--> 889         result = self._call(*args, **kwds)
    890 
    891       new_tracing_count = self.experimental_get_tracing_count()

~\.venv\tf-env\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    931       # This is the first call of __call__, so we have to initialize.
    932       initializers = []
--> 933       self._initialize(args, kwds, add_initializers_to=initializers)
    934     finally:
    935       # At this point we know that the initialization is complete (or less

~\.venv\tf-env\lib\site-packages\tensorflow\python\eager\def_function.py in _initialize(self, args, kwds, add_initializers_to)
    761     self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
    762     self._concrete_stateful_fn = (
--> 763         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
    764             *args, **kwds))
    765 

~\.venv\tf-env\lib\site-packages\tensorflow\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   3048       args, kwargs = None, None
   3049     with self._lock:
-> 3050       graph_function, _ = self._maybe_define_function(args, kwargs)
   3051     return graph_function
   3052 

~\.venv\tf-env\lib\site-packages\tensorflow\python\eager\function.py in _maybe_define_function(self, args, kwargs)
   3442 
   3443           self._function_cache.missed.add(call_context_key)
-> 3444           graph_function = self._create_graph_function(args, kwargs)
   3445           self._function_cache.primary[cache_key] = graph_function
   3446 

~\.venv\tf-env\lib\site-packages\tensorflow\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3277     arg_names = base_arg_names + missing_arg_names
   3278     graph_function = ConcreteFunction(
-> 3279         func_graph_module.func_graph_from_py_func(
   3280             self._name,
   3281             self._python_function,

~\.venv\tf-env\lib\site-packages\tensorflow\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    997         _, original_func = tf_decorator.unwrap(python_func)
    998 
--> 999       func_outputs = python_func(*func_args, **func_kwargs)
   1000 
   1001       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~\.venv\tf-env\lib\site-packages\tensorflow\python\eager\def_function.py in wrapped_fn(*args, **kwds)
    670         # the function a weak reference to itself to avoid a reference cycle.
    671         with OptionalXlaContext(compile_with_xla):
--> 672           out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    673         return out
    674 

~\.venv\tf-env\lib\site-packages\tensorflow\python\framework\func_graph.py in wrapper(*args, **kwargs)
    984           except Exception as e:  # pylint:disable=broad-except
    985             if hasattr(e, "ag_error_metadata"):
--> 986               raise e.ag_error_metadata.to_exception(e)
    987             else:
    988               raise

RuntimeError: in user code:

    <ipython-input-26-5e32e30edb27>:36 train_step_fn  *
        losses_dict = model.loss(prediction_dict, shapes)
    C:\Users\Francesco.Graffitti\.venv\tf-env\lib\site-packages\object_detection\meta_architectures\center_net_meta_arch.py:3581 loss  *
        object_center_loss = self._compute_object_center_loss(
    C:\Users\Francesco.Graffitti\.venv\tf-env\lib\site-packages\object_detection\meta_architectures\center_net_meta_arch.py:2691 _compute_object_center_loss  *
        gt_weights_list = self.groundtruth_lists(fields.BoxListFields.weights)
    C:\Users\Francesco.Graffitti\.venv\tf-env\lib\site-packages\object_detection\core\model.py:117 groundtruth_lists  *
        raise RuntimeError('Groundtruth tensor {} has not been provided'.format(

    RuntimeError: Groundtruth tensor weights has not been provided
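
The last frames of the traceback show the mechanism: groundtruth_lists raises when a field was never registered through provide_groundtruth, and the CenterNet loss asks for the weights field. The sketch below mimics only that bookkeeping with a tiny stand-in class. `ToyModel` and `unit_weights` are hypothetical names, not part of the Object Detection API, and plain Python lists stand in for tensors.

```python
class ToyModel:
    """Hypothetical stand-in that mimics only the groundtruth bookkeeping."""

    def __init__(self):
        self._groundtruth_lists = {}

    def provide_groundtruth(self, boxes_list, classes_list, weights_list=None):
        # Boxes and classes are always stored; weights only when supplied.
        self._groundtruth_lists["boxes"] = boxes_list
        self._groundtruth_lists["classes"] = classes_list
        if weights_list:
            self._groundtruth_lists["weights"] = weights_list

    def groundtruth_lists(self, field):
        # Same message format as the one shown in the traceback.
        if field not in self._groundtruth_lists:
            raise RuntimeError(
                "Groundtruth tensor {} has not been provided".format(field))
        return self._groundtruth_lists[field]


def unit_weights(boxes_list):
    """Build one weight of 1.0 per groundtruth box, for each image."""
    return [[1.0] * len(boxes) for boxes in boxes_list]


# Two images: the first has two boxes, the second has one.
boxes = [[[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6]], [[0.3, 0.3, 0.9, 0.9]]]
classes = [[0, 0], [0]]

model = ToyModel()

# Without weights, asking for them fails exactly like in the report.
model.provide_groundtruth(boxes, classes)
try:
    model.groundtruth_lists("weights")
    error_message = None
except RuntimeError as err:
    error_message = str(err)

# With unit weights registered, the lookup succeeds.
model.provide_groundtruth(boxes, classes, weights_list=unit_weights(boxes))
weights = model.groundtruth_lists("weights")
```

The real provide_groundtruth also accepts a groundtruth_weights_list argument, so one thing that may be worth trying in train_step_fn is passing a list with one tensor of ones per image (one entry per box) next to the boxes and classes. This is an untested suggestion for this notebook, not a confirmed fix.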

6. System information

EdenBelouadah commented 2 years ago

Hello, I am facing the same problem. Did you manage to solve yours?

Thank you

jjuanvision commented 1 year ago

Same here! Any news about it?

laxmareddyp commented 1 year ago

Hi @fgraffitti-cyberhawk ,

Could you please check the recently released Object Detection API tutorial in the official Model Garden documentation? Please let me know if it helps you resolve your issue.

Thanks

google-ml-butler[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 1 year ago

Closing as stale. Please reopen if you'd like to work on this further.


jjuanvision commented 1 year ago

Just to clarify: this is not solved, as the tutorial only covers training via the provided scripts. Except for the SSD meta-architecture, it is not possible to train by building the graph from scratch and fine-tuning only the layers you choose.

What's more, the feature extractors in some meta-architectures (for example, the faster_rcnn one) are not trackable objects, so it is not possible to restore them. When I work with TensorFlow, I currently train my models with scripts similar to those shown in the tutorial mentioned in the previous comment, but I don't think that approach is very flexible for a developer.

Thanks anyways for the collaboration!! Much appreciated.