qubvel / segmentation_models

Segmentation models with pretrained backbones. Keras and TensorFlow Keras.
MIT License
4.74k stars 1.03k forks

ValueError while training binary segmentation example #427

Open AnamKhurshid17 opened 3 years ago

AnamKhurshid17 commented 3 years ago

While running the binary segmentation (camvid) notebook, an error occurs at the following step:

# train model
history = model.fit_generator(
    train_dataloader,
    steps_per_epoch=len(train_dataloader),
    epochs=EPOCHS,
    callbacks=callbacks,
    validation_data=valid_dataloader,
    validation_steps=len(valid_dataloader),
)

Epoch 1/40

ValueError                                Traceback (most recent call last)

in
      1 # train model
----> 2 history = model.fit_generator(
      3     train_dataloader,
      4     steps_per_epoch=len(train_dataloader),
      5     epochs=EPOCHS,

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\util\deprecation.py in new_func(*args, **kwargs)
    322               'in a future version' if date is None else ('after %s' % date),
    323               instructions)
--> 324       return func(*args, **kwargs)
    325     return tf_decorator.make_decorator(
    326         func, new_func, 'deprecated',

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1813     """
   1814     _keras_api_gauge.get_cell('fit_generator').set(True)
-> 1815     return self.fit(
   1816         generator,
   1817         steps_per_epoch=steps_per_epoch,

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
    106   def _method_wrapper(self, *args, **kwargs):
    107     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108       return method(self, *args, **kwargs)
    109
    110     # Running inside `run_distribute_coordinator` already.

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1096                 batch_size=batch_size):
   1097               callbacks.on_train_batch_begin(step)
-> 1098               tmp_logs = train_function(iterator)
   1099               if data_handler.should_sync:
   1100                 context.async_wait()

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781
    782       new_tracing_count = self._get_tracing_count()

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    821       # This is the first call of __call__, so we have to initialize.
    822       initializers = []
--> 823       self._initialize(args, kwds, add_initializers_to=initializers)
    824     finally:
    825       # At this point we know that the initialization is complete (or less

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\eager\def_function.py in _initialize(self, args, kwds, add_initializers_to)
    694     self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
    695     self._concrete_stateful_fn = (
--> 696         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
    697             *args, **kwds))
    698

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2853       args, kwargs = None, None
   2854     with self._lock:
-> 2855       graph_function, _, _ = self._maybe_define_function(args, kwargs)
   2856     return graph_function
   2857

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\eager\function.py in _maybe_define_function(self, args, kwargs)
   3211
   3212           self._function_cache.missed.add(call_context_key)
-> 3213           graph_function = self._create_graph_function(args, kwargs)
   3214           self._function_cache.primary[cache_key] = graph_function
   3215           return graph_function, args, kwargs

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3063     arg_names = base_arg_names + missing_arg_names
   3064     graph_function = ConcreteFunction(
-> 3065         func_graph_module.func_graph_from_py_func(
   3066             self._name,
   3067             self._python_function,

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    984       _, original_func = tf_decorator.unwrap(python_func)
    985
--> 986       func_outputs = python_func(*func_args, **func_kwargs)
    987
    988       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\eager\def_function.py in wrapped_fn(*args, **kwds)
    598         # __wrapped__ allows AutoGraph to swap in a converted function. We give
    599         # the function a weak reference to itself to avoid a reference cycle.
--> 600         return weak_wrapped_fn().__wrapped__(*args, **kwds)
    601     weak_wrapped_fn = weakref.ref(wrapped_fn)
    602

~\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\framework\func_graph.py in wrapper(*args, **kwargs)
    971           except Exception as e:  # pylint:disable=broad-except
    972             if hasattr(e, "ag_error_metadata"):
--> 973               raise e.ag_error_metadata.to_exception(e)
    974             else:
    975               raise

ValueError: in user code:

    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py:806 train_function  *
        return step_function(self, iterator)
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py:796 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1211 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2585 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2945 _call_for_each_replica
        return fn(*args, **kwargs)
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py:789 run_step  **
        outputs = model.train_step(data)
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py:756 train_step
        _minimize(self.distribute_strategy, tape, self.optimizer, loss,
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\engine\training.py:2736 _minimize
        gradients = optimizer._aggregate_gradients(zip(gradients,  # pylint: disable=protected-access
    C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:562 _aggregate_gradients
        filtered_grads_and_vars = _filter_grads(grads_and_vars)
C:\Users\Hp\anaconda3\envs\AntEnv\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:1270 _filter_grads raise ValueError("No gradients provided for any variable: %s." % ValueError: No gradients provided for any variable: ['stem_conv/kernel:0', 'stem_bn/gamma:0', 'stem_bn/beta:0', 'block1a_dwconv/depthwise_kernel:0', 'block1a_bn/gamma:0', 'block1a_bn/beta:0', 'block1a_se_reduce/kernel:0', 'block1a_se_reduce/bias:0', 'block1a_se_expand/kernel:0', 'block1a_se_expand/bias:0', 'block1a_project_conv/kernel:0', 'block1a_project_bn/gamma:0', 'block1a_project_bn/beta:0', 'block1b_dwconv/depthwise_kernel:0', 'block1b_bn/gamma:0', 'block1b_bn/beta:0', 'block1b_se_reduce/kernel:0', 'block1b_se_reduce/bias:0', 'block1b_se_expand/kernel:0', 'block1b_se_expand/bias:0', 'block1b_project_conv/kernel:0', 'block1b_project_bn/gamma:0', 'block1b_project_bn/beta:0', 'block2a_expand_conv/kernel:0', 'block2a_expand_bn/gamma:0', 'block2a_expand_bn/beta:0', 'block2a_dwconv/depthwise_kernel:0', 'block2a_bn/gamma:0', 'block2a_bn/beta:0', 'block2a_se_reduce/kernel:0', 'block2a_se_reduce/bias:0', 'block2a_se_expand/kernel:0', 'block2a_se_expand/bias:0', 'block2a_project_conv/kernel:0', 'block2a_project_bn/gamma:0', 'block2a_project_bn/beta:0', 'block2b_expand_conv/kernel:0', 'block2b_expand_bn/gamma:0', 'block2b_expand_bn/beta:0', 'block2b_dwconv/depthwise_kernel:0', 'block2b_bn/gamma:0', 'block2b_bn/beta:0', 'block2b_se_reduce/kernel:0', 'block2b_se_reduce/bias:0', 'block2b_se_expand/kernel:0', 'block2b_se_expand/bias:0', 'block2b_project_conv/kernel:0', 'block2b_project_bn/gamma:0', 'block2b_project_bn/beta:0', 'block2c_expand_conv/kernel:0', 'block2c_expand_bn/gamma:0', 'block2c_expand_bn/beta:0', 'block2c_dwconv/depthwise_kernel:0', 'block2c_bn/gamma:0', 'block2c_bn/beta:0', 'block2c_se_reduce/kernel:0', 'block2c_se_reduce/bias:0', 'block2c_se_expand/kernel:0', 'block2c_se_expand/bias:0', 'block2c_project_conv/kernel:0', 'block2c_project_bn/gamma:0', 
'block2c_project_bn/beta:0', 'block3a_expand_conv/kernel:0', 'block3a_expand_bn/gamma:0', 'block3a_expand_bn/beta:0', 'block3a_dwconv/depthwise_kernel:0', 'block3a_bn/gamma:0', 'block3a_bn/beta:0', 'block3a_se_reduce/kernel:0', 'block3a_se_reduce/bias:0', 'block3a_se_expand/kernel:0', 'block3a_se_expand/bias:0', 'block3a_project_conv/kernel:0', 'block3a_project_bn/gamma:0', 'block3a_project_bn/beta:0', 'block3b_expand_conv/kernel:0', 'block3b_expand_bn/gamma:0', 'block3b_expand_bn/beta:0', 'block3b_dwconv/depthwise_kernel:0', 'block3b_bn/gamma:0', 'block3b_bn/beta:0', 'block3b_se_reduce/kernel:0', 'block3b_se_reduce/bias:0', 'block3b_se_expand/kernel:0', 'block3b_se_expand/bias:0', 'block3b_project_conv/kernel:0', 'block3b_project_bn/gamma:0', 'block3b_project_bn/beta:0', 'block3c_expand_conv/kernel:0', 'block3c_expand_bn/gamma:0', 'block3c_expand_bn/beta:0', 'block3c_dwconv/depthwise_kernel:0', 'block3c_bn/gamma:0', 'block3c_bn/beta:0', 'block3c_se_reduce/kernel:0', 'block3c_se_reduce/bias:0', 'block3c_se_expand/kernel:0', 'block3c_se_expand/bias:0', 'block3c_project_conv/kernel:0', 'block3c_project_bn/gamma:0', 'block3c_project_bn/beta:0', 'block4a_expand_conv/kernel:0', 'block4a_expand_bn/gamma:0', 'block4a_expand_bn/beta:0', 'block4a_dwconv/depthwise_kernel:0', 'block4a_bn/gamma:0', 'block4a_bn/beta:0', 'block4a_se_reduce/kernel:0', 'block4a_se_reduce/bias:0', 'block4a_se_expand/kernel:0', 'block4a_se_expand/bias:0', 'block4a_project_conv/kernel:0', 'block4a_project_bn/gamma:0', 'block4a_project_bn/beta:0', 'block4b_expand_conv/kernel:0', 'block4b_expand_bn/gamma:0', 'block4b_expand_bn/beta:0', 'block4b_dwconv/depthwise_kernel:0', 'block4b_bn/gamma:0', 'block4b_bn/beta:0', 'block4b_se_reduce/kernel:0', 'block4b_se_reduce/bias:0', 'block4b_se_expand/kernel:0', 'block4b_se_expand/bias:0', 'block4b_project_conv/kernel:0', 'block4b_project_bn/gamma:0', 'block4b_project_bn/beta:0', 'block4c_expand_conv/kernel:0', 'block4c_expand_bn/gamma:0', 
'block4c_expand_bn/beta:0', 'block4c_dwconv/depthwise_kernel:0', 'block4c_bn/gamma:0', 'block4c_bn/beta:0', 'block4c_se_reduce/kernel:0', 'block4c_se_reduce/bias:0', 'block4c_se_expand/kernel:0', 'block4c_se_expand/bias:0', 'block4c_project_conv/kernel:0', 'block4c_project_bn/gamma:0', 'block4c_project_bn/beta:0', 'block4d_expand_conv/kernel:0', 'block4d_expand_bn/gamma:0', 'block4d_expand_bn/beta:0', 'block4d_dwconv/depthwise_kernel:0', 'block4d_bn/gamma:0', 'block4d_bn/beta:0', 'block4d_se_reduce/kernel:0', 'block4d_se_reduce/bias:0', 'block4d_se_expand/kernel:0', 'block4d_se_expand/bias:0', 'block4d_project_conv/kernel:0', 'block4d_project_bn/gamma:0', 'block4d_project_bn/beta:0', 'block4e_expand_conv/kernel:0', 'block4e_expand_bn/gamma:0', 'block4e_expand_bn/beta:0', 'block4e_dwconv/depthwise_kernel:0', 'block4e_bn/gamma:0', 'block4e_bn/beta:0', 'block4e_se_reduce/kernel:0', 'block4e_se_reduce/bias:0', 'block4e_se_expand/kernel:0', 'block4e_se_expand/bias:0', 'block4e_project_conv/kernel:0', 'block4e_project_bn/gamma:0', 'block4e_project_bn/beta:0', 'block5a_expand_conv/kernel:0', 'block5a_expand_bn/gamma:0', 'block5a_expand_bn/beta:0', 'block5a_dwconv/depthwise_kernel:0', 'block5a_bn/gamma:0', 'block5a_bn/beta:0', 'block5a_se_reduce/kernel:0', 'block5a_se_reduce/bias:0', 'block5a_se_expand/kernel:0', 'block5a_se_expand/bias:0', 'block5a_project_conv/kernel:0', 'block5a_project_bn/gamma:0', 'block5a_project_bn/beta:0', 'block5b_expand_conv/kernel:0', 'block5b_expand_bn/gamma:0', 'block5b_expand_bn/beta:0', 'block5b_dwconv/depthwise_kernel:0', 'block5b_bn/gamma:0', 'block5b_bn/beta:0', 'block5b_se_reduce/kernel:0', 'block5b_se_reduce/bias:0', 'block5b_se_expand/kernel:0', 'block5b_se_expand/bias:0', 'block5b_project_conv/kernel:0', 'block5b_project_bn/gamma:0', 'block5b_project_bn/beta:0', 'block5c_expand_conv/kernel:0', 'block5c_expand_bn/gamma:0', 'block5c_expand_bn/beta:0', 'block5c_dwconv/depthwise_kernel:0', 'block5c_bn/gamma:0', 'block5c_bn/beta:0', 
'block5c_se_reduce/kernel:0', 'block5c_se_reduce/bias:0', 'block5c_se_expand/kernel:0', 'block5c_se_expand/bias:0', 'block5c_project_conv/kernel:0', 'block5c_project_bn/gamma:0', 'block5c_project_bn/beta:0', 'block5d_expand_conv/kernel:0', 'block5d_expand_bn/gamma:0', 'block5d_expand_bn/beta:0', 'block5d_dwconv/depthwise_kernel:0', 'block5d_bn/gamma:0', 'block5d_bn/beta:0', 'block5d_se_reduce/kernel:0', 'block5d_se_reduce/bias:0', 'block5d_se_expand/kernel:0', 'block5d_se_expand/bias:0', 'block5d_project_conv/kernel:0', 'block5d_project_bn/gamma:0', 'block5d_project_bn/beta:0', 'block5e_expand_conv/kernel:0', 'block5e_expand_bn/gamma:0', 'block5e_expand_bn/beta:0', 'block5e_dwconv/depthwise_kernel:0', 'block5e_bn/gamma:0', 'block5e_bn/beta:0', 'block5e_se_reduce/kernel:0', 'block5e_se_reduce/bias:0', 'block5e_se_expand/kernel:0', 'block5e_se_expand/bias:0', 'block5e_project_conv/kernel:0', 'block5e_project_bn/gamma:0', 'block5e_project_bn/beta:0', 'block6a_expand_conv/kernel:0', 'block6a_expand_bn/gamma:0', 'block6a_expand_bn/beta:0', 'block6a_dwconv/depthwise_kernel:0', 'block6a_bn/gamma:0', 'block6a_bn/beta:0', 'block6a_se_reduce/kernel:0', 'block6a_se_reduce/bias:0', 'block6a_se_expand/kernel:0', 'block6a_se_expand/bias:0', 'block6a_project_conv/kernel:0', 'block6a_project_bn/gamma:0', 'block6a_project_bn/beta:0', 'block6b_expand_conv/kernel:0', 'block6b_expand_bn/gamma:0', 'block6b_expand_bn/beta:0', 'block6b_dwconv/depthwise_kernel:0', 'block6b_bn/gamma:0', 'block6b_bn/beta:0', 'block6b_se_reduce/kernel:0', 'block6b_se_reduce/bias:0', 'block6b_se_expand/kernel:0', 'block6b_se_expand/bias:0', 'block6b_project_conv/kernel:0', 'block6b_project_bn/gamma:0', 'block6b_project_bn/beta:0', 'block6c_expand_conv/kernel:0', 'block6c_expand_bn/gamma:0', 'block6c_expand_bn/beta:0', 'block6c_dwconv/depthwise_kernel:0', 'block6c_bn/gamma:0', 'block6c_bn/beta:0', 'block6c_se_reduce/kernel:0', 'block6c_se_reduce/bias:0', 'block6c_se_expand/kernel:0', 
'block6c_se_expand/bias:0', 'block6c_project_conv/kernel:0', 'block6c_project_bn/gamma:0', 'block6c_project_bn/beta:0', 'block6d_expand_conv/kernel:0', 'block6d_expand_bn/gamma:0', 'block6d_expand_bn/beta:0', 'block6d_dwconv/depthwise_kernel:0', 'block6d_bn/gamma:0', 'block6d_bn/beta:0', 'block6d_se_reduce/kernel:0', 'block6d_se_reduce/bias:0', 'block6d_se_expand/kernel:0', 'block6d_se_expand/bias:0', 'block6d_project_conv/kernel:0', 'block6d_project_bn/gamma:0', 'block6d_project_bn/beta:0', 'block6e_expand_conv/kernel:0', 'block6e_expand_bn/gamma:0', 'block6e_expand_bn/beta:0', 'block6e_dwconv/depthwise_kernel:0', 'block6e_bn/gamma:0', 'block6e_bn/beta:0', 'block6e_se_reduce/kernel:0', 'block6e_se_reduce/bias:0', 'block6e_se_expand/kernel:0', 'block6e_se_expand/bias:0', 'block6e_project_conv/kernel:0', 'block6e_project_bn/gamma:0', 'block6e_project_bn/beta:0', 'block6f_expand_conv/kernel:0', 'block6f_expand_bn/gamma:0', 'block6f_expand_bn/beta:0', 'block6f_dwconv/depthwise_kernel:0', 'block6f_bn/gamma:0', 'block6f_bn/beta:0', 'block6f_se_reduce/kernel:0', 'block6f_se_reduce/bias:0', 'block6f_se_expand/kernel:0', 'block6f_se_expand/bias:0', 'block6f_project_conv/kernel:0', 'block6f_project_bn/gamma:0', 'block6f_project_bn/beta:0', 'block7a_expand_conv/kernel:0', 'block7a_expand_bn/gamma:0', 'block7a_expand_bn/beta:0', 'block7a_dwconv/depthwise_kernel:0', 'block7a_bn/gamma:0', 'block7a_bn/beta:0', 'block7a_se_reduce/kernel:0', 'block7a_se_reduce/bias:0', 'block7a_se_expand/kernel:0', 'block7a_se_expand/bias:0', 'block7a_project_conv/kernel:0', 'block7a_project_bn/gamma:0', 'block7a_project_bn/beta:0', 'block7b_expand_conv/kernel:0', 'block7b_expand_bn/gamma:0', 'block7b_expand_bn/beta:0', 'block7b_dwconv/depthwise_kernel:0', 'block7b_bn/gamma:0', 'block7b_bn/beta:0', 'block7b_se_reduce/kernel:0', 'block7b_se_reduce/bias:0', 'block7b_se_expand/kernel:0', 'block7b_se_expand/bias:0', 'block7b_project_conv/kernel:0', 'block7b_project_bn/gamma:0', 
'block7b_project_bn/beta:0', 'top_conv/kernel:0', 'top_bn/gamma:0', 'top_bn/beta:0', 'decoder_stage0a_conv/kernel:0', 'decoder_stage0a_bn/gamma:0', 'decoder_stage0a_bn/beta:0', 'decoder_stage0b_conv/kernel:0', 'decoder_stage0b_bn/gamma:0', 'decoder_stage0b_bn/beta:0', 'decoder_stage1a_conv/kernel:0', 'decoder_stage1a_bn/gamma:0', 'decoder_stage1a_bn/beta:0', 'decoder_stage1b_conv/kernel:0', 'decoder_stage1b_bn/gamma:0', 'decoder_stage1b_bn/beta:0', 'decoder_stage2a_conv/kernel:0', 'decoder_stage2a_bn/gamma:0', 'decoder_stage2a_bn/beta:0', 'decoder_stage2b_conv/kernel:0', 'decoder_stage2b_bn/gamma:0', 'decoder_stage2b_bn/beta:0', 'decoder_stage3a_conv/kernel:0', 'decoder_stage3a_bn/gamma:0', 'decoder_stage3a_bn/beta:0', 'decoder_stage3b_conv/kernel:0', 'decoder_stage3b_bn/gamma:0', 'decoder_stage3b_bn/beta:0', 'decoder_stage4a_conv/kernel:0', 'decoder_stage4a_bn/gamma:0', 'decoder_stage4a_bn/beta:0', 'decoder_stage4b_conv/kernel:0', 'decoder_stage4b_bn/gamma:0', 'decoder_stage4b_bn/beta:0', 'final_conv/kernel:0', 'final_conv/bias:0'].
pablosarricolea commented 3 years ago

Change the output of the data loader to a tuple instead of a list. That worked for me.

https://github.com/tensorflow/tensorflow/issues/38233
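
A likely reason the tuple matters, going by the linked TensorFlow issue: TF 2.x Keras appears to treat only a tuple as (inputs, targets), while a list is taken as several inputs with no targets, so the loss never connects to the model's variables. The sketch below mimics that rule in plain Python. `split_batch` is a hypothetical stand-in written for illustration, not TensorFlow's actual code.

```python
def split_batch(data):
    """Hypothetical, simplified stand-in for how Keras is assumed to unpack a batch."""
    if not isinstance(data, tuple):
        # Anything that is not a tuple is treated as inputs only.
        return data, None
    if len(data) == 1:
        return data[0], None
    return data[0], data[1]


# Placeholder strings stand in for real image and mask arrays.
images, masks = "images", "masks"

_, targets_from_list = split_batch([images, masks])
_, targets_from_tuple = split_batch((images, masks))

print("list batch  -> targets:", targets_from_list)
print("tuple batch -> targets:", targets_from_tuple)
```

Under that assumption, a list batch leaves the targets empty, which would match the "No gradients provided for any variable" error above.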

AnamKhurshid17 commented 3 years ago

How do I change the data loader output into a tuple?

secg95-zz commented 3 years ago

Same problem with the multi-class example: no backprop at all. Buggy tutorials...

pablosarricolea commented 3 years ago

Sorry for the late reply. In the Dataloader class, I changed the __getitem__ function to return a tuple: (batch[0], batch[1])
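
For anyone else landing here, a minimal sketch of that change. It assumes the notebook's loader stacks (image, mask) samples into a list called `batch`. The `keras.utils.Sequence` base class and shuffling are left out so the snippet runs with NumPy alone, and the toy dataset is made up for illustration.

```python
import numpy as np


class Dataloader:
    """Batches samples from an indexable dataset of (image, mask) pairs.

    Assumption: in the notebook this class subclasses keras.utils.Sequence;
    the base class is omitted here to keep the sketch self-contained.
    """

    def __init__(self, dataset, batch_size=1):
        self.dataset = dataset
        self.batch_size = batch_size

    def __getitem__(self, i):
        start = i * self.batch_size
        stop = (i + 1) * self.batch_size
        data = [self.dataset[j] for j in range(start, stop)]
        # Stack all images together and all masks together along a batch axis.
        batch = [np.stack(samples, axis=0) for samples in zip(*data)]
        # The fix from this thread: return a tuple, not the list itself.
        return (batch[0], batch[1])

    def __len__(self):
        return len(self.dataset) // self.batch_size


# Hypothetical toy data: four 8x8 RGB images with single-channel masks.
toy_dataset = [(np.zeros((8, 8, 3)), np.ones((8, 8, 1))) for _ in range(4)]
loader = Dataloader(toy_dataset, batch_size=2)
images, masks = loader[0]
print(type(loader[0]).__name__, images.shape, masks.shape)
```

Writing `return tuple(batch)` should be equivalent when each sample has exactly two elements. Since `fit_generator` is deprecated in TF 2.x (the traceback passes through `deprecation.py`), the same loader can probably be passed straight to `model.fit` as well.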