MobileNetV2 backbone: training fails #8

Status: Open. fanweiya opened this issue 3 years ago

fanweiya commented 3 years ago

I am using the MobileNetV2 backbone, but training fails:

[-] Importing tensorflow...
2021-01-14 13:49:10.317068: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
[+] Done! Tensorflow version: 2.5.0-dev20201230
[-] Importing Deeplabv3plus Trainer class...
[-] Importing config files...
2021-01-14 13:49:11.537581: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-01-14 13:49:11.591072: E tensorflow/stream_executor/cuda/] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-01-14 13:49:11.591101: I tensorflow/stream_executor/cuda/] kernel driver does not appear to be running on this host (alit-PowerEdge-T640): /proc/driver/nvidia/version does not exist
2021-01-14 13:49:11.591383: I tensorflow/core/platform/] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0,/job:localhost/replica:0/task:0/device:GPU:1
WARNING:tensorflow:Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0,/job:localhost/replica:0/task:0/device:GPU:1
Train Images are good to go
[+] Data points in train dataset: 6400
Train Dataset: <PrefetchDataset shapes: ((16, 512, 512, 3), (16, 512, 512, 1)), types: (tf.float32, tf.float32)>
Train Images are good to go
Data points in train dataset: 1600
Val Dataset: <PrefetchDataset shapes: ((16, 512, 512, 3), (16, 512, 512, 1)), types: (tf.float32, tf.float32)>
2021-01-14 13:49:12.045387: I tensorflow/core/profiler/lib/] Profiler session initializing.
2021-01-14 13:49:12.045414: I tensorflow/core/profiler/lib/] Profiler session started.
2021-01-14 13:49:12.100790: I tensorflow/core/profiler/lib/] Profiler session tear down.
2021-01-14 13:49:12.268507: W tensorflow/core/grappler/optimizers/data/] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_STRING
      type: DT_STRING
attr {
  key: "output_shapes"
  value {
    list {
      shape {
      shape {

2021-01-14 13:49:12.362496: I tensorflow/compiler/mlir/] None of the MLIR optimization passes are enabled (registered 2)
2021-01-14 13:49:12.367114: I tensorflow/core/platform/profile_utils/] CPU Frequency: 2300000000 Hz
Epoch 1/100
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
Traceback (most recent call last):
  File "", line 47, in <module>
    HISTORY = TRAINER.train()
  File "/data/deeplab/DeepLabV3-Plus/deeplabv3plus/", line 191, in train
    epochs=self.config['epochs'], callbacks=callbacks
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/wandb/integration/keras/", line 119, in new_v2
    return old_v2(*args, **kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/", line 1135, in fit
    tmp_logs = self.train_function(iterator)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/", line 797, in __call__
    result = self._call(*args, **kwds)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/", line 841, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/", line 695, in _initialize
    *args, **kwds))
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/", line 2998, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/", line 3390, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/", line 3235, in _create_graph_function
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/framework/", line 998, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/", line 603, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/framework/", line 985, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/ train_function  *
        return step_function(self, iterator)
    /data/deeplab/DeepLabV3-Plus/deeplabv3plus/model/ call  *
        tensor = tf.keras.layers.Concatenate(axis=-1)([input_a, input_b])
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/ __call__  **
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/ _maybe_build  # pylint:disable=not-callable
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/utils/ wrapper
        output_shape = fn(instance, input_shape)
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/layers/ build
        raise ValueError(err_msg)

    ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(8, 128, 128, 256), (8, 64, 64, 48)]
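The error says the two tensors fed to Concatenate must agree on every dimension except the channel axis, but they differ in height and width (128×128 vs 64×64). The minimal sketch below mimics that rule in plain Python and converts the reported sizes into downsampling ratios relative to the 512×512 input. The helper names `concat_compatible` and `output_stride` are hypothetical, and this simplified check does not handle unknown (None) dimensions the way Keras does.

```python
def concat_compatible(shapes, axis=-1):
    """Simplified version of the Concatenate rule: every dimension except
    the concat axis must be identical across all inputs."""
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        return False
    axis = axis % rank  # turn -1 into the last axis index
    reference = [d for i, d in enumerate(shapes[0]) if i != axis]
    for s in shapes[1:]:
        if [d for i, d in enumerate(s) if i != axis] != reference:
            return False
    return True


def output_stride(input_size, feature_size):
    """Downsampling ratio between the network input and a feature map."""
    if input_size % feature_size:
        raise ValueError(f"{input_size} is not a multiple of {feature_size}")
    return input_size // feature_size


# Shapes reported in the traceback: (batch, height, width, channels).
upsampled_aspp = (8, 128, 128, 256)
low_level = (8, 64, 64, 48)

print(concat_compatible([upsampled_aspp, low_level]))  # False
print(output_stride(512, upsampled_aspp[1]))            # 4
print(output_stride(512, low_level[1]))                 # 8
```

In other words, the first input has been upsampled to 1/4 of the input resolution, while the low-level MobileNetV2 features arrive at 1/8, so the spatial sizes cannot match.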
jeremy-cv commented 3 years ago

Hi, thanks for this implementation.

I'm having the same issue with the MobileNetV2 backbone.

@fanweiya, did you solve this?


jeremy-cv commented 3 years ago

It seems that training runs if the factor is changed to 8, i.e. ._get_upsample_layer_fn(input_shape, factor=8).
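Why factor 8 might help can be checked with arithmetic, assuming (this is not confirmed by the thread) that `factor` means "upsample to input_size // factor". The helpers `upsample_size` and `factor_from_low_level` below are hypothetical and are not the repository's code. `ASPP_HW = 16` is also an assumption, but the conclusion holds for any ASPP output size that divides 64 evenly.

```python
def upsample_size(input_hw, factor, feature_hw):
    """Hypothetical: integer UpSampling2D size that takes a feature map of
    side feature_hw to input_hw // factor."""
    target = input_hw // factor
    if target % feature_hw:
        raise ValueError(f"cannot reach {target} from {feature_hw} with an integer size")
    return target // feature_hw


def factor_from_low_level(input_hw, low_level_hw):
    """Backbone-agnostic alternative: derive the factor from the low-level
    feature map instead of hard-coding it."""
    return input_hw // low_level_hw


INPUT_HW = 512
ASPP_HW = 16       # assumption: side length of the ASPP output for a 512 input
LOW_LEVEL_HW = 64  # from the traceback: (8, 64, 64, 48)

for factor in (4, 8):
    size = upsample_size(INPUT_HW, factor, ASPP_HW)
    # factor 4 -> upsampled to 128 (mismatch); factor 8 -> 64 (matches)
    print(factor, size, ASPP_HW * size == LOW_LEVEL_HW)

print(factor_from_low_level(INPUT_HW, LOW_LEVEL_HW))  # 8
```

A fixed factor of 8 only suits backbones whose low-level features sit at 1/8 resolution. Deriving the target from the low-level tensor's actual shape would avoid hard-coding it per backbone.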

diogosilva30 commented 2 years ago

+1 Same error here

shivarajkarki commented 1 year ago

Same error: ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(1, 128, 128, 256), (1, 64, 64, 48)]

KozAAAAA commented 6 months ago

Has anyone managed to fix this?