andrusza2 commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[Y ] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[Y ] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[Y ] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

I am trying to train the model using Tensorflow 2. The pipeline seems to work (training starts and the training process seems to be running), but I noticed a disturbing symptom - in Tensorboard the preview of images is incorrect - they look like they are badly decoded (color values truncated to 0 and 1 - example below).

From what I remember, when I was using the Object Detection API with TF1, the preview displayed "normal" images. I am not sure whether this is a bug related only to TensorBoard visualization, or whether the training pipeline does not work as it should and the images are loaded incorrectly. Or maybe I am making a configuration mistake?

3. Steps to reproduce

Generate tfrecords for the PASCAL VOC data according to the instructions in https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/preparing_inputs.md (to make sure that the data is not the problem).
Create a config based on any of the available configs, e.g. faster_rcnn_resnet50_v1_fpn_640x640_coco17_tpu-8.config (change num_classes, use_bfloat16, fine_tune_checkpoint, and the paths to the generated tfrecords and the appropriate label_map in input_readers).

Start local training

# From the tensorflow/models/research/ directory
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
python object_detection/model_main_tf2.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--alsologtostderr

Launch Tensorboard
```
tensorboard --logdir=${MODEL_DIR}
```

4. Expected behavior

I expect to see properly decoded images.

5. Additional context

Suspicious part of the training log (though I'm not sure whether it has anything to do with the issue):

WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0816 20:24:48.906076 140309352765248 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.957060 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skippedsince the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.960490 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skippedsince the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.966012 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skippedsince the input and output dtypes are identical.
WARNING:tensorflow:The operation `tf.image.convert_image_dtype` will be skipped since the input and output dtypes are identical.
W0816 20:24:48.971490 140309352765248 image_ops_impl.py:2018] The operation `tf.image.convert_image_dtype` will be skippedsince the input and output dtypes are identical.

6. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.4 (Nvidia NGC Tensorflow 20.06 docker image)
Mobile device name if the issue happens on a mobile device: NA
TensorFlow installed from (source or binary): source (Nvidia NGC Tensorflow 20.06 docker image)
TensorFlow version (use command below): 2.2.0
Python version: 3.6
Bazel version (if compiling from source): NA (Nvidia NGC Tensorflow 20.06 docker image)
GCC/Compiler version (if compiling from source): NA (Nvidia NGC Tensorflow 20.06 docker image)
CUDA/cuDNN version: CUDA 11.0 / cuDNN 8.0.1.13
GPU model and memory: GeForce GTX1080 8GB

Jconn commented 3 years ago

this is happening with my data as well

I had been successfully training on the TF1 pipeline for the last 8 months with these tfrecords. When I evaluated the TF2 pipeline to see if it was ready, I get extremely low loss in my training session, but terrible performance.

Moritz-Weisenboehler commented 3 years ago

This seems to be a problem of visualization and not a training issue. I had a similar problem due to a different image data scaling: range (-1, 1) instead of (0, 1) as required by tf.summary.image.

I suggested a possible solution in #9019.
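To illustrate the point above, here is a minimal NumPy sketch (not code from the Object Detection API). It relies on the documented behavior that tf.summary.image clips floating-point data to [0, 1], and it assumes the inputs are in [-1, 1]; the helper names `clip_for_display` and `to_display_range` are hypothetical.

```python
import numpy as np

def clip_for_display(img):
    """Mimic a viewer that clips float pixels to [0, 1]."""
    return np.clip(img, 0.0, 1.0)

def to_display_range(img):
    """Map pixels from [-1, 1] to [0, 1]."""
    return (img + 1.0) / 2.0

# Made-up pixel values spanning the assumed [-1, 1] input range.
pixels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)

# Without rescaling, every negative value collapses to 0 (detail is lost).
clipped = clip_for_display(pixels)

# With rescaling first, the full range survives the clip.
rescaled = clip_for_display(to_display_range(pixels))
```

If the inputs are not actually in [-1, 1], this mapping will still leave some values outside [0, 1], so it is worth printing the tensor range first.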

Alloooshe commented 3 years ago

Hi, can you guys help me with steps to train deeplab on a custom dataset using tf2, just the basic outline, I'm still a beginner and I don't know where to start ! sorry my comment is not to answer the question but I would really appreciate the help @Jconn @andrusza2

LackesLab commented 3 years ago

As @Moritz-Weisenboehler already mentioned, this is due to the fact that, for TensorBoard visualization, the image needs to be scaled to values between 0 and 1. This should solve your problem.

mackdelany commented 3 years ago

@Moritz-Weisenboehler @CptK1ng sorry to bother... when do you scale your images into 0 to 1 values? I assume you are writing images into TFRecord files as jpg byte streams?

Are you using the NormalizeImage data augmentation setting? Or something a bit more explicit?

Thanks in advance :)

LackesLab commented 3 years ago

@mackdelany There are 2 possible steps in your program where normalization could be applied:

Generating and preparing your train and val data. For example if you read images from disk into numpy arrays, you can directly apply a normalization.
In your training pipeline. If you use tf keras, you can normalize your inputs after the "Input" declaration.

A simple yet effective way of scaling images into [0,1] could be rescaling the RGB values by a factor of 1/255 with https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/Rescaling?hl=de

rescaling = Rescaling(scale=1.0 / 255) (input)
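As a rough illustration of the 1/255 rescaling mentioned above, here is a plain NumPy sketch rather than the Keras layer itself. The helper name `rescale_to_unit_range` and the sample pixel values are made up for the example.

```python
import numpy as np

def rescale_to_unit_range(img_uint8):
    """Scale 8-bit pixel values (0..255) into floats in [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0

# A tiny 1x2 RGB "image" with made-up pixel values.
img = np.array([[[0, 128, 255], [64, 32, 16]]], dtype=np.uint8)
scaled = rescale_to_unit_range(img)
```

Whether this belongs in your own input pipeline depends on what the model's preprocessing already does, so it is worth checking that the scaling is not applied twice.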

Jconn commented 3 years ago

@CptK1ng is this requirement new to tensorflow 2? I have had no problems with visualization of my tfrecord files using tensorboard/tf 1.14 and 1.15.

Edit: My image channels are being scaled from 0 to 1 before they're written into the tfrecord, there is another issue here.

LackesLab commented 3 years ago

@Jconn To the best of my knowledge, this restriction was already present in tf 1.x. I see no issue with your scaling.

randolfo75 commented 3 years ago

I'm facing the same problem. I don't think it is specifically a training data scaling issue, because the eval images are displayed and evaluated fine and my tfrecord images aren't scaled. I used the same tfrecord in the TF1 version too, but I can't say whether the problem was the same there, because in the TF1 version my training images weren't shown anyway.

michielva commented 3 years ago

Is there already a definite solution for this problem?

I have tried the fix suggested by @Moritz-Weisenboehler (thanks for this btw!), but it did not fix the oversaturated visualizations for my case in Tensorboard. It does have an effect on the image, but does not result in the correct image.

Printing the input image tensor shows that, for me, the values are not in [-1,1] but range from -2.11 to 2.64 for a particular image. Is some kind of normalization (mean-based or similar) being applied even though all augmentation settings were removed from the pipeline.config file?

Example of image tensor [Example of tf.print of the image tensor]
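One possible explanation, which nothing in this thread confirms: the reported range is very close to what per-channel mean/std normalization with the commonly used ImageNet statistics would produce for pixels in [0, 1]. The sketch below only checks that arithmetic; the constants and the helper names `normalize` and `denormalize` are assumptions for illustration, not values read from the Object Detection API.

```python
import numpy as np

# Commonly used ImageNet per-channel statistics (RGB), for pixels in [0, 1].
# Assumption: the thread does not say which statistics the model uses.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img_01):
    """Per-channel (x - mean) / std on an array whose last axis is RGB."""
    return (img_01 - IMAGENET_MEAN) / IMAGENET_STD

def denormalize(img_norm):
    """Invert normalize() so the result is back in [0, 1] for display."""
    return img_norm * IMAGENET_STD + IMAGENET_MEAN

black = np.zeros((1, 1, 3))
white = np.ones((1, 1, 3))

lowest = normalize(black).min()    # about -2.12 (red channel)
highest = normalize(white).max()   # about 2.64 (blue channel)
restored = denormalize(normalize(white))
```

If this is what is happening, a display-only inverse like `denormalize` would recover the original colors, whereas (x + 1) / 2 would not.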

I noticed that my original image (features[fields.InputDataFields.original_image]) is shown correctly in Tensorboard. The image tensor of this original image is a uint8 tensor (0..255).

model_lib_v2.py : example of writing input and original image to Tensorboard

  # input image (used for training) --> images showing oversaturated, even though augmentation was disabled
  tf.compat.v2.summary.image(
      name='test_train',
      step=global_step,
      data=(features[fields.InputDataFields.image][:num_visualizations] + 1) / 2,  # tried fix, but does not give correct image
      max_outputs=num_visualizations)

  # original images --> showing correctly (uint8 tensor)
  tf.compat.v2.summary.image(
      name='original_images',
      step=global_step,
      data=features[fields.InputDataFields.original_image],
      max_outputs=3)

Some extra information on my implementation:

Using the TF2 Object Detection API
All augmentation configs were deleted from the pipeline.config file
Images are not normalized before adding them to TFRecords (unless this is done by the Tensorflow OD API automatically)

Has anybody found a fix for this, or does anybody know what exactly is causing it? Thanks in advance!

LaraNeves commented 3 years ago

I'm facing the same problem where the training images on Tensorboard look like:

While the images from the validation set (which was built exactly like the training set) look as they should during the evaluation steps.

Note that in both datasets I crop the images to 1000x1000 patches because they are too large to be used directly in the model.

Additionally, all the TF .record files were built according to the instructions in the Object Detection API tutorial, and if I extract the images from them, they look as expected.

I'm using:

Latest faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config from the Object Detection API Model Zoo.
TensorFlow 2.2, python 3.8, CUDA 10.1, CuDNN 7.6.5
No data augmentation in the .config file except for random_horizontal_flip
I've tried using both PIL and OpenCV to open the images in case the color channels were switched, but it made no difference
I've tried to normalize the images beforehand and again it did not solve the problem.
I've also added my current .config parameters below in case any of the parameters may be causing the issue.

Finally, changing this code:

tf.compat.v2.summary.image(
      name='train_input_images',
      step=global_step,
      data=(features[fields.InputDataFields.image]+256/2)/256, # This fixes the displaying of images on tensorboard
      max_outputs=3)

in the function eager_train_step fixes the displaying of the images. However, it slows down the training steps and it's not ideal.

This is because the performance of my model is lower than expected compared to an older version of the model with similar parameters and the same dataset but on TensorFlow 1.12, so I'm concerned that the model is training on these images and that this is affecting the accuracy, rather than it being just a visualization issue.
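The workaround above can be read as an approximate inverse of a mean-subtraction step. The sketch below assumes, purely for illustration, that the model's preprocessing subtracts fixed per-channel means from 0..255 pixel values; the constants and the helper names are hypothetical and are not taken from this thread.

```python
import numpy as np

# Assumed per-channel means (RGB, 0..255 scale); hypothetical for this sketch.
CHANNEL_MEANS = np.array([123.68, 116.779, 103.939])

def preprocess(img_uint8):
    """Assumed preprocessing: subtract per-channel means, no scaling."""
    return img_uint8.astype(np.float64) - CHANNEL_MEANS

def approx_display(x):
    """The workaround from the comment above: shift by 128, divide by 256."""
    return (x + 256 / 2) / 256

def exact_display(x):
    """Exact inverse under the assumed preprocessing, scaled to [0, 1]."""
    return (x + CHANNEL_MEANS) / 255.0

black = preprocess(np.zeros((1, 1, 3), dtype=np.uint8))
white = preprocess(np.full((1, 1, 3), 255, dtype=np.uint8))

approx_low = approx_display(black).min()    # slightly above 0
approx_high = approx_display(white).max()   # slightly above 1, so it gets clipped
```

Under this assumption the shift-by-128 version is close enough to look right, while adding the means back would reproduce the original pixel values exactly. Either way the change only touches the tensor passed to the summary, not the tensor the model trains on.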

If anyone found the cause of the problem or has any fixes please share, it would be tremendously helpful.

Thank you in advance.

model {
  faster_rcnn {
    num_classes: 8 
    image_resizer {
      fixed_shape_resizer {
        width: 1000 
        height: 1000 
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50_keras'
      batch_norm_trainable: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        share_box_across_classes: true
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    use_static_shapes: true
    use_matmul_crop_and_resize: true
    clip_anchors_to_image: true
    use_static_balanced_label_sampler: true
    use_matmul_gather_in_matcher: true
  }
}

train_config: {
  batch_size: 1 
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 1500000 
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.000300000014249
          schedule {
            step: 900000 
            learning_rate: 2.99999992421e-05
          }
          schedule {
            step: 1200000
            learning_rate: 3.00000010611e-06
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "models/pre_trained_checkpoints/ckpt-0" 
  fine_tune_checkpoint_type: "detection" 
  data_augmentation_options {
    random_horizontal_flip {
    }
  }

  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  use_bfloat16: false 
}
train_input_reader: {
  label_map_path: "configs/label_map.pbtxt" 
  tf_record_input_reader {
    input_path: "data/train_cropped_images.record" 
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}

eval_input_reader: {
  label_map_path: "configs/label_map.pbtxt" 
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "data/val_cropped_images.record" 
  }
}

michielva commented 3 years ago

Hi @LaraNeves,

Maybe this is not really a big help, but I also searched a bit deeper in the tensorflow code and tried to do some debugging. Some things I noticed:

Adding retain_original_images: true in train_config in your pipeline.config gives you the possibility to show the original image (as it is in the tfrecord). To view it in TensorBoard, I had to add the following piece of code in model_lib_v2.py :
```
# original images
tf.compat.v2.summary.image(
    name='original_images',
    step=global_step,
    data=(features[fields.InputDataFields.original_image]),
    max_outputs=3
)
```
What I noticed is that the TF Object Detection API applies some normalization "automatically" to the input training images. I found this out by printing parts of the actual tensors of my original_image and of the images used for training. The original images are, as expected, not normalized (uint8: 0..255). For the training images, the tensor had values between -2.11 and 2.45 (different for each image). So I'm not sure what normalization was applied and how, or whether, I can change it...

My question (and concern) is also really if this is a problem for the training of my model or if we are just talking about a visualization issue (TFBoard expecting 0..255 or 0..1 and getting different type of values).

LaraNeves commented 3 years ago

@michielva thank you for the update, it's helpful to discuss this issue and know that I'm not alone in this.

So I've also been digging around the Object Detection API code and here's what I found:

I've extracted the input images during the training loop (train_loop() of the model_lib_v2.py file) both on tensorboard and as .png file. In both, the issue was still visible, which is why, like you, I'm concerned that the training is being affected given that the training loop itself is using distorted images. Here's an example of one .png:

Note that again, the validation images are not showing this problem even in the eval_loop

The "normalization" of the image from [0,255] to negative values happens in the transform_input_data() function of the inputs.py file in this line:

preprocessed_resized_image, true_image_shape = model_preprocess_fn(tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))

So the function model_preprocess_fn() is responsible for the distortion of the images. This may be intentional in order to convert the input into a format that is, so to say, "palatable" to the neural network. If anyone knows for sure if this is the case, I would be very grateful for more insight on this matter.

Finally, as a last ditch effort and something that I really don't recommend for the above reasons, the model_preprocess_fn line was removed and the preprocessed_resized_image and true_image_shape were defined as the original images (with values [0,255]) but converted to the format that they would have been, had they passed through the model_preprocess_fn. And then the model was trained, with the same dataset.

The results are quite inconclusive, before the change this was the mAP:

After removing the preprocessing:

It's hard to conclude whether the distorted images were negatively affecting the metrics, because removing that preprocessing didn't improve them much, except for the mAP (small). I also don't know whether that normalization is a crucial step in the training of the NN, so I would hesitate to call this a fix in any way.

This is what I have so far. I'll be training on the same dataset with the Object Detection API for Tensorflow 1.12, which, as far as I know, works well. If I get more insight into why 2.2 has this issue but 1.12 doesn't, I'll be sure to share it here.

ophirSarusi commented 3 years ago

According to the documentation:

tf.image.convert_image_dtype( image, dtype, saturate=False, name=None)

"Note that converting from floating point inputs to integer types may lead to over/underflow problems. Set saturate to True to avoid such problem in problematic conversions. If enabled, saturation will clip the output into the allowed range before performing a potentially dangerous cast (and only before performing such a cast, i.e., when casting from a floating point to an integer type, and when casting from a signed to an unsigned type; saturate has no effect on casts between floats, or on casts that increase the type's range)."

Have you tried setting saturate=True?
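To make the quoted note concrete, here is a small NumPy sketch of the clip-before-cast idea. It is not TensorFlow's implementation: `saturating_cast_to_uint8` is a hypothetical helper, and it skips the value scaling that tf.image.convert_image_dtype also performs when converting between float and integer images.

```python
import numpy as np

def saturating_cast_to_uint8(values):
    """Clip to the uint8 range first, then cast, so nothing wraps around."""
    return np.clip(np.rint(values), 0, 255).astype(np.uint8)

# Made-up values: one below the range, one inside, one above.
out_of_range = np.array([-10.0, 128.0, 300.0])
safe = saturating_cast_to_uint8(out_of_range)
```

Note that this only matters for float-to-integer casts; as the quoted documentation says, saturation has no effect on casts between float types.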

LaraNeves commented 3 years ago

@ophirSarusi Thank you for your suggestion. I did try it. However, correct me if I'm wrong, but I think those kinds of conversions only happen during data augmentation, in _rgb_to_grayscale() in preprocessor.py, which I have disabled for the time being, so it hasn't solved the issue.

Again, thank you for the help and if you have any other ideas please let me know.

michielva commented 3 years ago

@LaraNeves Thank you for your extensive answer and the suggested workaround.

I have not tried it yet, but I'm a bit hesitant for two reasons: 1) I assume that there's a reason they chose to apply this kind of transformation. As you say, "to convert the input to a preferred format for the NN". 2) My last models actually perform really well. This gave me some reassurance that the inputs (as we see them in Tensorboard) are not negatively affecting the training. Some images are almost pitch black, and my object detection model would not work on them if this were really the input that was used... but it does work...

However, it's indeed an annoying issue. I would love to have an explanation, only because I already spent plenty of hours investigating this issue.

LaraNeves commented 3 years ago

So with the exact same dataset:

generated with TF 2.2 according to the object detection api tutorials
where my original images were cropped into 1000x1000 cropped images around the labeled objects
train-validation-test split of 75%-15%-10% respectively

And with the .config files as similar as possible:

very little hyperparameter tuning for now
only "random_horizontal_flip" data augmentation

I trained a faster_rcnn_resnet50_v1_1024x1024_coco17_tpu in the latest object detection API for TensorFlow 2.2 and an older object detection API for TensorFlow 1.12 for about the same steps ~120k. The evaluation is being done on a validation dataset.

The results are significantly different:

In version 1.12:

In version 2.2:

As you can see the version 2.2 is incapable of detecting small objects, no matter how long I train it for. The version 1.12 has better results comparatively.

Additionally, in each version I extracted the preprocessed image in what I think is the equivalent place for both training pipelines (not absolutely 100% sure here, so if anyone has any corrections/suggestions regarding this I would happily accept them).

For 1.12 in model_lib.py:

Results in the image:

For 2.2 in model_lib_v2.py:

Results in the image:

It could be due to a change in the tf.summary.image() that exports images differently, it could be my mistake and these images are not comparable in both pipelines, but it could also be a change in the preprocessing part of the training pipeline between one and the other that is negatively affecting the results. If anyone has any opinion or suggestion on this please share, I would really appreciate it.

@michielva I understand if you would rather not share and obviously our respective problems are different, but I'm really curious to know which specific model you are using and the kind of results you are getting (highest mAP on a validation/test set, for example would be enough) just as a comparison measure. If you don't mind sharing of course.

michielva commented 3 years ago

Hi @LaraNeves,

Sorry it took a while for me to respond, I was working on other cases. I am using the EfficientDet-D1 model with weights pretrained on the COCO dataset. The mAPs I am getting on my last test set are quite high:

mAP@.50IoU: 0.997
mAP@.50:.05:.95IoU: 0.545

So the model was able to find almost all objects, but the bounding boxes were far from pixel perfect.

But I do need to mention that quite high mAPs are rather expected in our case. I'm working with images from our production lines made by our Machine Vision cameras. On our product there are marks (circles, rectangles, lines, crosses, ...). These are the objects we wish to find. These objects are present in all images, although they are sometimes quite hard to see.

Due to pending patents on our products I cannot share any images of my dataset. But I can give some extra information: My images are 1600x1280 and the marks I am trying to find are between 30x30 up until 100x100 pixels. The EfficientDet-D1 model rescales my images to 640x640 however. I do some data augmentation (random 90° rotation and random gamma correction).

So in the end, I am quite sure that this issue does not affect my training: in some of the "oversaturated" input images in Tensorboard it is impossible to see the marks I wish to find, yet the model finds them perfectly. That makes me think it's indeed only a "visualisation issue" in Tensorboard.

jartantupjar commented 3 years ago

For anyone else having this issue: this is indeed an issue between the TF2 Object Detection API and TensorBoard. It can be solved as follows: in model_lib_v2.py, line 276, rescale the images from the range (-1, 1) to (0, 1) by changing data=features[fields.InputDataFields.image] to data=(features[fields.InputDataFields.image] + 1) / 2, like:

    tf.compat.v2.summary.image(
        name='train_input_images',
        step=global_step,
        data=(features[fields.InputDataFields.image]  + 1) / 2,
        max_outputs=3)

michielva commented 3 years ago

@jartantupjar Thanks for the suggestion, however this is not working in my case. The values I have do not range from (-1,1) but from -2.11 to 2.45 for example (depends a bit on the image).

jartantupjar commented 3 years ago

@michielva Then replace the line with something like the following. Instead of assuming a (-1, 1) range, you need to take the min and max pixel values of your image. I have not tested this, but it should work (you may need to import numpy):

data= (features[fields.InputDataFields.image]-np.min(features[fields.InputDataFields.image]))/(np.max(features[fields.InputDataFields.image])-np.min(features[fields.InputDataFields.image]))

This is essentially min-max normalization.
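A minimal NumPy sketch of the min-max scaling described above, for illustration only. The helper name `min_max_scale` and the `eps` guard against a constant image (where max equals min) are additions for this example and are not part of the suggestion in the thread.

```python
import numpy as np

def min_max_scale(img, eps=1e-8):
    """Linearly map the smallest value to 0 and the largest to 1."""
    lo, hi = img.min(), img.max()
    # eps avoids division by zero when every pixel has the same value.
    return (img - lo) / max(hi - lo, eps)

# Made-up values using the range reported earlier in the thread.
img = np.array([-2.11, 0.0, 2.64])
scaled = min_max_scale(img)

# A constant image maps to all zeros instead of NaN.
flat = min_max_scale(np.full(3, 0.5))
```

One caveat: min-max scaling stretches every image (or batch) to the full [0, 1] range, so the displayed brightness and contrast can differ from the original image even though nothing is clipped.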

VeeranjaneyuluToka commented 3 years ago

Can't we use fields.InputDataFields.original_image rather than reverting the normalization?

jartantupjar commented 3 years ago

@VeeranjaneyuluToka I don't think so. The only thing original_image does is keep the image without any preprocessing steps (e.g. image_resize). You can give it a shot and get back to us with your results.

noahshpak commented 3 years ago

following up on the suggestion from @jartantupjar this did the trick for me in object_detection/model_lib_v2.py on line 622

if record_summaries:
    # Min-max scale the first 3 images of the batch into [0, 1], for display only.
    imgs = features[fields.InputDataFields.image][:3]
    img_min, img_max = tf.reduce_min(imgs), tf.reduce_max(imgs)
    imgs = (imgs - img_min) / (img_max - img_min)
    tf.compat.v2.summary.image(name='train_input_images', step=global_step, data=imgs, max_outputs=3)

Moritz-Weisenboehler commented 3 years ago

I updated my pull request #9019 today as it has been pending for almost a year.

In the update, I added @noahshpak's more general solution for image rescaling.

berniecamus commented 1 year ago

Almost 3 years now in fact @Moritz-Weisenboehler, having checked it. I'm going to incorporate it into my personal copy though. Thanks!

MSathishkumar1990 commented 4 months ago

I want to save the preprocessed images as .png files in a separate folder and also display them in TensorBoard. First, I tried to visualize them in TensorBoard, but it shows only 5-10 images. Please share any suggestions or code to solve this.

Moreover, I tried to save the original image but it doesn't work. I am using TF2 object detection. Thank you.

tensorflow / models

Problem with image decoding in Tensorflow 2 #9115
