Open andrusza2 opened 3 years ago
This is happening with my data as well.
I had been successfully training on the TF1 pipeline for the last 8 months with these tfrecords. When I evaluated the TF2 pipeline to see if it was ready, I got an extremely low loss in my training session but terrible performance.
This seems to be a problem of visualization and not a training issue. I had a similar problem due to a different image data scaling: range (-1, 1) instead of (0, 1) as required by tf.summary.image.
I suggested a possible solution in #9019.
Hi, can you guys help me with steps to train deeplab on a custom dataset using tf2, just the basic outline, I'm still a beginner and I don't know where to start ! sorry my comment is not to answer the question but I would really appreciate the help @Jconn @andrusza2
Like @Moritz-Weisenboehler already mentioned, this is due to the fact that, for TensorBoard visualization, the image needs to be scaled to values between 0 and 1. This should solve your problem.
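To see why out-of-range values look oversaturated, here is a minimal sketch in NumPy (not the Object Detection API code). It assumes, per the tf.summary.image documentation, that floating-point image data is clipped to [0, 1] before display. The helper name clip_for_display and the sample values are hypothetical.

```python
import numpy as np

def clip_for_display(image):
    """Mimic clipping float image data to [0, 1] (assumed display behaviour)."""
    return np.clip(image, 0.0, 1.0)

# Hypothetical pixel values for an image scaled to (-1, 1).
pixels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)

# Without rescaling, every value at or below 0 collapses to black.
print(clip_for_display(pixels).tolist())

# Rescaling (-1, 1) to (0, 1) first keeps every value distinct.
print(clip_for_display((pixels + 1) / 2).tolist())
```

The first print shows three of the five values flattened to 0, which is the kind of lost detail that would appear as harsh, saturated regions in TensorBoard.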
@Moritz-Weisenboehler @CptK1ng sorry to bother... when do you scale your images into 0 to 1 values? I assume you are writing images into TFRecord files as jpg byte streams?
Are you using the NormalizeImage data augmentation setting? Or something a bit more explicit?
Thanks in advance :)
@mackdelany There are 2 possible steps in your program where normalization could be applied:
1. Generating and preparing your train and val data. For example, if you read images from disk into numpy arrays, you can directly apply a normalization.
2. In your training pipeline. If you use tf keras, you can normalize your inputs after the "Input" declaration.
A simple yet effective way of scaling images into [0, 1] is to rescale the RGB values by a factor of 1/255 with https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/Rescaling?hl=de
from tensorflow.keras.layers.experimental.preprocessing import Rescaling
rescaling = Rescaling(scale=1.0 / 255)(input)  # input is the tensor from your "Input" declaration
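The first of the two steps listed above (normalizing while preparing the data) could look like the following sketch. It uses NumPy rather than the Keras Rescaling layer, and the function name to_unit_range and the tiny sample array are made up for illustration.

```python
import numpy as np

def to_unit_range(image_uint8):
    """Scale uint8 pixel values (0..255) to float32 values in [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0

# A hypothetical 1x2 RGB image.
image = np.array([[[0, 128, 255], [255, 0, 64]]], dtype=np.uint8)
scaled = to_unit_range(image)
print(scaled.dtype, scaled.min(), scaled.max())
```

Converting to float before dividing matters here: the division is done in floating point, so the result keeps fractional values instead of being truncated to integers.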
@CptK1ng is this requirement new to tensorflow 2? I have had no problems with visualization of my tfrecord files using tensorboard/tf 1.14 and 1.15.
Edit: My image channels are being scaled from 0 to 1 before they're written into the tfrecord, there is another issue here.
@Jconn To the best of my knowledge, this restriction was already present in tf 1.x. I see no issue with your scaling.
I'm facing the same problem. I don't think it is specifically a training data scaling issue, because the eval images are displayed and evaluated fine and the images in my tfrecords aren't scaled. I used the same tfrecords in the TF1 version too, but I can't say whether the problem was the same, because in the TF1 version my training images weren't shown anyway.
Is there already a definite solution for this problem?
I have tried the fix suggested by @Moritz-Weisenboehler (thanks for this btw!), but it did not fix the oversaturated visualizations for my case in Tensorboard. It does have an effect on the image, but does not result in the correct image.
Printing the input image tensor shows that, for me, the values are not in [-1,1] but ranging from -2.11 to 2.64 for a particular image. Is there some kind of normalization (based on mean or something) augmentation happening even though all augmentation settings were taken out of the pipeline.config file?
[Example of tf.print of the image tensor]
I noticed that my original image (features[fields.InputDataFields.original_image]) is shown correctly in TensorBoard. The image tensor of this original image is a uint8 tensor (0..255).
model_lib_v2.py: example of writing the input and original images to TensorBoard
# input image (used for training) --> images showing oversaturated, even though augmentation was disabled
tf.compat.v2.summary.image(
    name='test_train',
    step=global_step,
    data=(features[fields.InputDataFields.image][:num_visualizations] + 1) / 2,  # tried fix, but does not give correct image
    max_outputs=num_visualizations)
# original images --> showing correctly (uint8 tensor)
tf.compat.v2.summary.image(
    name='original_images',
    step=global_step,
    data=(features[fields.InputDataFields.original_image]),
    max_outputs=3
)
Some extra information on my implementation:
Did anybody find a fix for this or knows what's causing this exactly? Thanks in advance!
I'm facing the same problem where the training images on Tensorboard look like:
While the images from the validation set (which was built exactly like the training set) look as they should during the evaluation steps.
Note that in both datasets I crop the images to 1000x1000 patches because they are too large to be used directly in the model.
Additionally, all the TF .record files were built according to the instructions in the Object Detection API tutorial, and if I extract the images from them, they look as expected.
I'm using faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config from the Object Detection API Model Zoo, with random_horizontal_flip as the data augmentation option.
Finally, changing this code in the function eager_train_step:
tf.compat.v2.summary.image(
    name='train_input_images',
    step=global_step,
    data=(features[fields.InputDataFields.image]+256/2)/256,  # This fixes the displaying of images on tensorboard
    max_outputs=3)
fixes the display of the images; however, it slows down the training steps and it's not ideal.
This matters because the performance of my model is lower than expected compared to an older version of the model with similar parameters and the same dataset on TensorFlow 1.12. I'm therefore concerned that the model is actually training on these distorted images, which would affect the accuracy, rather than this being only a visualization issue.
If anyone found the cause of the problem or has any fixes please share, it would be tremendously helpful.
Thank you in advance.
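As a quick check of what the expression (x + 256/2) / 256 from the snippet above does, the sketch below evaluates the same arithmetic on a few sample values in plain Python. rescale is a hypothetical helper name and the sample values are arbitrary.

```python
def rescale(value):
    """Apply the same arithmetic as (x + 256/2) / 256."""
    return (value + 256 / 2) / 256

# Arbitrary sample values, including one above 128.
for sample in (-128.0, 0.0, 128.0, 150.0):
    print(sample, "->", rescale(sample))
```

The mapping sends -128 to 0 and 128 to 1, so it fits inputs that lie in roughly [-128, 128]. Any preprocessed value above 128 still lands above 1 and would be clipped when displayed.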
model {
  faster_rcnn {
    num_classes: 8
    image_resizer {
      fixed_shape_resizer {
        width: 1000
        height: 1000
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50_keras'
      batch_norm_trainable: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        share_box_across_classes: true
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    use_static_shapes: true
    use_matmul_crop_and_resize: true
    clip_anchors_to_image: true
    use_static_balanced_label_sampler: true
    use_matmul_gather_in_matcher: true
  }
}
train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 1500000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.000300000014249
          schedule {
            step: 900000
            learning_rate: 2.99999992421e-05
          }
          schedule {
            step: 1200000
            learning_rate: 3.00000010611e-06
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "models/pre_trained_checkpoints/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  use_bfloat16: false
}
train_input_reader: {
  label_map_path: "configs/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "data/train_cropped_images.record"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}
eval_input_reader: {
  label_map_path: "configs/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "data/val_cropped_images.record"
  }
}
Hi @LaraNeves,
Maybe this is not really a big help, but I also searched a bit deeper in the tensorflow code and tried to do some debugging. Some things I noticed:
Adding retain_original_images: true in train_config in your pipeline.config gives you the possibility to show the original image (as it is in the tfrecord). To view it in TensorBoard, I had to add the following piece of code in model_lib_v2.py:
# original images
tf.compat.v2.summary.image(
    name='original_images',
    step=global_step,
    data=(features[fields.InputDataFields.original_image]),
    max_outputs=3
)
What I noticed is that the TF Object Detection API "automatically" applies some normalization to the input training images. I found this out by printing parts of the actual tensors of my original_image and of the images used for training. On the original images there is, as expected, no normalization (uint8: 0..255). On the training images the tensor had values between -2.11 and 2.45 (different for each image). So I'm not sure what normalization was done and how, or if, I can change it...
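One possible explanation for a range like -2.11 to 2.64, offered here as an assumption rather than something confirmed in this thread: those bounds are what you get when pixel values in [0, 1] are normalized per channel with the commonly used ImageNet mean and standard deviation. The sketch below only checks that arithmetic. The constants are the widely used ImageNet statistics, not values read from the Object Detection API, and normalized_bounds is a hypothetical helper.

```python
# Commonly used ImageNet per-channel statistics (assumed, not taken from this thread).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalized_bounds(mean, std):
    """Return the lowest and highest value (x - mean) / std can take for x in [0, 1]."""
    lows = [(0.0 - m) / s for m, s in zip(mean, std)]
    highs = [(1.0 - m) / s for m, s in zip(mean, std)]
    return min(lows), max(highs)

low, high = normalized_bounds(IMAGENET_MEAN, IMAGENET_STD)
print(f"{low:.2f} .. {high:.2f}")  # -2.12 .. 2.64
```

The lower bound is about -2.118, which truncates to the -2.11 reported above. If this is indeed the normalization in use, undoing it for display would mean multiplying by the standard deviation and adding the mean per channel, rather than applying (x + 1) / 2.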
My question (and concern) is also really if this is a problem for the training of my model or if we are just talking about a visualization issue (TFBoard expecting 0..255 or 0..1 and getting different type of values).
@michielva thank you for the update, it's helpful to discuss this issue and know that I'm not alone in this.
So I've also been digging around the Object Detection API code and here's what I found:
1. I exported the images used for training (in train_loop() of the model_lib_v2.py file) both to TensorBoard and as .png files. In both, the issue was still visible, which is why, like you, I'm concerned that the training is being affected, given that the training loop itself is using distorted images. Here's an example of one .png:
Note that, again, the validation images do not show this problem, even in the eval_loop.
2. I traced the distortion to the transform_input_data() function of the inputs.py file, in this line:
preprocessed_resized_image, true_image_shape = model_preprocess_fn(
    tf.expand_dims(tf.cast(image, dtype=tf.float32), axis=0))
So the function model_preprocess_fn() is responsible for the distortion of the images. This may be intentional in order to convert the input into a format that is, so to say, "palatable" to the neural network. If anyone knows for sure whether this is the case, I would be very grateful for more insight on this matter.
Finally, as a last-ditch effort, and something that I really don't recommend for the above reasons, the model_preprocess_fn line was removed and preprocessed_resized_image and true_image_shape were defined as the original images (with values [0,255]) but converted to the format they would have had, had they passed through model_preprocess_fn.
And then the model was trained, with the same dataset.
The results are quite inconclusive. Before the change, this was the mAP:
After removing the preprocessing:
It's hard to conclude whether the distorted images were negatively affecting the metrics, because removing that preprocessing didn't improve them much, except for the mAP (small). I also don't know whether that normalization is a crucial step in the training of the NN, so I would hesitate to call this a fix in any way.
This is what I have so far. I'll be training on the same dataset with the Object Detection API for TensorFlow 1.12, which, as far as I know, works well. If I get more insight into why 2.2 has this issue but 1.12 doesn't, I'll be sure to share it here.
According to the documentation:
tf.image.convert_image_dtype( image, dtype, saturate=False, name=None)
"Note that converting from floating point inputs to integer types may lead to over/underflow problems. Set saturate to True to avoid such problem in problematic conversions. If enabled, saturation will clip the output into the allowed range before performing a potentially dangerous cast (and only before performing such a cast, i.e., when casting from a floating point to an integer type, and when casting from a signed to an unsigned type; saturate has no effect on casts between floats, or on casts that increase the type's range)."
Have you tried setting saturate=True?
@ophirSarusi Thank you for your suggestion. I did try it; however, correct me if I'm wrong, but I think those kinds of conversions only happen in the data augmentation in preprocessor.py, in _rgb_to_grayscale(), which I have disabled for the time being, so it hasn't solved the issue.
Again, thank you for the help and if you have any other ideas please let me know.
@LaraNeves Thank you for your extensive answer and the suggested workaround.
I have not tried it yet, but I'm a bit hesitant for two reasons: 1) I assume that there's a reason they chose to apply this kind of transformation. As you say, "to convert the input to a preferred format for the NN". 2) My last models actually perform really well. This gave me some reassurance that the inputs (as we see them in TensorBoard) are not negatively affecting the training. Some images are almost pitch black, and my object detection model would not work on them if that were really the input being used... but it does work...
However, it's indeed an annoying issue. I would love to have an explanation, only because I already spent plenty of hours investigating this issue.
So with the exact same dataset:
And with the .config files as similar as possible:
I trained a faster_rcnn_resnet50_v1_1024x1024_coco17_tpu in the latest Object Detection API for TensorFlow 2.2 and in an older Object Detection API for TensorFlow 1.12 for about the same number of steps (~120k). The evaluation is done on a validation dataset.
The results are significantly different:
In version 1.12:
In version 2.2:
As you can see the version 2.2 is incapable of detecting small objects, no matter how long I train it for. The version 1.12 has better results comparatively.
Additionally, in each version I extracted the preprocessed image in what I think is the equivalent place for both training pipelines (not absolutely 100% sure here, so if anyone has any corrections/suggestions regarding this I would happily accept them).
For 1.12 in model_lib.py:
Results in the image:
For 2.2 in model_lib_v2.py:
Results in the image:
It could be due to a change in tf.summary.image() that exports images differently; it could be my mistake, and these images are not comparable between the two pipelines; but it could also be a change in the preprocessing part of the training pipeline between one version and the other that is negatively affecting the results. If anyone has any opinion or suggestion on this, please share. I would really appreciate it.
@michielva I understand if you would rather not share and obviously our respective problems are different, but I'm really curious to know which specific model you are using and the kind of results you are getting (highest mAP on a validation/test set, for example would be enough) just as a comparison measure. If you don't mind sharing of course.
Hi @LaraNeves,
Sorry it took a while for me to respond, I was working on other cases.
I am using EfficientDet-D1 model with pretrained weights on the COCO dataset.
The mAP's I am getting are quite high on my last test set.
mAP@.50IoU: 0.997
mAP@.50;.05;.95IoU: 0.545
So the model was able to find almost all objects, however the bounding boxes were far from pixel perfect.
But I do need to mention that it's rather expected that we're getting quite high mAP's in our case. I'm working with images from our production lines made by our Machine Vision cameras. On our product there are marks (circles, rectangles, lines, crosses, ...). These are the objects we wish to find. On all images these objects are available, although sometimes quite hard to see.
Due to pending patents on our products I cannot share any images of my dataset. But I can give some extra information: My images are 1600x1280 and the marks I am trying to find are between 30x30 up until 100x100 pixels. The EfficientDet-D1 model rescales my images to 640x640 however. I do some data augmentation (random 90° rotation and random gamma correction).
So in the end, I am quite sure that this issue does not affect my training. As in some of the "oversaturated" input images in Tensorboard it is impossible to see the marks I wish to find. And the model does find them perfectly, so that makes me think it's indeed only a "visualisation issue" in Tensorboard.
For anyone else having this issue, this is indeed a TF2 object detection api to tensorboard issue that can be solved by doing the following:
In model_lib_v2.py, line 276, you need to rescale the images from the range (-1, 1) to (0, 1) by changing
data=features[fields.InputDataFields.image]
to
data=(features[fields.InputDataFields.image] + 1) / 2,
like:
tf.compat.v2.summary.image(
    name='train_input_images',
    step=global_step,
    data=(features[fields.InputDataFields.image] + 1) / 2,
    max_outputs=3)
@jartantupjar Thanks for the suggestion, however this is not working in my case. The values I have do not range from (-1,1) but from -2.11 to 2.45 for example (depends a bit on the image).
@michielva then replace the line with something like this. Instead of assuming a range of (-1, 1), you need to get the min and max pixel values of your image. I have not tested this, but it should work (you may need to import numpy):
data= (features[fields.InputDataFields.image]-np.min(features[fields.InputDataFields.image]))/(np.max(features[fields.InputDataFields.image])-np.min(features[fields.InputDataFields.image]))
This is essentially min-max normalization.
Can't we use fields.InputDataFields.original_image rather than reverting back?
@VeeranjaneyuluToka I don't think so. The only thing original_image does is keep the image without any preprocessing steps (e.g. image_resize). You can give it a shot and get back to us with your results.
Following up on the suggestion from @jartantupjar, this did the trick for me in object_detection/model_lib_v2.py on line 622:
if record_summaries:
    imgs = features[fields.InputDataFields.image][:3]
    imgs = tf.div(tf.subtract(imgs, tf.reduce_min(imgs)), tf.subtract(tf.reduce_max(imgs), tf.reduce_min(imgs)))
    tf.compat.v2.summary.image(name='train_input_images', step=global_step, data=imgs, max_outputs=3)
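The min-max idea above can be sketched independently of TensorFlow. The NumPy version below is an illustration under assumptions: it normalizes each image in the batch separately (the snippet above uses one min and max for the whole batch), and it adds a small epsilon, a choice made here, to avoid dividing by zero on constant images. The name minmax_per_image is hypothetical.

```python
import numpy as np

def minmax_per_image(batch, eps=1e-8):
    """Rescale each image in a [N, H, W, C] batch to the range [0, 1]."""
    batch = np.asarray(batch, dtype=np.float32)
    # Reduce over height, width and channels, keeping dims for broadcasting.
    lo = batch.min(axis=(1, 2, 3), keepdims=True)
    hi = batch.max(axis=(1, 2, 3), keepdims=True)
    return (batch - lo) / (hi - lo + eps)

# Two hypothetical 1x2x1 images with different value ranges.
batch = np.array([[[[-2.0], [2.0]]], [[[0.0], [10.0]]]])
scaled = minmax_per_image(batch)
print(scaled.reshape(2, 2))
```

Normalizing per image keeps one unusually bright or dark image from compressing the contrast of the others in the same batch. The result is meant for visualization only and says nothing about what the model should be trained on.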
I updated my pull request #9019 today as it has been pending for almost a year.
In the update, I added @noahshpak's more general solution for image rescaling.
Almost 3 years now in fact @Moritz-Weisenboehler, having checked it. I'm going to incorporate it into my personal copy though. Thanks!
I want to save the preprocessed images as .png files in a separate folder and also display them in TensorBoard. First, I tried to visualize them in TensorBoard, but it shows only 5-10 images. Please share any suggestions or code to solve this.
Moreover, I tried to save the original image but it doesn't work. I am using TF2 object detection. Thank you.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/tree/master/research/object_detection
2. Describe the bug
I am trying to train the model using Tensorflow 2. The pipeline seems to work (training starts and the training process seems to be running), but I noticed a disturbing symptom: in Tensorboard the preview of images is incorrect. They look like they are badly decoded (color values truncated to 0 and 1; example below).
![image](https://user-images.githubusercontent.com/15386832/90343565-09670100-e012-11ea-82b5-1c852c51a5e4.png)
From what I remember, when I was using the Object Detection API with TF1, the preview displayed "normal" images. I am not sure if this is a bug related to Tensorboard visualization only, or if the training pipeline does not work as it should and the images are loaded incorrectly. Or maybe I am making a configuration mistake?
3. Steps to reproduce
faster_rcnn_resnet50_v1_fpn_640x640_coco17_tpu-8.config (change num_classes, use_bfloat16, fine_tune_checkpoint and paths to generated tfrecords and appropriate label_map in input_readers)
4. Expected behavior
I expect to see properly decoded images.
5. Additional context
Suspect part of the training log (but I'm not sure if it has anything to do with the issue):
6. System information