sglvladi / TensorFlowObjectDetectionTutorial

A tutorial on object detection using TensorFlow

Tensorflow 2 Export Training Data Result is Not working #60

Closed lukeNguyen0202 closed 2 years ago

lukeNguyen0202 commented 4 years ago

I tried to train a model on my custom data to detect 2 objects, following:

https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#configure-the-training-pipeline

Here is my pre-trained model: ssd_resnet50_v1_fpn_640x640_coco17_tpu

Here is my pipeline config file:


model {
  ssd {
    num_classes: 2
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    feature_extractor {
      type: "ssd_resnet50_v1_fpn_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.0004
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.03
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.997
          scale: true
          epsilon: 0.001
        }
      }
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 3
        max_level: 7
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.0004
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.01
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.997
            scale: true
            epsilon: 0.001
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.6
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-08
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 2
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.04
          total_steps: 25000
          warmup_learning_rate: 0.013333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/home/lnguyen/Apps/Tensorflow_2/workspace/pre-trained-models/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-25"
  num_steps: 25000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  use_bfloat16: false
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "/home/lnguyen/Apps/Tensorflow_2/workspace/training/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/home/lnguyen/Apps/Tensorflow_2/workspace/data/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/home/lnguyen/Apps/Tensorflow_2/workspace/training/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/home/lnguyen/Apps/Tensorflow_2/workspace/data/train.record"
  }
}

I use model_main_tf2.py to start the training, then I use exporter_main_v2.py to export the result.

I got everything as expected, but I can't use the saved_model.pb for the actual detection process.

In TF1, after the training, I had files like model.ckpt ????? - ?????, model.ckpt.meta, model.ckpt.index, and so on.

But why, in TF2, do I get ckpt-0.data ?????? - ????? and ckpt-0.index?

Please help.

lukeNguyen0202 commented 4 years ago

I got this error when I use the saved_model.pb:


2020-10-14 16:03:51.891769: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "detection.py", line 52, in <module>
    od_graph_def.ParseFromString(serialized_graph)
google.protobuf.message.DecodeError: Error parsing message

sglvladi commented 4 years ago

@lukeNguyen0202 did you follow the provided examples (see here)?

sglvladi commented 4 years ago

You can use the above example to load the saved_model.pb. To do so, set PATH_TO_MODEL_DIR to the path of the folder that contains the exported files, e.g. C:/Users/sglvladi/Documents/Tensorflow/workspace/training_demo/exported-models/my_model
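
For reference, exporter_main_v2.py normally writes a checkpoint folder, a saved_model folder and a pipeline.config file into the output directory, and saved_model.pb sits inside the saved_model folder next to a variables folder. The sketch below only checks that layout before anything is loaded. check_export_dir is a hypothetical helper, not part of the tutorial or the Object Detection API, and the list of expected entries is an assumption based on a typical export.

```python
from pathlib import Path

# Entries a typical exporter_main_v2.py output folder contains (assumed layout).
EXPECTED_ENTRIES = (
    "saved_model/saved_model.pb",
    "saved_model/variables",
    "checkpoint",
    "pipeline.config",
)


def check_export_dir(model_dir):
    """Return the expected entries that are missing from model_dir."""
    root = Path(model_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # Path taken from this thread; replace it with your own export folder.
    missing = check_export_dir("/home/lnguyen/Apps/Tensorflow_2/workspace/exported_model_1")
    print("Missing entries:", missing or "none")
```

If nothing is reported missing, PATH_TO_MODEL_DIR points at the right level, and PATH_TO_MODEL_DIR + "/saved_model" is the folder to pass to tf.saved_model.load.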

lukeNguyen0202 commented 4 years ago

@lukeNguyen0202 did you follow the provided examples (see here)?

I found this instruction confusing: I already have my training data, so why would I need to download an extra source of data for detection?

So I followed a different set of instructions (here), which is close to what I need to do for detection. It needs to connect to the video stream via OpenCV, then break each frame down for detection.

lukeNguyen0202 commented 4 years ago

You can use above example to load the saved_model.pb. To do so, set PATH_TO_MODEL_DIR equal to the path to the folder that contains the exported files, e.g. C:/Users/sglvladi/Documents/Tensorflow/workspace/training_demo/exported-models/my_model

I followed the load-model part you referred to. Here is my code:

import time
from .models.research.object_detection.utils import label_map_util
from .models.research.object_detection.utils import visualization_utils as viz_utils
import tensorflow as tf

# Download and extract model
def download_model(model_name, model_date):
    base_url = 'http://download.tensorflow.org/models/object_detection/tf2/'
    model_file = model_name + '.tar.gz'
    model_dir = tf.keras.utils.get_file(fname=model_name,
                                        origin=base_url + model_date + '/' + model_file,
                                        untar=True)
    return str(model_dir)

MODEL_DATE = '20200711'
MODEL_NAME = 'centernet_hg104_1024x1024_coco17_tpu-32'

# PATH_TO_MODEL_DIR = download_model(MODEL_NAME, MODEL_DATE)
PATH_TO_MODEL_DIR = "/home/lnguyen/Apps/Tensorflow_2/workspace/exported_model_1"

PATH_TO_SAVED_MODEL = PATH_TO_MODEL_DIR + "/saved_model"

print('Loading model...', end='')
start_time = time.time()

# Load saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)

end_time = time.time()
elapsed_time = end_time - start_time
print('Done! Took {} seconds'.format(elapsed_time))

Here is the result after running it:

Traceback (most recent call last):
  File "load_model.py", line 2, in <module>
    from .models.research.object_detection.utils import label_map_util
ImportError: attempted relative import with no known parent package

It seems to be complaining about import tensorflow.compat.v1 as tf in label_map_util; to bypass this I have to disable TF2. Do you have any information about why TF2 can't do detection like TF1, and why it has to be disabled for detection to work? Or is TF2 not meant to be used for detection?

sglvladi commented 4 years ago

@lukeNguyen0202 I have not included specific instructions on how to use your own data, since it should be straightforward to do so based on the provided examples.

Considering the example I linked above: there is no need to download the data. If you already have your own data, simply replace IMAGE_PATH with the path to the folder containing your data and comment out the downloading.

sglvladi commented 4 years ago

Please provide the full traceback. You shouldn't have to disable TF2.

The TF2 object detection API is different from TF1 due to design changes. Models in TF2 use Keras, which means that a common API can be followed, as is done for other Keras models.

P.S. I am not in any way affiliated with Google or TensorFlow, so I cannot answer any questions regarding the reasons behind TF2 design choices.

sglvladi commented 4 years ago

It needs to connect to the video stream via OpenCV, then break each frame down for detection

Just FYI, there is an example on how to use your webcam in the tutorial. See here

lukeNguyen0202 commented 4 years ago

Here is the full traceback: models/research/object_detection/utils/label_map_util.py <-- I'm not sure why it complains about the import tensorflow.compat.v1 as tf there.

I'm checking out your webcam code right now.

Thank you for all your time, assistance, and the information you share :)

lukeNguyen0202 commented 4 years ago

@sglvladi

It still complains about label_map_util.py:


(Tensorflow_Opencv) lnguyen@pop-luke:~/PycharmProjects/Tensorflow_Opencv$ python3 object_detection_camera.py 
Downloading model. This may take a while... Done
Downloading label file... Done
Traceback (most recent call last):
  File "object_detection_camera.py", line 88, in <module>
    from .models.research.object_detection.utils import label_map_util
ImportError: attempted relative import with no known parent package

Full traceback: /home/lnguyen/PycharmProjects/Tensorflow_Opencv/models/research/object_detection/utils/label_map_util.py

The error in label_map_util: (image)

sglvladi commented 4 years ago

@lukeNguyen0202 hmmm... That's odd.. I'll try to reproduce it locally and will get back to you.

In the meantime, can you confirm the version of Tensorflow you have installed? Also, how did you download tensorflow/models? If you used git, can you specify the latest commit on your local clone?

lukeNguyen0202 commented 4 years ago

@sglvladi

tensorflow==2.3.1. I think I downloaded the models repository from GitHub as a zip file rather than with git. I don't know how to check the "latest commit" on my local copy, but I downloaded it on Oct 8.

sglvladi commented 4 years ago

@lukeNguyen0202 I am afraid I cannot reproduce the issue. I've performed a clean install of Tensorflow as per the instructions of the tutorial, and then installed the latest version of tensorflow/models. The object_detection_camera.py script seems to work just fine.

The traceback you have provided seems to suggest that the code is tripping over at line 88 of object_detection_camera.py. If you have not modified the file, this is the line where import tensorflow as tf should be, which is very bizarre. If you are getting this issue when importing tensorflow, then you should not have been able to train your model.

sglvladi commented 4 years ago

Just noticed you have replaced from object_detection.utils import label_map_util with from .models.research.object_detection.utils import label_map_util. Why have you done this?

lukeNguyen0202 commented 4 years ago

(image) @sglvladi I was able to do the training outside of the project. After the training, I created a new project and copied tensorflow/models and the exported model into it. I assumed everything would work.

If I didn't do it, the project would not know where to look for label_map_util.

sglvladi commented 4 years ago

@lukeNguyen0202 If you followed all the installation steps correctly then you should have the object_detection package installed (i.e. on your PYTHONPATH), which is where label_map_util can be found. Thus, you should not need to use relative imports as you have done above.

Please have a look here. Follow these instructions closely, remove the relative imports and retry.
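
A quick way to confirm that the package is visible to the interpreter you are running is to ask Python where it would import it from. This is a minimal sketch using only the standard library; find_package_location is a hypothetical helper name, and object_detection is the package name used in the absolute import mentioned above.

```python
import importlib.util


def find_package_location(package_name):
    """Return where package_name would be imported from, or None if it is not importable."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    # Packages expose their folders via submodule_search_locations.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


if __name__ == "__main__":
    location = find_package_location("object_detection")
    print("object_detection:", location or "not found, check the installation steps")
```

If this prints a location, from object_detection.utils import label_map_util should work without any relative import.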

sglvladi commented 4 years ago

One thing I also noticed is that you are using PyCharm. When creating a new project, are you making sure that you are using the same Python environment as the one where TensorFlow and the object detection API are installed? I am asking because PyCharm by default creates a new virtual (venv) environment for each new project, meaning that you may have installed everything in one project/environment, then created a new project with a new (clean) environment.
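
To compare environments, the same few lines can be run once from the terminal used for training and once from the PyCharm project. This is only a sketch: describe_environment is a hypothetical helper, and the venv check relies on sys.prefix differing from sys.base_prefix, which holds for venv/virtualenv but not for conda environments, so for conda compare the printed prefix paths instead.

```python
import sys


def describe_environment():
    """Summarise which Python installation and environment are in use."""
    # Inside a venv/virtualenv, sys.prefix points at the environment folder
    # while sys.base_prefix still points at the base installation.
    in_venv = sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_venv,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the two runs print different prefix values, the project is not using the environment where the packages were installed.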

lukeNguyen0202 commented 4 years ago

@sglvladi

I understand it now. Thank you for the clarification. This is very helpful information.

After installing the object_detection package, label_map_util is working.

lukeNguyen0202 commented 4 years ago

@sglvladi

I understand this is not your problem to solve but I got this issue after running the file:

(Tensorflow_Opencv) lnguyen@pop-luke:~/PycharmProjects/Tensorflow_Opencv$ python3 object_detection_camera.py 
Traceback (most recent call last):
  File "object_detection_camera.py", line 191, in <module>
    cv2.imshow('object detection', image_np_with_detections)
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-a98tlsvg/opencv/modules/highgui/src/window.cpp:651: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvShowImage'

But if you know where to look for the answer to solve this, I would really appreciate it.

Thank you again. Hope you have a good day :)

sglvladi commented 4 years ago

@lukeNguyen0202, I'm glad you've managed to sort it 👍

To fix the above error run the following:

pip uninstall opencv-python-headless

then depending on whether you are using conda or not

conda install opencv

or

pip install opencv-python

This is because the object detection package installs opencv-python-headless, which doesn't include display functions such as imshow().
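
To see which OpenCV builds pip has installed in the current environment before and after the commands above, something like the following can help. It is a sketch using the standard library's importlib.metadata (Python 3.8+); installed_versions is a hypothetical helper, and an OpenCV installed through conda may not be listed under these pip distribution names.

```python
from importlib import metadata

# PyPI distributions that all provide the cv2 module.
OPENCV_DISTRIBUTIONS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_versions(names):
    """Map each distribution in names that is installed to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, version in installed_versions(OPENCV_DISTRIBUTIONS).items():
        print(f"{name}=={version}")
```

If a headless distribution is still listed after the reinstall, cv2.imshow() may keep failing with the same error.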

lukeNguyen0202 commented 4 years ago

@sglvladi

I got it working, and I appreciate your involvement in this project. Thank you very much!

lukeNguyen0202 commented 4 years ago

Hello @sglvladi ,

I would like to reopen the issue because I believe some steps are missing from my training for custom detection. I would really appreciate it if you could tell me how to think about this and guide me to do it correctly.

Everything works really well until I replace the data/model with my custom trained model.

Here is the result: (image)

It fails to detect the object.

One thing I noticed is that the saved_model.pb is only 8MB. No matter how long I let the training run, after I export it, it is 8MB every time.
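
A note on the file size: in the SavedModel format, saved_model.pb stores the serialized graph and function definitions, while the trained weights are written to the variables folder next to it. A saved_model.pb that stays roughly the same size is therefore expected, and on its own it does not show that training had no effect. The sketch below lists the size of each entry in the saved_model folder; folder_sizes is a hypothetical helper using only the standard library, and the path is the one from earlier in this thread.

```python
from pathlib import Path


def folder_sizes(saved_model_dir):
    """Return the size in bytes of each top-level entry in saved_model_dir."""
    sizes = {}
    for entry in Path(saved_model_dir).iterdir():
        if entry.is_dir():
            # Sum every file below the folder, e.g. variables/variables.data-*.
            sizes[entry.name] = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            sizes[entry.name] = entry.stat().st_size
    return sizes


if __name__ == "__main__":
    target = Path("/home/lnguyen/Apps/Tensorflow_2/workspace/exported_model_1/saved_model")
    if target.is_dir():
        for name, size in sorted(folder_sizes(target).items()):
            print(f"{name}: {size / 1e6:.1f} MB")
    else:
        print("Folder not found:", target)
```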

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.