Running SSD_ResNet_101_FPN with custom dataset fails on GCP ML Engine

Sri-vatsa commented 5 years ago

System information

What is the top-level directory of the model you are using: object_detection.model_main
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04, Python 3.5
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 1.13.1
Bazel version (if compiling from source): -
CUDA/cuDNN version: -
GPU model and memory: -
Exact command to reproduce: -

Describe the problem

I tried training SSD_ResNet_101_FPN (i.e. RetinaNet) on GCP ML Engine with 1 master, 5 workers (each with 4 K80 GPUs) and 3 parameter servers. I used the standard pipeline config file from object_detection.samples.config and a pretrained model downloaded from the TensorFlow model zoo. I zipped pycocotools, slim and object_detection from a clone of the tensorflow/models repository as of 27 Mar, 5:30 am (EST).

Training runs fine, but when evaluation reaches object_detection_evaluation.py the job fails with: NameError: name 'unicode' is not defined

Source code / logs

The replica master 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 511, in _actual_eval
    return _evaluate()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 493, in _evaluate
    self._evaluate_build_graph(input_fn, hooks, checkpoint_path))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1424, in _evaluate_build_graph
    self._call_model_fn_eval(input_fn, self.config))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1460, in _call_model_fn_eval
    features, labels, model_fn_lib.ModeKeys.EVAL, config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/root/.local/lib/python3.5/site-packages/object_detection/model_lib.py", line 454, in model_fn
    eval_config, list(category_index.values()), eval_dict)
  File "/root/.local/lib/python3.5/site-packages/object_detection/eval_util.py", line 913, in get_eval_metric_ops_for_evaluators
    evaluators_list = get_evaluators(eval_config, categories, evaluator_options)
  File "/root/.local/lib/python3.5/site-packages/object_detection/eval_util.py", line 890, in get_evaluators
    **kwargs_dict))
  File "/root/.local/lib/python3.5/site-packages/object_detection/utils/object_detection_evaluation.py", line 569, in __init__
    group_of_weight=group_of_weight)
  File "/root/.local/lib/python3.5/site-packages/object_detection/utils/object_detection_evaluation.py", line 194, in __init__
    self._build_metric_names()
  File "/root/.local/lib/python3.5/site-packages/object_detection/utils/object_detection_evaluation.py", line 213, in _build_metric_names
    category_name = unicode(category_name, 'utf-8')
NameError: name 'unicode' is not defined
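
The last frame of the traceback can probably be reproduced locally, without submitting a GCP job. Below is a minimal sketch, assuming a Python 3 interpreter, where the `unicode` builtin from Python 2 no longer exists. `build_metric_name` and `reproduce` are hypothetical stand-ins written for this illustration. They only mirror the single statement shown at line 213 and are not the library's own functions.

```python
def build_metric_name(category_name):
    # Hypothetical stand-in that copies the failing statement from the traceback.
    # `unicode` is a Python 2 builtin; Python 3 does not define it.
    return unicode(category_name, 'utf-8')  # noqa: F821


def reproduce():
    """Return the NameError message, or None if no NameError was raised."""
    try:
        build_metric_name(b'person')
    except NameError as err:
        return str(err)
    return None


if __name__ == '__main__':
    print(reproduce())
```

If the returned message matches the one in the log, the failure comes from the Python 3 runtime rather than from the dataset or the cluster setup.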

Sri-vatsa commented 5 years ago

I have found a temporary fix. On line 213 of object_detection_evaluation.py in tensorflow/models/object_detection/utils/, I replaced category_name = unicode(category_name, 'utf-8') with category_name = str(category_name, 'utf-8'). This seems to work. If this is the right fix for this error, I will be glad to make a PR for this issue.
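
One caveat with the replacement above: in Python 3, `str(value, 'utf-8')` only accepts bytes-like input and raises `TypeError` when `value` is already a `str`. A more defensive variant might look like the sketch below. `to_text` is a hypothetical helper introduced only for illustration, not something object_detection provides, and whether category names arrive as bytes or as text in this code path is an assumption that has not been verified.

```python
def to_text(category_name, encoding='utf-8'):
    """Return category_name as text on both Python 2 and Python 3.

    Hypothetical helper for illustration only.
    """
    # On Python 2, `bytes` is an alias of `str`, so byte strings are decoded
    # to `unicode`. On Python 3, only real bytes objects are decoded.
    if isinstance(category_name, bytes):
        return category_name.decode(encoding)
    # Already text (Python 3 `str` or Python 2 `unicode`): return unchanged.
    return category_name


def str_call_fails_on_text():
    """Show that str(text, 'utf-8') raises TypeError on Python 3."""
    try:
        str('person', 'utf-8')
    except TypeError:
        return True
    return False
```

With a helper like this, line 213 could read `category_name = to_text(category_name)`, which avoids depending on `unicode` and handles both input types.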

tensorflowbutler commented 4 years ago

Hi There, We are checking to see if you still need help on this, as this seems to be an old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.
