jakeret / tf_unet

Generic U-Net Tensorflow implementation for image segmentation
GNU General Public License v3.0
1.9k stars 748 forks

Sync "https://github.com/jakeret/tf_unet/pull/202" with master and resolve conflicts #276

Open ashahba opened 5 years ago

ashahba commented 5 years ago

When using the image gcr.io/deeplearning-platform-release/tf-cpu.1-14 and following these steps: https://github.com/IntelAI/models/blob/v1.4.0/benchmarks/image_segmentation/tensorflow/unet/README.md I get the following error:

2019-07-09 17:03:43.718942: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2019-07-09 17:03:43.771372: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0709 17:03:43.853285 139741926975296 deprecation_wrapper.py:119] From /workspace/models/tf_unet/unet.py:301: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W0709 17:03:43.874177 139741926975296 deprecation.py:323] From /root/miniconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/workspace/benchmarks/image_segmentation/tensorflow/unet/inference/fp32/unet_infer.py", line 78, in <module>
    prediction = net.predict(arg_parser.parse_args().ckpt_path, x_test)
  File "/workspace/models/tf_unet/unet.py", line 274, in predict
    self.restore(sess, model_path)
  File "/workspace/models/tf_unet/unet.py", line 302, in restore
    saver.restore(sess, model_path)
  File "/root/miniconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1278, in restore
    compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /checkpoints/model.cpkt
Ran inference with batch size 1
Log location outside container: /jenkins/workspace/Intel-Models-Benchmark-fp32-Trigger/intel-models/benchmarks/common/tensorflow/logs/benchmark_unet_inference_fp32_20190709_170331.log
nrvalgo_jenkinsadm@aipg-fm-skx-48:/jenkins/workspace/Intel-Models-Benchmark-fp32-Trigger/intel-models/benchmarks$ ls $CHECKPOINT_DIR/
checkpoint  events.out.tfevents.1548972182.4e4b03cdde24  model.ckpt.data-00000-of-00001  model.ckpt.index  model.ckpt.meta

ashahba commented 5 years ago

@jakeret this is basically just bringing #202 up to date with master. I also realized that the issue with https://github.com/IntelAI/models/blob/v1.4.0/benchmarks/image_segmentation/tensorflow/unet/README.md was that I was using checkpoint_name=model.cpkt, not realizing that it's now checkpoint_name=model.ckpt. I have fixed our docs.

Thanks.
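
For anyone hitting the same ValueError: the directory listing above shows files named model.ckpt.*, while the path passed to restore ended in model.cpkt. Below is a minimal sketch of a pre-flight check that catches this kind of mismatch before calling the saver. It uses only the standard library and assumes the usual layout where a checkpoint prefix is accompanied by a `<prefix>.index` file. The helper names `find_checkpoint_prefixes` and `resolve_checkpoint` are hypothetical and are not part of tf_unet or TensorFlow.

```python
import os
import tempfile


def find_checkpoint_prefixes(checkpoint_dir):
    """Return the checkpoint prefixes found in checkpoint_dir.

    Assumption: each checkpoint "model.ckpt" is stored as several files
    (model.ckpt.index, model.ckpt.data-*, model.ckpt.meta), so the prefix
    is the file name with the ".index" suffix removed.
    """
    suffix = ".index"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(checkpoint_dir)
        if name.endswith(suffix)
    )


def resolve_checkpoint(checkpoint_dir, checkpoint_name):
    """Return the full prefix path, or raise listing the available names."""
    available = find_checkpoint_prefixes(checkpoint_dir)
    if checkpoint_name not in available:
        raise ValueError(
            "No checkpoint named %r in %s; available: %s"
            % (checkpoint_name, checkpoint_dir, available)
        )
    return os.path.join(checkpoint_dir, checkpoint_name)


if __name__ == "__main__":
    # Recreate the file layout from the listing above in a temp directory.
    with tempfile.TemporaryDirectory() as ckpt_dir:
        for fname in ("checkpoint", "model.ckpt.index", "model.ckpt.meta"):
            open(os.path.join(ckpt_dir, fname), "w").close()
        print(resolve_checkpoint(ckpt_dir, "model.ckpt"))
        try:
            resolve_checkpoint(ckpt_dir, "model.cpkt")  # the typo from this thread
        except ValueError as err:
            print(err)
```

The value returned by `resolve_checkpoint` is the prefix path that would then be passed on as the model path; with the misspelled name, the error message lists the prefixes that actually exist in the directory.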

ashahba commented 5 years ago

@mpjlu would you also please review and provide feedback if needed?

Thanks.

jakeret commented 5 years ago

Hi @ashahba, thank you for your contribution. I wasn't aware that this repo is being used in IntelAI benchmarks, nice.

I hadn't merged #202 because of two reasons

ashahba commented 5 years ago

Thanks @jakeret, that sounds great. In the meantime I'm unblocked, but I'll keep my eyes open for any activity on #202.