tensorflow / models

Models and examples built with TensorFlow

[deeplab] model_test.py failed after installing deeplab with tensorflow=1.6 #4119

Closed wenouyang closed 6 years ago

wenouyang commented 6 years ago

I am using TensorFlow 1.6. Running model_test.py produces the following error message:

(/data/virtualE/deeplab) [ug@h40 research]$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
1.6.0                                                                                                                         
(/data/virtualE/deeplab) [ug@h40 research]$ python deeplab/model_test.py
2018-04-28 18:38:31.549989: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX                                                                                                              
/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py:560: DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead                                          
  return np.fromstring(tensor.tensor_content, dtype=dtype).reshape(shape)                                                                         
/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/util/tf_inspect.py:45: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()                                                                           
  if d.decorator_argspec is not None), _inspect.getargspec(target))                                                                               
.2018-04-28 18:38:50.142449: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at mkl_concat_op.cc:780 : Aborted: Operation received an exception:Status: 3, message: could not create a concat primitive descriptor, in file tensorflow/core/kernels/mkl_concat_op.cc:777          
E...                                                                                                                                              
======================================================================                                                                            
ERROR: testForwardpassDeepLabv3plus (__main__.DeeplabModelTest)                                                                                   
----------------------------------------------------------------------                                                                            
Traceback (most recent call last):                                                                                                                
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call         
    return fn(*args)                                                                                                                              
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn          
    target_list, status, run_metadata)                                                                                                            
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__   
    c_api.TF_GetCode(self.status.status))                                                                                                         
tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 3, message: could not create a concat primitive descriptor, in file tensorflow/core/kernels/mkl_concat_op.cc:777                                                                                      
         [[Node: concat = _MklConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](ResizeBilinear, aspp0/Relu, concat/axis, DMT/_283, aspp0/Relu:1, DMT/_284)]]                                                                        

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "deeplab/model_test.py", line 108, in testForwardpassDeepLabv3plus
    outputs_to_scales_to_logits = sess.run(outputs_to_scales_to_logits)  
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)                                                                                                              
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)                                                                                         
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)                                                                                                              
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)                                                                                                 
tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 3, message: could not create a concat primitive descriptor, in file tensorflow/core/kernels/mkl_concat_op.cc:777                                                                                      
         [[Node: concat = _MklConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](ResizeBilinear, aspp0/Relu, concat/axis, DMT/_283, aspp0/Relu:1, DMT/_284)]]                                                                        

Caused by op 'concat', defined at:
  File "deeplab/model_test.py", line 120, in <module>
    tf.test.main()                                   
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/platform/test.py", line 76, in main
    return _googletest.main(argv)                                                                                                 
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/platform/googletest.py", line 99, in main
    benchmark.benchmarks_main(true_main=main_wrapper)                                                                                   
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/platform/benchmark.py", line 338, in benchmarks_main                                                                                                                                                 
    true_main()                                                                                                                                   
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/platform/googletest.py", line 98, in main_wrapper  
    return app.run(main=g_main, argv=args)                                                                                                        
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/platform/googletest.py", line 69, in g_main
    return unittest_main(argv=argv)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/data/virtualE/deeplab/lib/python3.6/unittest/main.py", line 256, in runTests
    self.result = testRunner.run(self.test)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/runner.py", line 176, in run
    test(result)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/suite.py", line 122, in run
    test(result)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/suite.py", line 122, in run
    test(result)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/case.py", line 653, in __call__
    return self.run(*args, **kwds)
  File "/data/virtualE/deeplab/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "deeplab/model_test.py", line 105, in testForwardpassDeepLabv3plus
    image_pyramid=[1.0])
  File "/data/virtualE/deeplab/models/research/deeplab/model.py", line 296, in multi_scale_logits
    fine_tune_batch_norm=fine_tune_batch_norm)
  File "/data/virtualE/deeplab/models/research/deeplab/model.py", line 461, in _get_logits
    fine_tune_batch_norm=fine_tune_batch_norm)
  File "/data/virtualE/deeplab/models/research/deeplab/model.py", line 424, in _extract_features
    concat_logits = tf.concat(branch_logits, 3)
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1175, in concat
    return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 625, in _concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/data/virtualE/deeplab/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

AbortedError (see above for traceback): Operation received an exception:Status: 3, message: could not create a concat primitive descriptor, in file tensorflow/core/kernels/mkl_concat_op.cc:777
         [[Node: concat = _MklConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](ResizeBilinear, aspp0/Relu, concat/axis, DMT/_283, aspp0/Relu:1, DMT/_284)]]

----------------------------------------------------------------------
Ran 5 tests in 18.651s

FAILED (errors=1)

tensorflowbutler commented 6 years ago

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

wenouyang commented 6 years ago

It seems that I have to roll back to tensorflow=1.5.

yhliang2018 commented 6 years ago

@wenouyang Did rolling back solve the issue? Or maybe you could try the latest version?

wenouyang commented 6 years ago

@yhliang2018, it works with tensorflow=1.5.
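
For anyone who lands here with the same traceback, here is a minimal sketch of a guard that flags the version reported as failing in this thread before the test is run. The helper names (`parse_major_minor`, `matches_failing_version`) are hypothetical and not part of TensorFlow or deeplab. The only data points in this thread are that 1.6.0 failed and 1.5 worked, and the failing node is `_MklConcatV2`, so other versions or non-MKL builds may behave differently.

```python
def parse_major_minor(version):
    """Return (major, minor) as ints from a version string such as '1.6.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def matches_failing_version(version, failing=(1, 6)):
    """True if `version` has the major.minor that failed in this thread.

    Assumption: only 1.6 is treated as failing, because that is the only
    failing version reported here.
    """
    return parse_major_minor(version) == failing


if __name__ == "__main__":
    # In practice tf.__version__ would be passed in; the strings below are
    # hard-coded so the sketch runs without TensorFlow installed.
    for v in ("1.5.0", "1.6.0"):
        status = "reported failing here" if matches_failing_version(v) else "not reported failing here"
        print(v, "->", status)
```

In a real environment you would call `matches_failing_version(tf.__version__)` after `import tensorflow as tf` and decide whether to roll back, as was done above.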

yhliang2018 commented 6 years ago

Thanks! Will close it for now.