intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
https://intel.github.io/neural-compressor/
Apache License 2.0

InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2) #17

Closed: peiwenhuang27 closed this issue 3 years ago

peiwenhuang27 commented 3 years ago

Case 1

Framework: Tensorflow 2.5.0, Intel-Tensorflow 2.5.0
Environment: Google Colab

I have a successfully quantized model that I want to run for inference without using the LPOT API, so I wrote the following inference code:

import tensorflow as tf

with tf.compat.v1.Session() as sess:
    # Load the quantized SavedModel (model is the SavedModel directory).
    tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
    # Fetch the output tensor by name and run inference on the input batch.
    output = sess.graph.get_tensor_by_name(output_tensor_name)
    predictions = sess.run(output, {input_tensor_name: x})
    # Evaluate the MSE between labels and predictions inside the session.
    mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y, predictions))
    print(mse.eval())

Running the line predictions = sess.run(output, {input_tensor_name: x}) raises:

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1374     try:
-> 1375       return fn(*args)
   1376     except errors.OpError as e:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1360                                       target_list, run_metadata)
   1361 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1452                                             fetch_list, target_list,
-> 1453                                             run_metadata)
   1454 

InternalError: Missing 0-th output from {{node model/layer_1/Conv2D_eightbit_requantize}}

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)
<ipython-input-6-2bddd853d111> in <module>()
      2     tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
      3     output = sess.graph.get_tensor_by_name(output_tensor_name)
----> 4     predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
      5     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
      6     print(mse.eval())

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    966     try:
    967       result = self._run(None, fetches, feed_dict, options_ptr,
--> 968                          run_metadata_ptr)
    969       if run_metadata:
    970         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1190       results = self._do_run(handle, final_targets, final_fetches,
-> 1191                              feed_dict_tensor, options, run_metadata)
   1192     else:
   1193       results = []

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1367     if handle is None:
   1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369                            run_metadata)
   1370     else:
   1371       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1392                     '\nsession_config.graph_options.rewrite_options.'
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)
   1395 
   1396   def _extend_graph(self):

InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2) 

The error occurs whether or not Intel-Tensorflow==2.5.0 is installed, and it is not resolved by setting os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' explicitly.

On the other hand, when I run the same code in VS Code (Python 3.6.8 64-bit, base Conda environment), it returns the same error message as in Case 2.

Case 2

Framework: Tensorflow 2.4.0, Intel-Tensorflow 2.4.0
Environment: Google Colab

This case works well and prints out the MSE loss of the predictions. However, after I uninstall Intel-Tensorflow 2.4.0 and run the script with official Tensorflow, the same line as in Case 1 (predictions = sess.run(output, {input_tensor_name: x})) raises:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1374     try:
-> 1375       return fn(*args)
   1376     except errors.OpError as e:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1357       # Ensure any changes to the graph are reflected in the runtime.
-> 1358       self._extend_graph()
   1359       return self._call_tf_sessionrun(options, feed_dict, fetch_list,

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _extend_graph(self)
   1397     with self._graph._session_run_lock():  # pylint: disable=protected-access
-> 1398       tf_session.ExtendSession(self._session)
   1399 

InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by {{node model/dense/Tensordot/MatMul_eightbit_requantize}} with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

     [[model/dense/Tensordot/MatMul_eightbit_requantize]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-6-2bddd853d111> in <module>()
      2     tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
      3     output = sess.graph.get_tensor_by_name(output_tensor_name)
----> 4     predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
      5     mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
      6     print(mse.eval())

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    966     try:
    967       result = self._run(None, fetches, feed_dict, options_ptr,
--> 968                          run_metadata_ptr)
    969       if run_metadata:
    970         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1189     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1190       results = self._do_run(handle, final_targets, final_fetches,
-> 1191                              feed_dict_tensor, options, run_metadata)
   1192     else:
   1193       results = []

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1367     if handle is None:
   1368       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369                            run_metadata)
   1370     else:
   1371       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1392                     '\nsession_config.graph_options.rewrite_options.'
   1393                     'disable_meta_optimizer = True')
-> 1394       raise type(e)(node_def, op, message)
   1395 
   1396   def _extend_graph(self):

InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by node model/dense/Tensordot/MatMul_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2)  with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

     [[model/dense/Tensordot/MatMul_eightbit_requantize]]

The error persists even with os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' set explicitly.

I believe both cases are caused by the same type of error, i.e. No OpKernel was registered to support Op ...

I was given to understand that with official Tensorflow v2.5 installed and the environment variable TF_ENABLE_ONEDNN_OPTS=1 set (reference), the quantized model is supposed to run with oneDNN support. But that does not seem to be the case in either v2.4 or v2.5.
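For what it's worth, here is a small diagnostic sketch (my own check, not part of the LPOT API; it assumes the same model path as in the code above) that lists the quantized op types in the converted graph, each of which needs a registered CPU kernel for sess.run to succeed:

import tensorflow as tf

# List the quantized/requantized op types present in the loaded graph;
# the runtime must have a CPU kernel registered for each of them.
with tf.compat.v1.Session() as sess:
    tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
    quantized_types = sorted({
        op.type for op in sess.graph.get_operations()
        if op.type.startswith('Quantized') or '_eightbit_' in op.name
    })
    for op_type in quantized_types:
        print(op_type)

For my model this should include QuantizedMatMulWithBiasAndDequantize, the op named in the Case 2 error.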

Not sure if this is the right place to post this issue, but I have nowhere else to report the problem, as Intel-Tensorflow does not allow issue reporting and Tensorflow developers usually ignore issues that depend on other packages. Any hint is greatly appreciated, thank you.

ftian1 commented 3 years ago

@peiwenhuang27 any issue with the message "InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize" means that core/common_runtime/mkl_layout_pass.cc did not rewrite the graph correctly.

TF_ENABLE_MKL_NATIVE_FORMAT must always be set.

The most likely cause on your side is that these variables do not take effect in the C++ code.

peiwenhuang27 commented 3 years ago

Since I'm not sure whether the issue is caused by Colab, I tried running it on my local machine. I set the variables using:

import os

# Set before importing tensorflow so the runtime sees them at initialization.
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'
os.environ['TF_ENABLE_MKL_NATIVE_FORMAT'] = '1'  # also tried '0', not sure what its value should be

But the following error still occurs:


2021-07-19 18:36:45.607306: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From main.py:24: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
Traceback (most recent call last):
  File "/Users/joannehuang/Documents/Work/quantization/TF2_inference/Intel-Tensorflow-Env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/Users/joannehuang/Documents/Work/quantization/TF2_inference/Intel-Tensorflow-Env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1358, in _run_fn
    self._extend_graph()
  File "/Users/joannehuang/Documents/Work/quantization/TF2_inference/Intel-Tensorflow-Env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1398, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by {{node model/dense/Tensordot/MatMul_eightbit_requantize}} with these attrs: [input_quant_mode="MIN_FIRST", Toutput=DT_FLOAT, T1=DT_QUINT8, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

     [[model/dense/Tensordot/MatMul_eightbit_requantize]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 26, in <module>
    predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
  File "/Users/joannehuang/Documents/Work/quantization/TF2_inference/Intel-Tensorflow-Env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 968, in run
    run_metadata_ptr)
  File "/Users/joannehuang/Documents/Work/quantization/TF2_inference/Intel-Tensorflow-Env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/joannehuang/Documents/Work/quantization/TF2_inference/Intel-Tensorflow-Env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1369, in _do_run
    run_metadata)
  File "/Users/joannehuang/Documents/Work/quantization/TF2_inference/Intel-Tensorflow-Env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by node model/dense/Tensordot/MatMul_eightbit_requantize (defined at main.py:24)  with these attrs: [input_quant_mode="MIN_FIRST", Toutput=DT_FLOAT, T1=DT_QUINT8, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
  <no registered kernels>

     [[model/dense/Tensordot/MatMul_eightbit_requantize]]

From the command-line log (This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA), it looks like the environment variables have been set successfully.
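If it helps, a quick sanity check (my own addition) also shows the variables are at least visible inside the Python process:

import os

# Both print '1' if the assignments above took effect at the Python
# level; whether the C++ runtime honours them is a separate question.
print(os.environ.get('TF_ENABLE_ONEDNN_OPTS'))
print(os.environ.get('TF_ENABLE_MKL_NATIVE_FORMAT'))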

ftian1 commented 3 years ago

If possible, could you please share your pb file and evaluation script with us?

peiwenhuang27 commented 3 years ago

For certain reasons, I cannot upload my files directly here. I have emailed them to you; please check your inbox. Thank you so much, I truly appreciate it!

peiwenhuang27 commented 3 years ago

By the way, I found the following in the release notes for Intel-Tensorflow 2.5:

Only native layout format is supported (the environment variable TF_ENABLE_MKL_NATIVE_FORMAT will not have any effect). The oneDNN optimizations in official TensorFlow will not include int8 quantization (it will still be available in Intel Optimized TensorFlow); it will be available in later versions of official TensorFlow.

It looks like TF_ENABLE_MKL_NATIVE_FORMAT has no effect, and that int8 quantized operations are not yet supported in official Tensorflow. Could that be the reason for the error?

guomingz commented 3 years ago

By the way, I found the following in the release notes for Intel-Tensorflow 2.5:

Only native layout format is supported (the environment variable TF_ENABLE_MKL_NATIVE_FORMAT will not have any effect). The oneDNN optimizations in official TensorFlow will not include int8 quantization (it will still be available in Intel Optimized TensorFlow); it will be available in later versions of official TensorFlow.

It looks like TF_ENABLE_MKL_NATIVE_FORMAT has no effect, and that int8 quantized operations are not yet supported in official Tensorflow. Could that be the reason for the error?

You need to set TF_ENABLE_MKL_NATIVE_FORMAT=0 for int8 model execution with intel-tensorflow 2.5.0.
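For example, a minimal sketch of that setup (assuming the same model, output_tensor_name, input_tensor_name, and x as in the original snippet; the variable is set before the import so the runtime picks it up at initialization):

import os

# Must be set before TensorFlow is imported.
os.environ['TF_ENABLE_MKL_NATIVE_FORMAT'] = '0'

import tensorflow as tf  # intel-tensorflow 2.5.0

with tf.compat.v1.Session() as sess:
    tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
    output = sess.graph.get_tensor_by_name(output_tensor_name)
    predictions = sess.run(output, {input_tensor_name: x})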

peiwenhuang27 commented 3 years ago

I see! Thanks, it works now with Intel-Tensorflow. I ran into the problem mainly because I wanted the model to run in official Tensorflow without Intel-Tensorflow (for simplicity in a further inference session that will run in ML.NET).

ftian1 commented 3 years ago

Starting from official TensorFlow 2.6, the Intel optimizations have been upstreamed into official TensorFlow.

In the future, this will become the default path in the CPU version.

johnsGuo commented 2 years ago

I have the same error with Tensorflow 2.7 and TF Serving 2.7.0-gpu:

Missing 0-th output from {{node model/din_attention_layer/StatefulPartitionedCall_1/StatefulPartitionedCall/dense/Tensordot/MatMul_eightbit_requantize}}

johnsGuo commented 2 years ago

I have fixed it.

blime4 commented 2 years ago

I have fixed it.

How did you solve it? I ran into the same problem.

blime4 commented 2 years ago

I have fixed it.

The code is: https://github.com/wenxcsstore/tvm.dx/blob/bcbf4f9b1ddc4326364a3aa2fc82aaf4df8d53e8/tests/python/frontend/tensorflow/test_forward.py#L5217-L5223

The log is:

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Missing 0-th output from {{node x}}
     [[SpopFnInvocation]]
  (1) Internal: Missing 0-th output from {{node x}}
     [[SpopFnInvocation]]
     [[SpopFnInvocation/_1]]
0 successful operations. 0 derived errors ignored.

Akshaysharma29 commented 2 years ago

import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'
os.environ['TF_ENABLE_MKL_NATIVE_FORMAT'] = '1'

This fixed my issue. Thanks!