tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Tensorflow 2.3.0 MKL Intel AVX Binary Issue #45853

Closed gmatalongthewatchtower closed 3 years ago

gmatalongthewatchtower commented 3 years ago

System information

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

2.3.0

Describe the current behavior

Here's the code:


import tensorflow as tf
import numpy as np
a = np.array([2 , 4, 5])
ap=tf.constant(a)

Here's the warning message:

" I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags."

Moreover, if I run the system check described on Intel's website (https://software.intel.com/content/www/us/en/develop/articles/intel-optimization-for-tensorflow-installation-guide.html),

Code:

import tensorflow as tf
major_version = int(tf.__version__.split(".")[0])
if major_version >= 2:
   from tensorflow.python import _pywrap_util_port
   print("MKL enabled:", _pywrap_util_port.IsMklEnabled())
else:
   print("MKL enabled:", tf.pywrap_tensorflow.IsMklEnabled()) 

I get

MKL enabled: False

Describe the expected behavior

I shouldn't get the warning about AVX because I am using Intel's MKL version. Moreover, surprisingly, if I downgrade TensorFlow to 2.1, the warning changes to the one described in https://github.com/tensorflow/tensorflow/issues/45632.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np
a = np.array([2 , 4, 5])
ap=tf.constant(a)

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

CPU Version: Intel i7-10610U CPU @1.8GHz

I have installed TensorFlow in a new environment to ensure that there is no issue with dependencies.

gmatalongthewatchtower commented 3 years ago

Please add label "comp:mkl", as requested at https://software.intel.com/content/www/us/en/develop/articles/intel-optimization-for-tensorflow-installation-guide.html

wei-v-wang commented 3 years ago

"I shouldn't get the warning about AVX because I am using Intel's MKL version.". The warning is not complaining that AVX is not enabled. It is just an informative message saying only for some ops, Intel's oneDNN would be used and AVX or AVX2 would be turned on.

gmatalongthewatchtower commented 3 years ago

"I shouldn't get the warning about AVX because I am using Intel's MKL version.". The warning is not complaining that AVX is not enabled. It is just an informative message saying only for some ops, Intel's oneDNN would be used and AVX or AVX2 would be turned on.

Thanks for your help. My hypothesis is that if we use Intel's MKL version, we shouldn't get the warning, because TensorFlow would already be optimized for our Intel processor. So I believe we shouldn't see the informative message (This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX AVX2). More importantly, given what you have said, we don't know whether we are getting full performance out of our CPU. Also, if we check the mkl_enabled flag (as per the documentation on Intel's website), it shows that MKL is not enabled. I also tested with TensorFlow 2.1: the warning changes, but the flag is enabled (https://github.com/tensorflow/tensorflow/issues/45632). I think there is an issue with Intel's tensorflow-mkl package. I am a beginner, so much of what I have written may not be right.

NeoZhangJianyu commented 3 years ago

@gmatalongthewatchtower

I ran the following code and got the result: MKL enabled: True

import tensorflow as tf
major_version = int(tf.__version__.split(".")[0])
if major_version >= 2:
   from tensorflow.python import _pywrap_util_port
   print("MKL enabled:", _pywrap_util_port.IsMklEnabled())
else:
   print("MKL enabled:", tf.pywrap_tensorflow.IsMklEnabled()) 

Could you check your Tensorflow version?

I installed it with: conda install tensorflow=2.3.0=mkl_py37h0481017_0

If you want to confirm that MKL (oneDNN) is enabled, please set the environment variable: export MKLDNN_VERBOSE=1

You will see a log like the following if TensorFlow-MKL is installed/built successfully.

2020-12-22 12:24:35.981441: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-22 12:24:36.014367: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2100000000 Hz
2020-12-22 12:24:36.020244: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cdf9586130 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-22 12:24:36.020286: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-22 12:24:36.020435: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.

Input image shape: (100, 224, 224, 3)
dnnl_verbose,info,oneDNN v1.4.0 (commit N/A)
dnnl_verbose,info,cpu,runtime:OpenMP
dnnl_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
dnnl_verbose,info,gpu,runtime:none
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:acdb:f0 dst_f32::blocked:abcd:f0,,,32x3x224x224,49.5181
dnnl_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:cdba:f0 dst_f32::blocked:Acdb16a:f0,,,64x3x7x7,0.413818
dnnl_verbose,exec,cpu,convolution,jit:avx512_common,forward_training,src_f32::blocked:abcd:f0 wei_f32::blocked:Acdb16a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:aBcd16b:f0,,alg:convolution_direct,mb32_ic3oc64_ih224oh112kh7sh2dh0ph3_iw224ow112kw7sw2dw0pw3,11.1809
dnnl_verbose,exec,cpu,batch_normalization,bnorm_jit:avx512_common,forward_inference,data_f32::blocked:aBcd16b:f0 diff_undef::un

gmatalongthewatchtower commented 3 years ago

@NeoZhangJianyu Here's the output of tf version:

tf.__version__
'2.3.0'

Here's the output from MKL code:

import tensorflow as tf
major_version = int(tf.__version__.split(".")[0])
if major_version >= 2:
    from tensorflow.python import _pywrap_util_port
    print("MKL enabled:", _pywrap_util_port.IsMklEnabled())
else:
    print("MKL enabled:", tf.pywrap_tensorflow.IsMklEnabled()) 

MKL enabled: False

Here's the package name after running conda list command:

tensorflow-mkl 2.3.0 h93d2e19_0

You told me to run export MKLDNN_VERBOSE=1. This looks like a Linux command for setting an environment variable. How can I set this on Windows 10?

Thanks so much for helping me with this issue.

Can you please let me know if you have more questions?

NeoZhangJianyu commented 3 years ago

@gmatalongthewatchtower On Windows, use: set MKLDNN_VERBOSE=1
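A variable set with set in one Command Prompt window is inherited only by programs started from that same window, so Python launched from an IDE or another terminal will not see it. One way around this is to set the variable inside the script before TensorFlow is imported. The following is a minimal sketch, not an official recipe: the helper name enable_onednn_verbose is made up for illustration, and setting DNNL_VERBOSE next to MKLDNN_VERBOSE is an assumption that the bundled oneDNN 1.x accepts either spelling.

```python
import os

# Hypothetical helper (not part of TensorFlow or oneDNN).
def enable_onednn_verbose(level="1"):
    """Set the oneDNN verbose variables for the current process only."""
    names = ("MKLDNN_VERBOSE", "DNNL_VERBOSE")
    # MKLDNN_VERBOSE is the name used in this thread; DNNL_VERBOSE is set as
    # well on the assumption that the bundled oneDNN accepts either spelling.
    for name in names:
        os.environ[name] = str(level)
    return {name: os.environ[name] for name in names}

# Call this before "import tensorflow" so the library can see the values.
settings = enable_onednn_verbose()
print(settings)  # shows what this Python process will expose to oneDNN
```

To check a value that was set in the shell instead, running python -c "import os; print(os.environ.get('MKLDNN_VERBOSE'))" from that same window prints the value, or None if the Python process did not inherit it.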

gmatalongthewatchtower commented 3 years ago

@NeoZhangJianyu : Thanks again for your help. I did set MKLDNN_VERBOSE=1. How do I verify whether this flag is set? I ran the following code:

import tensorflow as tf
major_version = int(tf.__version__.split(".")[0])
if major_version >= 2:
    from tensorflow.python import _pywrap_util_port
    print("MKL enabled:", _pywrap_util_port.IsMklEnabled())
else:
    print("MKL enabled:", tf.pywrap_tensorflow.IsMklEnabled()) 

I am still getting MKL enabled: False

I ran the following sample code to test verbosity of mkl:

import tensorflow as tf
from tensorflow import keras

input_A = keras.layers.Input(shape=[5], name="wide_input")
input_B = keras.layers.Input(shape=[6], name="deep_input")
hidden1 = keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name="output")(concat)
model = keras.models.Model(inputs=[input_A, input_B], outputs=[output])

Here's the output:

2020-12-23 12:18:25.936282: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

NeoZhangJianyu commented 3 years ago

@gmatalongthewatchtower As @wei-v-wang explained, the warning is not complaining that AVX is not enabled. Could you paste more of the log?

gmatalongthewatchtower commented 3 years ago

Hello @NeoZhangJianyu, I did respond to @wei-v-wang above. It is true that the warning is not a complaint, but there are two red flags: 1) We don't know whether we are getting full performance out of our CPU. 2) If you check the mkl_enabled flag (as per the documentation on Intel's website), it shows that MKL is not enabled.

I am happy to generate log files. Can you please let me know the code and steps to generate these? I am using tf 2.3 on Windows 10.

NeoZhangJianyu commented 3 years ago

@gmatalongthewatchtower 1) Check that MKL is working in your TF.

Windows: set MKLDNN_VERBOSE=1

Linux: export MKLDNN_VERBOSE=1

Then execute the following TensorFlow-based Python script.

import numpy as np
import tensorflow as tf

x_in = np.array([[
  [[2], [1], [2], [0], [1]],
  [[1], [3], [2], [2], [3]],
  [[1], [1], [3], [3], [0]],
  [[2], [2], [0], [1], [1]],
  [[0], [0], [3], [1], [2]], ]])
kernel_in = np.array([
 [ [[2, 0.1]], [[3, 0.2]] ],
 [ [[0, 0.3]],[[1, 0.4]] ], ])
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='VALID')

You will see a log showing that oneDNN is enabled:

2020-12-25 09:20:59.557263: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-25 09:20:59.582017: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
dnnl_verbose,info,oneDNN v1.4.0 (commit N/A)
dnnl_verbose,info,cpu,runtime:OpenMP
dnnl_verbose,info,cpu,isa:Intel AVX2
dnnl_verbose,info,gpu,runtime:none
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:cdba:f0 dst_f32:p:blocked:Acdb8a:f0,,,2x1x2x2,0.0439
dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:abcd:f0 wei_f32:p:blocked:Acdb8a:f0 bia_undef::undef::f0 dst_f32:p:blocked:aBcd8b:f0,,alg:convolution_direct,mb1_ic1oc2_ih5oh4kh2sh1dh0ph0_iw5ow4kw2sw1dw0pw0,0.0589
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32:p:blocked:aBcd8b:f0 dst_f32::blocked:acdb:f0,,,1x2x4x4,0.0295
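For readers unsure how to read these lines, here is a small sketch that totals the reported time per primitive and implementation. It assumes the comma-separated layout seen in the oneDNN v1.4.0 output above (primitive name in the fourth field, implementation in the fifth, elapsed time as the last field, which is taken here to be milliseconds). The function name summarize_dnnl_verbose is invented for this example, and the sample text is copied from the log above.

```python
from collections import defaultdict

# Hypothetical helper; the field positions are inferred from the log above.
def summarize_dnnl_verbose(log_text):
    """Sum the reported execution time per (primitive, implementation)."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        # Only "exec" records describe a primitive that actually ran.
        if len(fields) < 6 or fields[0] != "dnnl_verbose" or fields[1] != "exec":
            continue
        try:
            elapsed = float(fields[-1])
        except ValueError:
            continue  # skip truncated or malformed lines
        totals[(fields[3], fields[4])] += elapsed
    return dict(totals)

sample = """
dnnl_verbose,info,cpu,isa:Intel AVX2
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32::blocked:cdba:f0 dst_f32:p:blocked:Acdb8a:f0,,,2x1x2x2,0.0439
dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_training,src_f32::blocked:abcd:f0 wei_f32:p:blocked:Acdb8a:f0 bia_undef::undef::f0 dst_f32:p:blocked:aBcd8b:f0,,alg:convolution_direct,mb1_ic1oc2_ih5oh4kh2sh1dh0ph0_iw5ow4kw2sw1dw0pw0,0.0589
dnnl_verbose,exec,cpu,reorder,simple:any,undef,src_f32:p:blocked:aBcd8b:f0 dst_f32::blocked:acdb:f0,,,1x2x4x4,0.0295
"""

summary = summarize_dnnl_verbose(sample)
for (primitive, impl), total in sorted(summary.items()):
    print(f"{primitive:12s} {impl:12s} {total:.4f}")
```

Any exec line at all is evidence that oneDNN primitives are running; an implementation name such as jit:avx2 indicates which instruction set the kernel was generated for.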

2) MKL enabled issue

I reproduced your issue with the following code:

    import tensorflow as tf
    major_version = int(tf.__version__.split(".")[0])
    if major_version >= 2:
        from tensorflow.python import _pywrap_util_port
        print("MKL enabled:", _pywrap_util_port.IsMklEnabled())
    else:
        print("MKL enabled:", tf.pywrap_tensorflow.IsMklEnabled())

The root cause is that tensorflow is the MKL build while tensorflow-base is the GPU build. Running conda list, I see:

tensorflow-2.3.0-mkl_py37h936c3e2_0
tensorflow-base-2.3.0-gpu_py37h18d21e4_0

I fixed it by removing them and reinstalling with:

conda install tensorflow tensorflow-base=2.3.0=mkl_py37h7075554_0
conda list
...
tensorflow                2.3.0           mkl_py37he70e3f7_0    defaults
tensorflow-base           2.3.0           mkl_py37h7075554_0    defaults
...
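
The mismatch described above can be spotted mechanically. The sketch below assumes that the variant (mkl, eigen, gpu) is the part of the conda build string before the first underscore, which matches the build strings quoted in this thread but is not a documented guarantee. The function names are invented for illustration, and the "broken" sample is the package pair from this comment rewritten into conda list column layout.

```python
# Hypothetical helpers for checking pasted "conda list" output.
def build_variants(conda_list_text, packages=("tensorflow", "tensorflow-base")):
    """Map each package of interest to the variant prefix of its build string."""
    variants = {}
    for line in conda_list_text.splitlines():
        parts = line.split()
        # Expected columns: name, version, build, and optionally channel.
        if len(parts) >= 3 and parts[0] in packages:
            variants[parts[0]] = parts[2].split("_", 1)[0]
    return variants

def is_consistent(variants):
    """True when every listed package reports the same variant."""
    return len(set(variants.values())) <= 1

broken = """
tensorflow                2.3.0           mkl_py37h936c3e2_0
tensorflow-base           2.3.0           gpu_py37h18d21e4_0
"""
fixed = """
tensorflow                2.3.0           mkl_py37he70e3f7_0    defaults
tensorflow-base           2.3.0           mkl_py37h7075554_0    defaults
"""

for label, text in (("broken", broken), ("fixed", fixed)):
    found = build_variants(text)
    print(label, found, "consistent" if is_consistent(found) else "MISMATCH")
```

To use it on a real environment, paste the output of conda list in place of the sample strings.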

gmatalongthewatchtower commented 3 years ago

@NeoZhangJianyu: Thanks so much for your guidance.

1. Generate logs

I did try to run set MKLDNN_VERBOSE=1 in the Windows Command Prompt, but I don't see verbose logs. I believe the variable set this way is not being picked up by Python. Could you please tell me where I should run this command? I have tried all of the following:

1) In the Anaconda Prompt that comes with Anaconda, with administrative privileges.
2) In the PyCharm terminal, with admin privileges.
3) Anaconda Navigator -> Environment -> right-click terminal -> run the command with admin privileges.
4) I googled this and found that it can also be done in code. I ran the following code and then the code that you gave me:

import mkl
mkl.verbose(1)

I ran the code you gave me, and here's the output:

MKL_VERBOSE Intel(R) MKL 2020.0 Update 2 Product build 20200624 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 1.80GHz lp64 intel_thread
MKL_VERBOSE SDOT(2,0000020AB5779ED0,1,0000020AB5779ED0,1) 60.51us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
2020-12-26 02:14:32.134646: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

<tf.Tensor: shape=(1, 4, 4, 2), dtype=float32, numpy= 
...

Could you please let me know your thoughts?

2. Version-check

I did run conda list.

Here are the tensorflow packages I have:

tensorboard               2.3.0              pyh4dce500_0
tensorboard-plugin-wit    1.6.0                      py_0
tensorflow                2.3.0           mkl_py37h3bad0a6_0
tensorflow-base           2.3.0           eigen_py37h17acbac_0
tensorflow-datasets       1.2.0                    py37_0
tensorflow-estimator      2.3.0              pyheb71bc4_0
tensorflow-metadata       0.14.0             pyhe6710b0_1
tensorflow-mkl            2.3.0                h93d2e19_0
termcolor                 1.1.0                    py37_1

Please note that I haven't done anything special to install TensorFlow except following the instructions on Intel's website posted above. I am not sure whether the eigen tensorflow-base package is the same as the GPU version.

Could you please guide me? Thanks for your help. I look forward to hearing from you.

NeoZhangJianyu commented 3 years ago

@gmatalongthewatchtower I think your TensorFlow installation is inconsistent: tensorflow is the mkl build, but tensorflow-base is the eigen build.

I guess it is a bug in how conda installs the older TensorFlow release. With newer releases, I see the problem disappear.

Yes, I provided the steps in the comments above:

  1. Remove tensorflow with the conda command.

  2. Reinstall it with: conda install tensorflow tensorflow-base=2.3.0=mkl_py37h7075554_0

  3. Check the installed TensorFlow packages:

    conda list
    ...
    tensorflow                2.3.0           mkl_py37he70e3f7_0    defaults
    tensorflow-base           2.3.0           mkl_py37h7075554_0    defaults
    ...

  4. Test whether MKL is enabled using the script in my comment above.

gmatalongthewatchtower commented 3 years ago

Thanks @NeoZhangJianyu.

I installed tensorflow as per your suggestion:

Here's conda list Output:

tensorflow                2.3.0           mkl_py37he70e3f7_0
tensorflow-base           2.3.0           mkl_py37h7075554_0
tensorflow-estimator      2.3.0              pyheb71bc4_0

Test for mkl

Code:

import tensorflow as tf
major_version = int(tf.__version__.split(".")[0])
if major_version >= 2:
    from tensorflow.python import _pywrap_util_port
    print("MKL enabled:", _pywrap_util_port.IsMklEnabled())
else:
    print("MKL enabled:", tf.pywrap_tensorflow.IsMklEnabled()) 

Output: MKL enabled: True

MKL vs. non-MKL test: Now I test whether MKL is doing better than standard tensorflow 2.3 (conda install tensorflow).

Code:

from tensorflow.keras.layers import Input, Dense, LSTM, Bidirectional, Conv1D
from tensorflow.keras.layers import Flatten, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
import numpy as np
from time import time

def timeit(func, iterations, *args):
    t0 = time()
    for _ in range(iterations):
        func(*args)
    print("Time/iter: %.4f sec" % ((time() - t0) / iterations))

def make_small_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Conv1D(128, 400, strides=4, padding='same')(ipt)
    x     = Flatten()(x)
    x     = Dropout(0.5)(x)
    x     = Dense(64, activation='relu')(x)
    out   = Dense(1,  activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_medium_model(batch_shape):
    ipt   = Input(batch_shape=batch_shape)
    x     = Bidirectional(LSTM(512, activation='relu', return_sequences=True))(ipt)
    x     = LSTM(512, activation='relu', return_sequences=True)(x)
    x     = Conv1D(128, 400, strides=4, padding='same')(x)
    x     = Flatten()(x)
    x     = Dense(256, activation='relu')(x)
    x     = Dropout(0.5)(x)
    x     = Dense(128, activation='relu')(x)
    x     = Dense(64,  activation='relu')(x)
    out   = Dense(1,   activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile(Adam(lr=1e-4), 'binary_crossentropy')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), np.random.randint(0, 2, (batch_shape[0], 1))

batch_shape = (32, 400, 16)
X, y = make_data(batch_shape)

model_small = make_small_model(batch_shape)
model_small.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_small.train_on_batch, 200, X, y)

K.clear_session()  # in my testing, kernel was restarted instead

model_medium = make_medium_model(batch_shape)
model_medium.train_on_batch(X, y)  # skip first iteration which builds graph
timeit(model_medium.train_on_batch, 10, X, y)

Without MKL

Warning message:

2020-12-31 19:06:04.281835: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Output:

Out[2]: 0.6387945413589478
Time/iter: 0.1114 sec
Out[2]: 0.6915708780288696
Time/iter: 20.7256 sec

With MKL

Warning message:

2020-12-31 18:32:50.096360: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-31 18:32:50.097360: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.

Output:

Out[2]: 0.6798648834228516
Time/iter: 0.1222 sec
Out[2]: 327.92877197265625
<Timed out...I had to stop the kernel after ~50 minutes>

Given above data, I have three questions, if you don't mind:

1) It seems base TensorFlow is much faster than the MKL version for the above code. Why is that? While installing TensorFlow, I got the following message from conda:

The following packages will be DOWNGRADED:

  intel-openmp                                   2020.2-254 --> 2019.4-245

It seems TF is using an older library, though I am not sure. Could this explain the slowness?

2) I was able to install the MKL build using the version you provided, but TensorFlow 2.4 is already out. In the future, how do I know which version to install?

3) I also noticed that the MKL version doesn't use all 8 cores. The non-MKL version (please see package details below) uses all 8 cores.

tensorflow                2.3.0           mkl_py37h04bc1aa_0
tensorflow-base           2.3.0           eigen_py37h17acbac_0
tensorflow-estimator      2.3.0              pyheb71bc4_0

NeoZhangJianyu commented 3 years ago

@gmatalongthewatchtower

  1. TF with MKL needs its optimization settings tuned. I tried the following settings on Linux and got a shorter time than stock TF for the small model (CNN), but a slightly longer time than stock TF for the medium model (LSTM).

Different models need different settings to reach the best performance on CPU. You could adjust them based on your CPU.

set TF_ENABLE_MKL_NATIVE_FORMAT=1
set TF_NUM_INTEROP_THREADS=1
set TF_NUM_INTRAOP_THREADS=4
set OMP_NUM_THREADS=4
set KMP_BLOCKTIME=1
set KMP_AFFINITY=granularity=fine,compact,1,0

Additionally, we recommend using the number of physical CPU cores in the settings above, rather than the number of threads. For example, your CPU is an i7-10610U with 4 cores/8 threads, so you should use 4 in the settings above.
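
The same settings can be applied from inside a script, which avoids depending on the shell that launched Python. This is a sketch under assumptions: the variable names and values are copied from the comment above, the function name onednn_thread_settings is made up, and the physical core count is passed in by hand because os.cpu_count() reports logical processors (8 on the i7-10610U), not physical cores.

```python
import os

# Hypothetical helper; values mirror the "set ..." lines in the comment above.
def onednn_thread_settings(physical_cores):
    """Build the environment settings for a given number of physical cores."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    cores = str(physical_cores)
    return {
        "TF_ENABLE_MKL_NATIVE_FORMAT": "1",
        "TF_NUM_INTEROP_THREADS": "1",
        "TF_NUM_INTRAOP_THREADS": cores,
        "OMP_NUM_THREADS": cores,
        "KMP_BLOCKTIME": "1",
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
    }

# i7-10610U: 4 physical cores / 8 threads, so 4 is used here.
settings = onednn_thread_settings(physical_cores=4)
os.environ.update(settings)  # do this before "import tensorflow"
for name, value in settings.items():
    print(f"{name}={value}")
```

These values are a starting point taken from a Xeon/Linux test, so they should be treated as something to tune rather than as known-good numbers for a laptop CPU.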

conda upgrades or downgrades dependent packages while installing tensorflow. That is why intel-openmp was downgraded by conda.

  2. In conda, you can use the following command to see which TensorFlow versions conda provides:
    conda search tensorflow

Currently, the latest release is 2.3.

But you could build TF 2.4 from source with Bazel.

  3. Yes, you found the cause. We should try to keep the CPU cores busy to get better performance.

gmatalongthewatchtower commented 3 years ago

Thanks @NeoZhangJianyu.

1: With above settings, Intel's MKL is much slower.

2: I searched for tensorflow, and here's what I got. How do I know which flavor of tensorflow to pick? I can decide between Python 3.7 and 3.8, but I am not sure about mkl versions.

tensorflow                     2.3.0 mkl_py37h04bc1aa_0  pkgs/main
tensorflow                     2.3.0 mkl_py37h10aaca4_0  pkgs/main
tensorflow                     2.3.0 mkl_py37h3bad0a6_0  pkgs/main
tensorflow                     2.3.0 mkl_py37h48e11e3_0  pkgs/main
tensorflow                     2.3.0 mkl_py37h856240d_0  pkgs/main
tensorflow                     2.3.0 mkl_py37h936c3e2_0  pkgs/main
tensorflow                     2.3.0 mkl_py37h952ae9f_0  pkgs/main
tensorflow                     2.3.0 mkl_py37he40ee82_0  pkgs/main
tensorflow                     2.3.0 mkl_py37he70e3f7_0  pkgs/main
tensorflow                     2.3.0 mkl_py38h1fcfbd6_0  pkgs/main
tensorflow                     2.3.0 mkl_py38h37f7ee5_0  pkgs/main
tensorflow                     2.3.0 mkl_py38h3c6dea5_0  pkgs/main
tensorflow                     2.3.0 mkl_py38h46e32b0_0  pkgs/main
tensorflow                     2.3.0 mkl_py38h637f690_0  pkgs/main
tensorflow                     2.3.0 mkl_py38h8557ec7_0  pkgs/main
tensorflow                     2.3.0 mkl_py38h8c0d9a2_0  pkgs/main
tensorflow                     2.3.0 mkl_py38ha39cb68_0  pkgs/main
tensorflow                     2.3.0 mkl_py38hd19cc29_0  pkgs/main

3: Also, is there any way to use an Intel GPU?

NeoZhangJianyu commented 3 years ago

@gmatalongthewatchtower 1. The optimization settings depend on your system (HW, OS, SW). I tried these settings on an Intel Xeon CPU under Linux, so they may not suit your system. Please tune the values to get better performance. You could check whether the CPU is fully utilized while running.

  2. The newer, the better.

  3. TensorFlow support for Intel GPUs is coming. For now, you could use OpenVINO for inference, accelerated by the Intel integrated GPU.

Tlevi16 commented 3 years ago

same problem for me, but I tried all above methods but did not work :(

NeoZhangJianyu commented 3 years ago

@Tlevi16 What's your CPU type? And what is your test case?

Could you provide detailed info about your case, such as the test script and optimization settings?

Tlevi16 commented 3 years ago

by CPU, do u mean my processor ??

NeoZhangJianyu commented 3 years ago

Yes. TensorFlow MKL is optimized for Intel CPUs. This issue is only about performance on Intel CPUs.

Tlevi16 commented 3 years ago

if yes, it's INTEL(R) CORE(TM) i5- 8250U CPU @ 1.60 GHz 1.80 GHz

Tlevi16 commented 3 years ago

and 64 bit operating system, windows 10 home

NeoZhangJianyu commented 3 years ago

Yes, that's OK! Could you share your test script or more detailed info about your test?

I will try to reproduce your case on my PC and then look for a solution.

Tlevi16 commented 3 years ago

import tensorflow as tf
import numpy as np

a = tf.constant(np.array([1., 2., 3.]))
b = tf.constant(np.array([.4, .5, .6]))
c = tf.tensordot(a, b, 1)

output = c.numpy()

but when I run this program, I get the error :-

I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Tlevi16 commented 3 years ago

what should I do now ????????

Tlevi16 commented 3 years ago

pleaseee help me

NeoZhangJianyu commented 3 years ago

@Tlevi16 I believe you have run into the performance issue in TensorFlow 2.3.0 MKL. I'd like to answer questions about that in this issue.

The error in your simple case above is a different issue. Could you create another GitHub issue to track it?

I guess your TensorFlow running environment changed recently.

Tlevi16 commented 3 years ago

I am new to tensorflow i just started it today, yeah I changed my tensorflow running environment

Tlevi16 commented 3 years ago

pls give a solution

Tlevi16 commented 3 years ago

to my problem

Tlevi16 commented 3 years ago

ya I'll create another github issue @NeoZhangJianyu , what should I name it as ??

Tlevi16 commented 3 years ago

pls reply

NeoZhangJianyu commented 3 years ago

@Tlevi16 Welcome to Tensorflow world!

Could you create a new issue for your case? We want to keep this issue focused on the original problem. That way the discussion will help thousands of developers with the same issue.

Thank you!

Tlevi16 commented 3 years ago

ok but what can I name it as ??

NeoZhangJianyu commented 3 years ago

I just tested your case, and it passed.

The warning is only informational. Your CPU supports AVX2, so you can ignore it.
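
If the message is distracting, one option is to raise TensorFlow's native log threshold so that INFO lines (those tagged "I", like the cpu_feature_guard message) are not printed. This is only a sketch: TF_CPP_MIN_LOG_LEVEL is TensorFlow's environment variable for filtering native log output, while the helper name hide_tensorflow_info_logs is invented here. It hides the message and changes nothing about performance.

```python
import os

# Hypothetical helper; it only sets an environment variable.
def hide_tensorflow_info_logs():
    """Filter out INFO-level native log lines; warnings and errors still print."""
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]

level = hide_tensorflow_info_logs()
print("TF_CPP_MIN_LOG_LEVEL =", level)
# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Note that this would also hide the informational lines used earlier in the thread to confirm which CPU instructions the binary targets, so it is best left off while diagnosing.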

Tlevi16 commented 3 years ago

ok thx

sushreebarsa commented 3 years ago

@gmatalongthewatchtower Could you please try the latest stable version, TF 2.5, and let us know if this is still an issue? Thanks!

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 3 years ago

Closing as stale. Please reopen if you'd like to work on this further.

chuangzhidan commented 2 years ago

Hi, is there a recommended deep learning acceleration library with a step-by-step guide? I couldn't get either oneDNN or OpenVINO working, and the reference material is often outdated and incomplete. (Linux system.) I would be very grateful.

NeoZhangJianyu commented 2 years ago

@chuangzhidan Could you create a new GitHub issue for your question? This issue is already closed. It's OK to ask your question in Chinese (the title should be in English).

Thank you!

Dreamcouple commented 2 years ago

@gmatalongthewatchtower In windows, set MKLDNN_VERBOSE=1

How do I set that on Windows?