tensorflow / swift-apis

Swift for TensorFlow Deep Learning Library

TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA #935

Closed: RahulBhalley closed this issue 3 years ago

RahulBhalley commented 4 years ago

For the following code, run in Xcode on S4TF 0.9 RC:

import TensorFlow

// Note: `device` is constructed here but unused; the tensor below is placed
// on `Device.defaultXLA` directly.
let device = Device(kind: .CPU, ordinal: 0, backend: .XLA)
let tensor = Tensor<Float>(randomNormal: [1, 512, 512, 3], on: Device.defaultXLA)
print(tensor.device)

I get the following output when printing the tensor's device information:

2020-05-06 16:27:31.136129: I tensorflow/compiler/xla/xla_client/xrt_local_service.cc:54] Peer localservice 1 {localhost:31603}
2020-05-06 16:27:31.137447: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
2020-05-06 16:27:31.173632: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x11b052eb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-06 16:27:31.173661: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-06 16:27:31.177647: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localservice -> {0 -> localhost:31603}
2020-05-06 16:27:31.177988: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:390] Started server with target: grpc://localhost:31603
2020-05-06 16:27:31.198992: W tensorflow/compiler/jit/xla_device.cc:398] XLA_GPU and XLA_CPU devices are deprecated and will be removed in subsequent releases. Instead, use either @tf.function(experimental_compile=True) for must-compile semantics, or run with TF_XLA_FLAGS=--tf_xla_auto_jit=2 for auto-clustering best-effort compilation.
Device(kind: .CPU, ordinal: 0, backend: .XLA)

How can I compile S4TF with this configuration? I assume it would give a speedup to operations on my macOS machine, right?

ematejska commented 4 years ago

Saleem, could you take a look?

ematejska commented 4 years ago

@asuhan @pschuh Any idea about this?

pschuh commented 4 years ago

I think this is a standard error based on the config options of the TensorFlow build itself. I don't think it is related to XLA at all (it only applies to TF eager). You need to find the right flags to add here: https://github.com/tensorflow/swift-apis/blob/master/CMakeLists.txt#L62
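
(For concreteness, a hedged sketch of the flags in question. Exactly where to thread them through at the linked line is an assumption; the flags themselves are the standard Clang/GCC options for the features named in the warning.)

# Hypothetical example, not a verified build command for this project:
# pass the CPU feature flags named in the warning through to the C++ build,
# e.g. via CMAKE_CXX_FLAGS when configuring swift-apis.
cmake -B build -D CMAKE_CXX_FLAGS="-msse4.2 -mavx -mavx2 -mfma" /path/to/swift-apis
# Or enable everything the build host supports:
cmake -B build -D CMAKE_CXX_FLAGS="-march=native" /path/to/swift-apis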

compnerd commented 4 years ago

There’s no error there. The TensorFlow warning about the CPU features is benign and expected, AFAIK. I think that @asuhan is probably the best person to respond to that.
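
(Side note, as a hedged sketch: upstream TensorFlow filters these informational C++ log messages via the TF_CPP_MIN_LOG_LEVEL environment variable; whether the S4TF/X10 runtime honors it here is an assumption.)

# Hypothetical: silence TensorFlow C++ INFO logs before launching the program.
# 0 = all messages, 1 = filter INFO, 2 = also filter WARNING, 3 = also filter ERROR.
export TF_CPP_MIN_LOG_LEVEL=1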

RahulBhalley commented 3 years ago

Closing this issue because it looks like it has been fixed:

2021-01-06 09:19:13.424664: I tensorflow/compiler/xla/xla_client/xrt_local_service.cc:54] Peer localservice 1 {localhost:31063}
2021-01-06 09:19:13.425697: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-06 09:19:13.463459: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x11b78fec0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-06 09:19:13.463491: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-06 09:19:13.466739: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job localservice -> {0 -> localhost:31063}
2021-01-06 09:19:13.467019: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:405] Started server with target: grpc://localhost:31063
2021-01-06 09:19:13.467283: I tensorflow/compiler/xla/xla_client/computation_client.cc:202] NAME: CPU:0
2021-01-06 09:19:13.529850: I tensorflow/compiler/jit/xla_device.cc:398] XLA_GPU and XLA_CPU devices are deprecated and will be removed in subsequent releases. Instead, use either @tf.function(experimental_compile=True) for must-compile semantics, or run with TF_XLA_FLAGS=--tf_xla_auto_jit=2 for auto-clustering best-effort compilation.
Device(kind: .CPU, ordinal: 0, backend: .XLA)