keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Using tensorflow backend with Metal GPU #18436

Closed dkgaraujo closed 5 months ago

dkgaraujo commented 1 year ago

When I adapt the working example code from Apple's instructions for Metal (MPS) acceleration with TensorFlow to use keras_core, as shown below, the snippet stops working. The code and error message follow:

import tensorflow as tf
import keras_core as keras
cifar = keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,)

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)
2023-07-20 00:31:02.310030: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Using TensorFlow backend
2023-07-20 00:31:05.774093: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-07-20 00:31:05.774135: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Epoch 1/5
2023-07-20 00:31:22.368539: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-07-20 00:31:31.363243: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-07-20 00:31:34.679537: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at xla_ops.cc:503 : NOT_FOUND: could not find registered platform with id: 0x7fa8bfb32150
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
pyth/to/my/notebook.ipynb Cell 3 in <module>
     11 loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
     12 model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
---> 13 model.fit(x_train, y_train, epochs=5, batch_size=64)

File path/to/my/venv/lib/python3.10/site-packages/keras_core/src/utils/traceback_utils.py:123, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    120     filtered_tb = _process_traceback_frames(e.__traceback__)
    121     # To get the full stack trace, call:
    122     # `keras_core.config.disable_traceback_filtering()`
--> 123     raise e.with_traceback(filtered_tb) from None
    124 finally:
    125     del filtered_tb

File path/to/my/venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:53, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     51 try:
     52   ctx.ensure_initialized()
---> 53   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     54                                       inputs, attrs, num_outputs)
     55 except core._NotOkStatusException as e:
     56   if name is not None:

NotFoundError: Graph execution error:
...
    File "path/to/my/venv/lib/python3.10/site-packages/keras_core/src/backend/tensorflow/trainer.py", line 112, in one_step_on_iterator
      outputs = self.distribute_strategy.run(
Node: 'StatefulPartitionedCall'
could not find registered platform with id: 0x7fa8bfb32150
     [[{{node StatefulPartitionedCall}}]] [Op:__inference_one_step_on_iterator_47058]

fchollet commented 1 year ago

"could not find registered platform with id" typically means that your TF runtime is not working. You can try to call keras.config.disable_traceback_filtering() and run the code again to find out more.

I'm able to run on M1/M2 by just installing tensorflow-metal and tensorflow-macos.
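
For anyone hitting the same error, a quick way to see what the runtime looks like is sketched below. This is only an illustrative diagnostic, not an official tool: the helper name `collect_runtime_info` is made up for this example, and it assumes that with tensorflow-metal installed the Metal device is reported by `tf.config.list_physical_devices("GPU")`. It also turns off traceback filtering when the installed Keras package exposes that setting.

```python
import importlib
import importlib.util


def collect_runtime_info():
    """Return a small dict describing the local TensorFlow/Keras setup.

    Hypothetical helper for illustration; it never raises if a package is missing.
    """
    info = {"tensorflow": None, "gpus": [], "keras": None}

    # Only import TensorFlow if it is installed.
    if importlib.util.find_spec("tensorflow") is not None:
        try:
            import tensorflow as tf

            info["tensorflow"] = tf.__version__
            # Assumption: a working Metal plugin shows up here as a GPU device.
            info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
        except Exception as exc:  # a broken install can fail at import time
            info["tensorflow"] = f"import failed: {exc!r}"

    # Prefer Keras 3 ("keras"); fall back to the older "keras_core" package.
    for name in ("keras", "keras_core"):
        if importlib.util.find_spec(name) is None:
            continue
        try:
            module = importlib.import_module(name)
            config = getattr(module, "config", None)
            if config is not None and hasattr(config, "disable_traceback_filtering"):
                # Show full stack traces on the next failing call.
                config.disable_traceback_filtering()
            info["keras"] = f"{name} {getattr(module, '__version__', 'unknown')}"
            break
        except Exception as exc:
            info["keras"] = f"{name} import failed: {exc!r}"

    return info


if __name__ == "__main__":
    print(collect_runtime_info())
```

If `gpus` comes back empty or the TensorFlow import fails, the problem is likely in the TensorFlow install rather than in Keras.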

dkgaraujo commented 1 year ago

Thanks. On Intel silicon, the plain TensorFlow version runs as expected, but adapting it to Keras Core (removing the tf import and the tf namespace, with no further changes) yields the same error, even after disabling traceback filtering.

sachinprasadhs commented 6 months ago

The code runs without any issues on my M1. You can install tensorflow directly, since the official release now supports macOS. Keras 3 is also available now; you can install it as well and set your backend.

To install the packages:

pip install -U tensorflow
pip install -U keras

The code below works well.

import keras
cifar = keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,)

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)
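
On setting the backend: one common way with Keras 3 is the `KERAS_BACKEND` environment variable, which has to be set before `keras` is first imported. The snippet below is a minimal sketch of that; the helper name `active_backend` is made up for this example, and it returns `None` rather than failing when Keras 3 or its backend is not installed.

```python
import importlib.util
import os

# Keras 3 reads KERAS_BACKEND on first import, so set it beforehand.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("KERAS_BACKEND", "tensorflow")


def active_backend():
    """Return the backend name Keras reports, or None if it cannot be determined.

    Hypothetical helper for illustration only.
    """
    if importlib.util.find_spec("keras") is None:
        return None
    try:
        import keras
    except ImportError:
        # For example, the selected backend package is not installed.
        return None
    config = getattr(keras, "config", None)
    if config is None or not hasattr(config, "backend"):
        # Older Keras releases do not expose keras.config.backend().
        return None
    return config.backend()


if __name__ == "__main__":
    print("Keras backend:", active_backend())
```

The same choice can also be made persistent in the Keras configuration file instead of the environment, if that suits your setup better.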

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 5 months ago

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.
