tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.github.io/tfx/
Apache License 2.0

M1 Apple machine support #5804

Open EdwardCuiPeacock opened 1 year ago

EdwardCuiPeacock commented 1 year ago

Currently, TFX hard-codes the TensorFlow dependency in https://github.com/tensorflow/tfx/blob/master/tfx/dependencies.py, which specifically requires the tensorflow package. However, the tensorflow package is not supported on M1 Apple silicon; users need to install tensorflow-macos instead in order to run TensorFlow (this does not change the Python imports in scripts). TensorFlow does not yet ship arm64 wheels for M1 Apple silicon (see this issue: https://github.com/tensorflow/tensorflow/issues/57185). In the meantime, it would be nice to offer some platform-dependent flexibility that allows M1 users to use tensorflow-macos as the dependency.
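One way to express this kind of platform-dependent choice is with PEP 508 environment markers in the requirement strings. The sketch below is only an illustration of the idea, not the actual contents of tfx/dependencies.py: the function name `make_tensorflow_requirements` and the version range are hypothetical placeholders.

```python
# Hypothetical sketch; this is NOT the real tfx/dependencies.py.
# The version range is a placeholder chosen only for illustration.
TF_VERSION_SPEC = ">=2.13,<2.14"


def make_tensorflow_requirements(version_spec=TF_VERSION_SPEC):
    """Return requirement strings that select the TensorFlow package per platform."""
    return [
        # Resolved only on Apple silicon Macs.
        f"tensorflow-macos{version_spec}; "
        'sys_platform == "darwin" and platform_machine == "arm64"',
        # Resolved on every other platform (negation of the marker above).
        f"tensorflow{version_spec}; "
        'sys_platform != "darwin" or platform_machine != "arm64"',
    ]
```

A list like this could be passed to `install_requires` in a setup.py, since setuptools accepts environment markers there; pip then evaluates the markers on the installing machine and picks exactly one of the two packages.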

singhniraj08 commented 1 year ago

@EdwardCuiPeacock,

This feature request is already in progress and the team is treating it as a priority, but it has proceeded slowly for internal infrastructure reasons.

Rest assured, you can follow this TF forum thread for updates, and we will update this issue once we have news.

Thank you!

EdwardCuiPeacock commented 1 year ago

Thank you for the response, @singhniraj08. Looking forward to being able to do fully local development on my new machine.

yingding commented 1 year ago

+1, thanks for the great work. TF 2.13 now supports Apple silicon natively (https://blog.tensorflow.org/2023/07/whats-new-in-tensorflow-213-and-keras-213.html). I am looking forward to TFX support on Apple silicon soon.

tangm commented 10 months ago

There have been some PRs relevant to this, and I've written up my experience trying to get TFX 1.14.0 working natively on Apple silicon here

The relevant pull requests are:

EDIT: wrong link, add issue links

e-compagno commented 5 months ago

As of May 2024, it still doesn't look like TFX is supported on Apple silicon processors. Is there any update on this?

axeltidemann commented 4 months ago

@tangm I followed the instructions in your blog post to install TFX 1.14.0, but I encountered an error. I left a comment on the post itself, but I'm posting here in case you missed the notification, and in case others want to try as well.

axeltidemann commented 4 months ago

I was able to install TFX 1.14 on my Mac using @tangm's instructions as a starting point (thanks!). I had to do some things differently in order to make it work on my machine. This was done on an M1 Pro MacBook Pro running macOS Sonoma 14.5 and Xcode 14.5, with Python 3.9 and 3.10.

Create a virtual environment:

python3 -m venv tfx-1.14.0
source tfx-1.14.0/bin/activate
pip install --upgrade pip wheel

Install TensorFlow:

pip install tensorflow==2.13.1 tensorflow-metal

Pin the Bazel version (this is needed because of ml-metadata):

export USE_BAZEL_VERSION=5.3.2

Install forked repos for M1.

Install ml-metadata. If you want to skip this step, you can download wheel files for Python 3.9 and 3.10.

git clone https://github.com/tangm/ml-metadata.git
cd ml-metadata
git checkout v1.14.0-m1fix
sed -i '' $'113i\\\n\'--host_copt=-Wno-error=incompatible-function-pointer-types\', ' setup.py
python setup.py bdist_wheel
pip install dist/ml_metadata-1.14.0-cp310-cp310-macosx_11_0_universal2.whl

Install tfx-bsl. If you want to skip this step, you can download wheel files for Python 3.9 and 3.10.

git clone https://github.com/tangm/tfx-bsl.git
cd tfx-bsl
git checkout r1.14.0-48-Allow-compilation-on-m1-macs
sed -i '' $'98i\\\n[\'--host_copt=-Wno-error=incompatible-function-pointer-types\'] + ' setup.py
python setup.py bdist_wheel
pip install dist/tfx_bsl-1.14.0-cp310-cp310-macosx_11_0_universal2.whl jsonschema==4.17.3 tensorflow==2.13.1

Install data-validation. If you want to skip this step, you can download wheel files for Python 3.9 and 3.10.

git clone https://github.com/tangm/data-validation.git
cd data-validation
git checkout r1.14.0-205-allow-apple-silicon
sed -i '' $'83i\\\n[\'--host_copt=-Wno-error=incompatible-function-pointer-types\'] + ' setup.py
python setup.py bdist_wheel
pip install dist/tensorflow_data_validation-1.14.0-cp310-cp310-macosx_11_0_universal2.whl
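The wheel filenames in the three build steps above hard-code the cp310 tag, so they only match a build made under Python 3.10; under Python 3.9 the tag would be cp39. A minimal sketch of a helper that locates the wheel built for whichever interpreter is running is shown here. The function names are hypothetical, and it assumes the wheel follows the usual `name-version-pythontag-abitag-platform.whl` naming.

```python
import sys
from pathlib import Path


def current_cpython_tag():
    """Return the CPython wheel tag of the running interpreter, e.g. 'cp310'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def find_built_wheel(dist_dir, package_prefix):
    """Return the path of the wheel in dist_dir built for this interpreter."""
    tag = current_cpython_tag()
    pattern = f"{package_prefix}-*-{tag}-{tag}-*.whl"
    matches = sorted(Path(dist_dir).glob(pattern))
    if not matches:
        raise FileNotFoundError(
            f"no {package_prefix} wheel tagged {tag} found in {dist_dir}"
        )
    # If several wheels match, take the last one in sorted order.
    return matches[-1]
```

For example, after `python setup.py bdist_wheel` in the ml-metadata checkout, `find_built_wheel("dist", "ml_metadata")` would return the path to hand to `pip install`, whichever of the two Python versions built it.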

Install tfx.

pip install tfx==1.14.0 jsonschema==4.17.3

Avoid a known bug:

pip install --upgrade google-cloud-aiplatform "shapely<2"

Test that the TFX installation works with the penguin template:

export PIPELINE_NAME=sanity_check
export PROJECT_DIR=$PWD/$PIPELINE_NAME
tfx template copy \
  --pipeline_name="${PIPELINE_NAME}" \
  --destination_path="${PROJECT_DIR}" \
  --model=penguin
cd sanity_check
tfx pipeline create --engine=local --pipeline_path=local_runner.py
tfx run create --engine=local --pipeline_name="${PIPELINE_NAME}"

Activity Monitor shows that the M1 is being used, and the terminal prints these lines:

2024-07-11 16:35:10.028958: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M1 Pro
2024-07-11 16:35:10.028993: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2024-07-11 16:35:10.029010: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2024-07-11 16:35:10.029613: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-07-11 16:35:10.029779: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)

The penguin template runs fine, but when I tested another TFX pipeline I've written, the memory usage was enormous. I had to reduce the train, validation and test sets to 100 samples each (!), and memory usage still ballooned to 12 GB. It seems the memory is not freed after the Transform component.

Furthermore, I had to set tf.config.set_soft_device_placement(True) in my Trainer, because there is a missing op, StatelessRandomGetKeyCounter, for the Embedding layer. I've reported this to the Apple Developer Forum. So there's definitely more work to be done before this setup can be used for serious work.
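For anyone who hits the same missing op, this is roughly what the workaround looks like. It is a sketch under assumptions, not the exact code from my pipeline: the helper name is hypothetical, and the guarded import is there only so the snippet can be loaded on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # allows the sketch to load where TensorFlow is absent
    tf = None


def enable_soft_device_placement():
    """Turn on soft device placement; return False if TensorFlow is missing."""
    if tf is None:
        return False
    # With soft placement on, an op that has no kernel for the requested
    # device (here, the Metal GPU) is placed on a device that supports it
    # instead of raising an error.
    tf.config.set_soft_device_placement(True)
    return tf.config.get_soft_device_placement()


# Call once, before the model is built.
enable_soft_device_placement()
```

In a TFX Trainer this call would presumably go near the top of the module file, so that it runs before any Keras layers are created.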

xixici commented 2 months ago

Great. It looks really good and I hope it can help others. @axeltidemann