tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0
186.29k stars 74.31k forks

Tensorflow GPU segfaults on M1 mac #60395

Open ashok-arora opened 1 year ago

ashok-arora commented 1 year ago
### Issue Type

Bug

### Have you reproduced the bug with TF nightly?

No

### Source

binary

### Tensorflow Version

2.12.0

### Custom Code

Yes

### OS Platform and Distribution

M1 mac, Ventura OS

### Mobile device

_No response_

### Python version

3.10

### Bazel version

_No response_

### GCC/Compiler version

_No response_

### CUDA/cuDNN version

_No response_

### GPU model and memory

_No response_

### Current Behaviour?

Fatal Python error: Segmentation fault.

### Standalone code to reproduce the issue

```shell
# Download script
curl https://raw.githubusercontent.com/abreheret/tensorflow-models/master/tutorials/image/mnist/convolutional.py -o model.py

# Make script compatible with Tensorflow 2.0
sed -i 's/import tensorflow as tf/import tensorflow.compat.v1 as tf\ntf.disable_eager_execution()/g' model.py

# Run script
python model.py
```

### Relevant log output

```shell
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Metal device set to: Apple M1

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2023-04-21 23:54:30.086459: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Initialized!
Fatal Python error: Segmentation fault

Thread 0x00000001eb19a500 (most recent call first):
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1454 in _call_tf_sessionrun
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1361 in _run_fn
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1378 in _do_call
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1371 in _do_run
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1191 in _run
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 968 in run
  File "/Users/ashok/Desktop/model.py", line 314 in main
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/absl/app.py", line 254 in _run_main
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/absl/app.py", line 308 in run
  File "/Users/ashok/miniconda/lib/python3.10/site-packages/tensorflow/python/platform/app.py", line 36 in run
  File "/Users/ashok/Desktop/model.py", line 353 in

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, google._upb._message, tensorflow.python.framework.fast_tensor_util, _cffi_backend, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, PIL._imaging, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pandas._libs.tslib, pandas._libs.ops, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy.ndimage._nd_image, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, _ni_label, scipy.ndimage._ni_label (total: 114)
fish: Job 1, 'python ~/Desktop/model.py' terminated by signal SIGSEGV (Address boundary error)
```

saurabhmj11 commented 1 year ago

Here are some solutions you can try:

Ensure that you have installed the latest version of TensorFlow for Apple Silicon processors. You can download the latest version from the official TensorFlow website or install it via pip.

Try running your code with a different version of TensorFlow. This issue might be specific to TensorFlow 2.12.0, so try TensorFlow 2.11.0 or 2.13.0.

Check if there is any issue with your custom code. You can try running the TensorFlow MNIST example code without any modification to check if the issue is with TensorFlow or your code.

Check if there are any compatibility issues with the Python version that you have installed. You can try running your code with Python 3.8 or 3.9.

Try updating macOS Ventura and Xcode to the latest stable versions.

I hope this helps. If the issue persists, please provide more details about the error, including the full stack trace, the GPU model and memory, and other relevant information.
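
The interpreter and OS details requested above can be gathered with a short script. This is a minimal sketch that uses only the Python standard library; the function name `environment_report` is hypothetical and is not part of TensorFlow.

```python
import platform
import sys


def environment_report():
    """Collect basic interpreter and OS details for a bug report."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        # "arm64" indicates a native Apple Silicon interpreter; "x86_64" on
        # an M1 machine suggests Python is running under Rosetta.
        "machine": platform.machine(),
        "system": platform.system(),
        # mac_ver() returns an empty release string on non-macOS systems.
        "macos_release": platform.mac_ver()[0],
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue gives maintainers the Python version and architecture without guesswork.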

sampathweb commented 1 year ago

@ashok-arora - Can you confirm that you followed the steps at https://developer.apple.com/metal/tensorflow-plugin/ for installing TensorFlow on an M1 Mac? This is the installation that the TensorFlow docs suggest for M1 (https://www.tensorflow.org/install/pip#macos).

I would also recommend that you create a new Conda environment and install inside there to isolate the issue.

ashok-arora commented 1 year ago

@sampathweb Yes, I followed the developer guide for installing TensorFlow on an M1 Mac and created a new conda environment.

SuryanarayanaY commented 1 year ago

@ashok-arora ,

Since you are using M1 Apple silicon (Arm architecture), you need to follow the Metal plugin instructions from Apple, which are here. The instructions in the TensorFlow forum work for macOS with Intel chips (x86-64 architecture).

Can you please also confirm whether you have installed tensorflow-macos (the Apple package) or tensorflow (the TF package)? `pip install tensorflow-macos` fetches the Apple wheel and `pip install tensorflow` fetches the TF wheel, but the latter works only on x86_64 architecture, except for the tf-nightly version. You can use `pip install tf-nightly`, which installs Apple's tf-macos-nightly wheel based on the system's architecture. Please also try the tf-nightly version and let us know whether you see the same behaviour there.
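
One way to answer the question of which package is installed is to query the active environment directly. The sketch below uses the standard-library `importlib.metadata`. The helper name `installed_versions` and the `PACKAGES` tuple are hypothetical, and the names in the tuple are simply the ones mentioned in this thread.

```python
from importlib import metadata

# Distribution names mentioned in this thread (an assumption about what is
# worth checking, not an official list).
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal", "tf-nightly")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that runs `model.py`, so the report reflects the conda environment actually in use.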

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

ashok-arora commented 1 year ago

@SuryanarayanaY Yes, I have installed it through the given instructions.

sachinprasadhs commented 1 year ago

Hi @ashok-arora, could you please try installing TensorFlow 2.13 using pip for Apple M1 chips?

```python
!pip install tensorflow==2.13.0

import tensorflow as tf
print(tf.config.list_physical_devices())
```

```
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
```

The above command installs the CPU version. For GPU support, use the command below, and your system will detect both the CPU and the GPU.

```python
!pip install tensorflow-metal

import tensorflow as tf
print(tf.config.list_physical_devices())
```

```
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```

Let me know if you face any issues. Thanks!
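
If both devices are listed and the segfault persists, one possible way to check whether the crash is tied to the GPU device is to hide the GPU and rerun on the CPU only. Nobody in this thread suggested this step; it is a sketch that assumes `tf.config.set_visible_devices` is called before TensorFlow initializes its devices. The helper name `hide_gpus` is hypothetical.

```python
def hide_gpus(tf_module):
    """Hide all GPUs from TensorFlow so later ops run on the CPU only.

    Call this before any tensor, session, or model is created; TensorFlow
    rejects visibility changes once its devices are initialized.
    """
    tf_module.config.set_visible_devices([], "GPU")
    return tf_module.config.get_visible_devices()


# Usage, assuming TensorFlow is installed:
#
#     import tensorflow as tf
#     print(hide_gpus(tf))
```

If the script then runs to completion, that points at the GPU path rather than at the script or the base TensorFlow install.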

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue? Yes No

ashok-arora commented 1 year ago

How can I reopen the issue?