Closed danieltomasz closed 3 weeks ago
Thanks a lot @danieltomasz for the very high quality report. @lucianopaz, do you have any thoughts regarding the BLAS selection mechanism?
One thing I learned that might be useful: numpy 2.2 installed via pip uses accelerate, while numpy 2.2 installed via conda uses openblas (I checked this via numpy.show_config()). I installed it in a separate env just to check, because pytensor doesn't yet support numpy >= 2.0.
That is indeed very interesting, thanks @danieltomasz.
The Conda dependency chain is:
conda-forge/osx-arm64/pytensor-2.25.4-py312h3f593ad_0.conda → accelerate, blas
conda-forge/osx-arm64/blas-2.124-openblas.conda → blas-devel 3.9.0
blas-devel 3.9.0 → openblas 0.3.27.*
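To make the chain easier to follow, here is a minimal sketch that walks it programmatically. The `DEPENDS` mapping is only a hand-copied version of the three links quoted above (versions dropped), and `transitive_deps` is a hypothetical helper, not part of conda.

```python
from collections import deque

# Direct dependencies as quoted in the chain above (names only, versions dropped).
DEPENDS = {
    "pytensor": ["accelerate", "blas"],
    "blas": ["blas-devel"],
    "blas-devel": ["openblas"],
}


def transitive_deps(package, depends):
    """Return every package reachable from `package`, in breadth-first order."""
    seen = []
    queue = deque(depends.get(package, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.append(name)
        queue.extend(depends.get(name, []))
    return seen


print(transitive_deps("pytensor", DEPENDS))
```

With the mapping above, `openblas` shows up in the result even though `pytensor` never lists it directly.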
One way to get more flexibility to help debug this is to instead use the pytensor-base package on conda-forge. That should allow us to specify accelerate without installing openblas. But you'll need to install your own C compilers as well.
@danieltomasz, does this give you something to experiment with? I don't have a Mac myself, so unfortunately I can't directly debug this.
When I force "libblas=*=*accelerate"
~/.pyenv/versions/miniconda3-3.12-24.7.1-0/bin/conda create -n voxel-bayes-3.12 -c conda-forge pytensor "libblas=*=*accelerate"
Channels:
- conda-forge
- defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12
added / updated specs:
- libblas[build=*accelerate]
- pytensor
The following NEW packages will be INSTALLED:
accelerate conda-forge/noarch::accelerate-0.34.2-pyhd8ed1ab_0
blas conda-forge/osx-arm64::blas-2.124-accelerate
blas-devel conda-forge/osx-arm64::blas-devel-3.9.0-24_osxarm64_accelerate
brotli-python conda-forge/osx-arm64::brotli-python-1.1.0-py312hde4cb15_2
bzip2 conda-forge/osx-arm64::bzip2-1.0.8-h99b78c6_7
ca-certificates conda-forge/osx-arm64::ca-certificates-2024.8.30-hf0a4a13_0
cctools_osx-arm64 conda-forge/osx-arm64::cctools_osx-arm64-1010.6-h4208deb_1
certifi conda-forge/noarch::certifi-2024.8.30-pyhd8ed1ab_0
cffi conda-forge/osx-arm64::cffi-1.17.1-py312h0fad829_0
charset-normalizer conda-forge/noarch::charset-normalizer-3.3.2-pyhd8ed1ab_0
clang conda-forge/osx-arm64::clang-17.0.6-default_h360f5da_7
clang-17 conda-forge/osx-arm64::clang-17-17.0.6-default_h146c034_7
clang_impl_osx-ar~ conda-forge/osx-arm64::clang_impl_osx-arm64-17.0.6-he47c785_19
clang_osx-arm64 conda-forge/osx-arm64::clang_osx-arm64-17.0.6-h54d7cd3_19
clangxx conda-forge/osx-arm64::clangxx-17.0.6-default_h360f5da_7
clangxx_impl_osx-~ conda-forge/osx-arm64::clangxx_impl_osx-arm64-17.0.6-h50f59cd_19
clangxx_osx-arm64 conda-forge/osx-arm64::clangxx_osx-arm64-17.0.6-h54d7cd3_19
colorama conda-forge/noarch::colorama-0.4.6-pyhd8ed1ab_0
compiler-rt conda-forge/osx-arm64::compiler-rt-17.0.6-h856b3c1_2
compiler-rt_osx-a~ conda-forge/noarch::compiler-rt_osx-arm64-17.0.6-h832e737_2
cons conda-forge/noarch::cons-0.4.6-pyhd8ed1ab_0
etuples conda-forge/noarch::etuples-0.3.9-pyhd8ed1ab_0
filelock conda-forge/noarch::filelock-3.16.1-pyhd8ed1ab_0
fsspec conda-forge/noarch::fsspec-2024.9.0-pyhff2d567_0
gmp conda-forge/osx-arm64::gmp-6.3.0-h7bae524_2
gmpy2 conda-forge/osx-arm64::gmpy2-2.1.5-py312h87fada9_2
h2 conda-forge/noarch::h2-4.1.0-pyhd8ed1ab_0
hpack conda-forge/noarch::hpack-4.0.0-pyh9f0ad1d_0
huggingface_hub conda-forge/noarch::huggingface_hub-0.25.1-pyhd8ed1ab_0
hyperframe conda-forge/noarch::hyperframe-6.0.1-pyhd8ed1ab_0
icu conda-forge/osx-arm64::icu-75.1-hfee45f7_0
idna conda-forge/noarch::idna-3.10-pyhd8ed1ab_0
jinja2 conda-forge/noarch::jinja2-3.1.4-pyhd8ed1ab_0
ld64_osx-arm64 conda-forge/osx-arm64::ld64_osx-arm64-951.9-hc81425b_1
libabseil conda-forge/osx-arm64::libabseil-20240116.2-cxx17_h00cdb27_1
libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_accelerate
libcblas conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_accelerate
libclang-cpp17 conda-forge/osx-arm64::libclang-cpp17-17.0.6-default_h146c034_7
libcxx conda-forge/osx-arm64::libcxx-19.1.0-ha82da77_0
libcxx-devel conda-forge/osx-arm64::libcxx-devel-17.0.6-h86353a2_6
libexpat conda-forge/osx-arm64::libexpat-2.6.3-hf9b8971_0
libffi conda-forge/osx-arm64::libffi-3.4.2-h3422bc3_5
libgfortran conda-forge/osx-arm64::libgfortran-5.0.0-13_2_0_hd922786_3
libgfortran5 conda-forge/osx-arm64::libgfortran5-13.2.0-hf226fd6_3
libiconv conda-forge/osx-arm64::libiconv-1.17-h0d3ecfb_2
liblapack conda-forge/osx-arm64::liblapack-3.9.0-24_osxarm64_accelerate
liblapacke conda-forge/osx-arm64::liblapacke-3.9.0-24_osxarm64_accelerate
libllvm17 conda-forge/osx-arm64::libllvm17-17.0.6-h5090b49_2
libprotobuf conda-forge/osx-arm64::libprotobuf-4.25.3-hc39d83c_1
libsqlite conda-forge/osx-arm64::libsqlite-3.46.1-hc14010f_0
libtorch conda-forge/osx-arm64::libtorch-2.4.0-cpu_generic_h4365fe2_1
libuv conda-forge/osx-arm64::libuv-1.49.0-hd74edd7_0
libxml2 conda-forge/osx-arm64::libxml2-2.12.7-h01dff8b_4
libzlib conda-forge/osx-arm64::libzlib-1.3.1-hfb2fe0b_1
llvm-openmp conda-forge/osx-arm64::llvm-openmp-18.1.8-hde57baf_1
llvm-tools conda-forge/osx-arm64::llvm-tools-17.0.6-h5090b49_2
logical-unificati~ conda-forge/noarch::logical-unification-0.4.6-pyhd8ed1ab_0
macosx_deployment~ conda-forge/noarch::macosx_deployment_target_osx-arm64-11.0-h6553868_1
markupsafe conda-forge/osx-arm64::markupsafe-2.1.5-py312h024a12e_1
minikanren conda-forge/noarch::minikanren-1.0.3-pyhd8ed1ab_0
mpc conda-forge/osx-arm64::mpc-1.3.1-h8f1351a_1
mpfr conda-forge/osx-arm64::mpfr-4.2.1-hb693164_3
mpmath conda-forge/noarch::mpmath-1.3.0-pyhd8ed1ab_0
multipledispatch conda-forge/noarch::multipledispatch-0.6.0-pyhd8ed1ab_1
ncurses conda-forge/osx-arm64::ncurses-6.5-h7bae524_1
networkx conda-forge/noarch::networkx-3.3-pyhd8ed1ab_1
nomkl conda-forge/noarch::nomkl-1.0-h5ca1d4c_0
numpy conda-forge/osx-arm64::numpy-1.26.4-py312h8442bc7_0
openssl conda-forge/osx-arm64::openssl-3.3.2-h8359307_0
packaging conda-forge/noarch::packaging-24.1-pyhd8ed1ab_0
pip conda-forge/noarch::pip-24.2-pyh8b19718_1
psutil conda-forge/osx-arm64::psutil-6.0.0-py312h024a12e_1
pycparser conda-forge/noarch::pycparser-2.22-pyhd8ed1ab_0
pysocks conda-forge/noarch::pysocks-1.7.1-pyha2e5f31_6
pytensor conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0
pytensor-base conda-forge/osx-arm64::pytensor-base-2.25.4-py312h02baea5_0
python conda-forge/osx-arm64::python-3.12.6-h739c21a_1_cpython
python_abi conda-forge/osx-arm64::python_abi-3.12-5_cp312
pytorch conda-forge/osx-arm64::pytorch-2.4.0-cpu_generic_py312h6bd8f41_1
pyyaml conda-forge/osx-arm64::pyyaml-6.0.2-py312h024a12e_1
readline conda-forge/osx-arm64::readline-8.2-h92ec313_1
requests conda-forge/noarch::requests-2.32.3-pyhd8ed1ab_0
safetensors conda-forge/osx-arm64::safetensors-0.4.5-py312he431725_0
scipy conda-forge/osx-arm64::scipy-1.14.1-py312heb3a901_0
setuptools conda-forge/noarch::setuptools-75.1.0-pyhd8ed1ab_0
sigtool conda-forge/osx-arm64::sigtool-0.1.3-h44b9a77_0
six conda-forge/noarch::six-1.16.0-pyh6c4a22f_0
sleef conda-forge/osx-arm64::sleef-3.7-h7783ee8_0
sympy conda-forge/noarch::sympy-1.13.3-pypyh2585a3b_103
tapi conda-forge/osx-arm64::tapi-1300.6.5-h03f4b80_0
tk conda-forge/osx-arm64::tk-8.6.13-h5083fa2_1
toolz conda-forge/noarch::toolz-0.12.1-pyhd8ed1ab_0
tqdm conda-forge/noarch::tqdm-4.66.5-pyhd8ed1ab_0
typing-extensions conda-forge/noarch::typing-extensions-4.12.2-hd8ed1ab_0
typing_extensions conda-forge/noarch::typing_extensions-4.12.2-pyha770c72_0
tzdata conda-forge/noarch::tzdata-2024a-h8827d51_1
urllib3 conda-forge/noarch::urllib3-2.2.3-pyhd8ed1ab_0
wheel conda-forge/noarch::wheel-0.44.0-pyhd8ed1ab_0
xz conda-forge/osx-arm64::xz-5.2.6-h57fd34a_0
yaml conda-forge/osx-arm64::yaml-0.2.5-h3422bc3_2
zstandard conda-forge/osx-arm64::zstandard-0.23.0-py312h15fbf35_1
zstd conda-forge/osx-arm64::zstd-1.5.6-hb46c0d2_0
It installs successfully, but gives me errors (segmentation faults) from
`python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")`:
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Traceback (most recent call last):
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/vm.py", line 1227, in make_all
node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1627, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 1255, in module_from_key
module = lnk.compile_cmodule(location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1528, in compile_cmodule
module = c_compiler.compile_str(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 2654, in compile_str
raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/Users/daniel/.pyenv/versions/voxel-bayes-3.12/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -fPIC -undefined dynamic_lookup -I/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/core/include -I/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include/python3.12 -I/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/c_code -L/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib -fvisibility=hidden -o /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64/tmp4ndb3uui/mbe23404cc39ec1a668b1ae18701f267b8ee61fabc03b6968263aa4f888d9dec6.so /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64/tmp4ndb3uui/mod.cpp
clang++: error: unable to execute command: Segmentation fault: 11
clang++: error: linker command failed due to signal (use -v to see invocation)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/misc/check_blas.py", line 274, in <module>
t, impl = execute(
^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/misc/check_blas.py", line 57, in execute
f = pytensor.function([], updates=[(c, 0.4 * c + 0.8 * dot(a, b))])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/compile/function/__init__.py", line 318, in function
fn = pfunc(
^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/compile/function/pfunc.py", line 465, in pfunc
return orig_function(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/compile/function/types.py", line 1762, in orig_function
fn = m.create(defaults)
^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/compile/function/types.py", line 1654, in create
_fn, _i, _o = self.linker.make_thunk(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/basic.py", line 245, in make_thunk
return self.make_all(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/vm.py", line 1236, in make_all
raise_with_op(fgraph, node)
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/utils.py", line 524, in raise_with_op
raise exc_value.with_traceback(exc_trace)
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/vm.py", line 1227, in make_all
node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1627, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 1255, in module_from_key
module = lnk.compile_cmodule(location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1528, in compile_cmodule
module = c_compiler.compile_str(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 2654, in compile_str
raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/Users/daniel/.pyenv/versions/voxel-bayes-3.12/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -fPIC -undefined dynamic_lookup -I/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/core/include -I/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include/python3.12 -I/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/pytensor/link/c/c_code -L/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib -fvisibility=hidden -o /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64/tmp4ndb3uui/mbe23404cc39ec1a668b1ae18701f267b8ee61fabc03b6968263aa4f888d9dec6.so /Users/daniel/.pytensor/compiledir_macOS-15.0-arm64-arm-64bit-arm-3.12.6-64/tmp4ndb3uui/mod.cpp
clang++: error: unable to execute command: Segmentation fault: 11
clang++: error: linker command failed due to signal (use -v to see invocation)
Apply node that caused the error: Gemm{inplace}(<Matrix(float64, shape=(?, ?))>, 0.8, <Matrix(float64, shape=(?, ?))>, <Matrix(float64, shape=(?, ?))>, 0.4)
Toposort index: 0
Inputs types: [TensorType(float64, shape=(None, None)), TensorType(float64, shape=()), TensorType(float64, shape=(None, None)), TensorType(float64, shape=(None, None)), TensorType(float64, shape=())]
HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
zsh: command not found: from
Some of those errors are discussed here: https://discourse.pymc.io/t/environment-not-working-anymore-on-macos/14210
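For anyone comparing their own output against the log above, a rough sketch like the following can flag the three messages that stand out in it. The marker strings are copied from the log; the `MARKERS` names and `classify_log` are hypothetical helpers for illustration, not anything PyTensor provides.

```python
# Substrings copied from the log above; the keys are made-up labels.
MARKERS = {
    "numpy_c_api_fallback": "Using NumPy C-API based implementation for BLAS functions",
    "compile_error": "pytensor.link.c.exceptions.CompileError",
    "linker_segfault": "linker command failed due to signal",
}


def classify_log(log_text):
    """Return the sorted labels of all markers that occur in `log_text`."""
    return sorted(name for name, needle in MARKERS.items() if needle in log_text)


SAMPLE = """\
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
clang++: error: unable to execute command: Segmentation fault: 11
clang++: error: linker command failed due to signal (use -v to see invocation)
"""

print(classify_log(SAMPLE))
```

In this log the segmentation fault is reported by `clang++` for the linker step, which is why the sketch matches on the linker message and not on Python itself crashing.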
@danieltomasz, could you please try using the pytensor-base package instead of pytensor?
/.pyenv/versions/miniconda3-3.12-24.7.1-0/bin/conda create -n voxel-bayes-3.12 -c conda-forge pytensor-base
Channels:
- conda-forge
- defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12
added / updated specs:
- pytensor-base
The following NEW packages will be INSTALLED:
bzip2 conda-forge/osx-arm64::bzip2-1.0.8-h99b78c6_7
ca-certificates conda-forge/osx-arm64::ca-certificates-2024.8.30-hf0a4a13_0
cons conda-forge/noarch::cons-0.4.6-pyhd8ed1ab_0
etuples conda-forge/noarch::etuples-0.3.9-pyhd8ed1ab_0
filelock conda-forge/noarch::filelock-3.16.1-pyhd8ed1ab_0
libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas
libcblas conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_openblas
libcxx conda-forge/osx-arm64::libcxx-19.1.0-ha82da77_0
libexpat conda-forge/osx-arm64::libexpat-2.6.3-hf9b8971_0
libffi conda-forge/osx-arm64::libffi-3.4.2-h3422bc3_5
libgfortran conda-forge/osx-arm64::libgfortran-5.0.0-13_2_0_hd922786_3
libgfortran5 conda-forge/osx-arm64::libgfortran5-13.2.0-hf226fd6_3
liblapack conda-forge/osx-arm64::liblapack-3.9.0-24_osxarm64_openblas
libopenblas conda-forge/osx-arm64::libopenblas-0.3.27-openmp_h517c56d_1
libsqlite conda-forge/osx-arm64::libsqlite-3.46.1-hc14010f_0
libzlib conda-forge/osx-arm64::libzlib-1.3.1-hfb2fe0b_1
llvm-openmp conda-forge/osx-arm64::llvm-openmp-18.1.8-hde57baf_1
logical-unificati~ conda-forge/noarch::logical-unification-0.4.6-pyhd8ed1ab_0
minikanren conda-forge/noarch::minikanren-1.0.3-pyhd8ed1ab_0
multipledispatch conda-forge/noarch::multipledispatch-0.6.0-pyhd8ed1ab_1
ncurses conda-forge/osx-arm64::ncurses-6.5-h7bae524_1
numpy conda-forge/osx-arm64::numpy-1.26.4-py312h8442bc7_0
openssl conda-forge/osx-arm64::openssl-3.3.2-h8359307_0
pip conda-forge/noarch::pip-24.2-pyh8b19718_1
pytensor-base conda-forge/osx-arm64::pytensor-base-2.25.4-py312h02baea5_0
python conda-forge/osx-arm64::python-3.12.6-h739c21a_1_cpython
python_abi conda-forge/osx-arm64::python_abi-3.12-5_cp312
readline conda-forge/osx-arm64::readline-8.2-h92ec313_1
scipy conda-forge/osx-arm64::scipy-1.14.1-py312heb3a901_0
setuptools conda-forge/noarch::setuptools-75.1.0-pyhd8ed1ab_0
six conda-forge/noarch::six-1.16.0-pyh6c4a22f_0
tk conda-forge/osx-arm64::tk-8.6.13-h5083fa2_1
toolz conda-forge/noarch::toolz-0.12.1-pyhd8ed1ab_0
tzdata conda-forge/noarch::tzdata-2024a-h8827d51_1
wheel conda-forge/noarch::wheel-0.44.0-pyhd8ed1ab_0
xz conda-forge/osx-arm64::xz-5.2.6-h57fd34a_0
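Since the BLAS variant is encoded in the `libblas` build string of the package plan, a small sketch can pull it out of pasted conda output. This assumes the plan lines look like the ones in this thread (`name channel::name-version-build`) and that the variant is the last `_`-separated token of the build string; `blas_variant` is a hypothetical helper.

```python
import re

# Matches plan lines such as:
#   libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas
PLAN_LINE = re.compile(r"^\s*(?P<name>\S+)\s+(?P<channel>\S+)::(?P<dist>\S+)\s*$")


def blas_variant(plan_text):
    """Return the last token of the libblas build string, or None if absent."""
    for line in plan_text.splitlines():
        match = PLAN_LINE.match(line)
        if match and match.group("name") == "libblas":
            # dist looks like "libblas-3.9.0-24_osxarm64_openblas";
            # the build string is everything after the last "-".
            build = match.group("dist").rsplit("-", 1)[-1]
            return build.rsplit("_", 1)[-1]
    return None


# Lines copied from the two package plans in this thread.
PLAN_DEFAULT = """\
filelock conda-forge/noarch::filelock-3.16.1-pyhd8ed1ab_0
libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_openblas
libcblas conda-forge/osx-arm64::libcblas-3.9.0-24_osxarm64_openblas
"""
PLAN_FORCED = """\
ld64_osx-arm64 conda-forge/osx-arm64::ld64_osx-arm64-951.9-hc81425b_1
libblas conda-forge/osx-arm64::libblas-3.9.0-24_osxarm64_accelerate
"""

print(blas_variant(PLAN_DEFAULT), blas_variant(PLAN_FORCED))
```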
Ok, so that's installing numpy with openblas. And what happens now if you try and force accelerate?
@maresb, as I wrote here, it installs fine, but when I try to test it I get a segmentation fault: https://github.com/pymc-devs/pytensor/issues/1005#issuecomment-2380841854
Also, here is the output from numpy even when I force accelerate in conda (`~/.pyenv/versions/miniconda3-3.12-24.7.1-0/bin/conda create -n voxel-bayes-3.12 -c conda-forge pytensor-base "libblas=*=*accelerate"`):
>>> import numpy as np
>>> np.show_config()
/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
"Compilers": {
"c": {
"name": "clang",
"linker": "ld64",
"version": "16.0.6",
"commands": "arm64-apple-darwin20.0.0-clang",
"args": "-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0",
"linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0"
},
"cython": {
"name": "cython",
"linker": "cython",
"version": "3.0.8",
"commands": "cython"
},
"c++": {
"name": "clang",
"linker": "ld64",
"version": "16.0.6",
"commands": "arm64-apple-darwin20.0.0-clang++",
"args": "-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0",
"linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0"
}
},
"Machine Information": {
"host": {
"cpu": "arm64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"build": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"cross-compiled": true
},
"Build Dependencies": {
"blas": {
"name": "blas",
"found": true,
"version": "3.9.0",
"detection method": "pkgconfig",
"include directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include",
"lib directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib",
"openblas configuration": "unknown",
"pc file directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig"
},
"lapack": {
"name": "dep4569863840",
"found": true,
"version": "1.26.4",
"detection method": "internal",
"include directory": "unknown",
"lib directory": "unknown",
"openblas configuration": "unknown",
"pc file directory": "unknown"
}
},
"Python Information": {
"path": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/bin/python",
"version": "3.12"
},
"SIMD Extensions": {
"baseline": [
"NEON",
"NEON_FP16",
"NEON_VFPV4",
"ASIMD"
],
"found": [
"ASIMDHP"
],
"not found": [
"ASIMDFHM"
]
}
}
When installing only numpy with forced accelerate
Python 3.12.6 | packaged by conda-forge | (main, Sep 22 2024, 14:07:06) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> print(np.__version__)
2.1.1
>>> np.show_config()
/Users/daniel/.pyenv/versions/voxel-bayes-3.12/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
"Compilers": {
"c": {
"name": "clang",
"linker": "ld64",
"version": "17.0.6",
"commands": "arm64-apple-darwin20.0.0-clang",
"args": "-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0",
"linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0"
},
"cython": {
"name": "cython",
"linker": "cython",
"version": "3.0.11",
"commands": "cython"
},
"c++": {
"name": "clang",
"linker": "ld64",
"version": "17.0.6",
"commands": "arm64-apple-darwin20.0.0-clang++",
"args": "-ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0",
"linker args": "-Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -L/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1725411805471/work=/usr/local/src/conda/numpy-2.1.1, -fdebug-prefix-map=/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12=/usr/local/src/conda-prefix, -D_FORTIFY_SOURCE=2, -isystem, /Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include, -mmacosx-version-min=11.0, -mmacosx-version-min=11.0"
}
},
"Machine Information": {
"host": {
"cpu": "arm64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"build": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"cross-compiled": true
},
"Build Dependencies": {
"blas": {
"name": "blas",
"found": true,
"version": "3.9.0",
"detection method": "pkgconfig",
"include directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include",
"lib directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib",
"openblas configuration": "unknown",
"pc file directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig"
},
"lapack": {
"name": "lapack",
"found": true,
"version": "3.9.0",
"detection method": "pkgconfig",
"include directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/include",
"lib directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib",
"openblas configuration": "unknown",
"pc file directory": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/lib/pkgconfig"
}
},
"Python Information": {
"path": "/Users/daniel/.pyenv/versions/miniconda3-3.12-24.7.1-0/envs/voxel-bayes-3.12/bin/python",
"version": "3.12"
},
"SIMD Extensions": {
"baseline": [
"NEON",
"NEON_FP16",
"NEON_VFPV4",
"ASIMD"
],
"found": [
"ASIMDHP"
],
"not found": [
"ASIMDFHM"
]
}
}
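The two dumps above are long, so a sketch that compares only the "Build Dependencies" section may help. The dictionaries below are trimmed copies of the two outputs. `summarize_build_deps` and `differing_keys` are hypothetical helpers. Getting such a dictionary directly via `np.show_config(mode="dicts")` is an assumption about recent NumPy versions and is not used here.

```python
def summarize_build_deps(config):
    """Map each build dependency to (name, version, detection method)."""
    deps = config.get("Build Dependencies", {})
    return {
        key: (info.get("name"), info.get("version"), info.get("detection method"))
        for key, info in deps.items()
    }


def differing_keys(first, second):
    """Return the sorted dependency keys whose summaries differ."""
    a, b = summarize_build_deps(first), summarize_build_deps(second)
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


# Trimmed copies of the two outputs above (numpy 1.26.4 and numpy 2.1.1).
NUMPY_1_26 = {
    "Build Dependencies": {
        "blas": {"name": "blas", "version": "3.9.0", "detection method": "pkgconfig"},
        "lapack": {"name": "dep4569863840", "version": "1.26.4", "detection method": "internal"},
    }
}
NUMPY_2_1 = {
    "Build Dependencies": {
        "blas": {"name": "blas", "version": "3.9.0", "detection method": "pkgconfig"},
        "lapack": {"name": "lapack", "version": "3.9.0", "detection method": "pkgconfig"},
    }
}

print(differing_keys(NUMPY_1_26, NUMPY_2_1))
```

In both outputs the `blas` entry only reports the generic name `blas` with version 3.9.0, so this section alone does not say whether openblas or accelerate sits behind it.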
I need to leave, but I might try something out of the box, like checking whether installing pytensor via "pixi" pulls in accelerate (there might be something particular to my setup in how conda links things, so trying a different package manager might help). Maybe someone with Apple Silicon can replicate this in the meantime.
Thanks so much for all the diagnosis @danieltomasz!
For when you find some more time, I wonder if lower versions of NumPy might work? For example <2?
Unfortunately, neither pixi, nor changing the Python version to 3.11, nor asking for a lower version of numpy provides the accelerate libraries (it is openblas by default). When I installed numpy via pip, it installed numpy 2.2 with accelerate, but adding pytensor to this environment downgrades numpy to one that is using openblas64.
Thanks @danieltomasz for getting back to me!
Are you able to find some earlier conda-forge version of numpy that works with accelerate on your system?
Hi @maresb, I think conda and numpy worked fine earlier (the latest numpy version <2 is from February). I cannot pinpoint the exact moment, but it was probably the update to macOS 15 about 2 weeks ago that changed things (also, I think conda might recently have changed the clang compiler that it uses with the Python it ships, but I am not sure about this).
What could be worth checking:
1) If someone with Apple Silicon and still on macOS 14 can install pytensor with accelerate
2) If other people on macOS 15 and Apple Silicon can reproduce this behaviour
This is just my intuition: forcing blas to accelerate installs fine, but it then produces the error at run time due to problems with the compilers on macOS 15, whereas it works with openblas.
Also, there were updates to accelerate in macOS 15 (https://developer.apple.com/documentation/accelerate/blas/). This discussion might be relevant: https://github.com/conda-forge/blas-feedstock/issues/103, and also this one: https://github.com/conda-forge/numpy-feedstock/issues/253
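For anyone who wants to help with the two checks above, here is a small sketch that reports whether a machine matches the setup described in this thread (Apple Silicon on macOS 15 or newer). The macOS 15 threshold is only the hypothesis stated above, not a confirmed cause, and `matches_reported_setup` is a hypothetical helper.

```python
import platform


def matches_reported_setup(system, machine, mac_version):
    """True for Apple Silicon on macOS 15 or newer (the setup reported here)."""
    if system != "Darwin" or machine != "arm64":
        return False
    major = mac_version.split(".")[0]
    return major.isdigit() and int(major) >= 15


# platform.mac_ver()[0] is an empty string on non-macOS systems.
print(
    platform.system(),
    platform.machine(),
    platform.mac_ver()[0],
    matches_reported_setup(platform.system(), platform.machine(), platform.mac_ver()[0]),
)
```

Posting that one line of output together with the result of `check_blas.py` should make reports from different machines easier to compare.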
Quick update: when I install via pip
pip install -U --no-binary :all: numpy pytensor
numpy seems to use accelerate, but pytensor fails to do so.
The result of running
python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
is below:
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Some results that you can compare against. They were 10 executions
of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
All memory layout was in C order.
CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
Core i7 950(3.07GHz, hyper-threads enabled)
Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)
Libraries tested:
* numpy with ATLAS from distribution (FC9) package (1 thread)
* manually compiled numpy and ATLAS with 2 threads
* goto 1.26 with 1, 2, 4 and 8 threads
* goto2 1.13 compiled with multiple threads enabled
Xeon Xeon Xeon Core2 i7 i7 Xeon Xeon
lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560 X5550
numpy 1.3.0 blas 775.92s
numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s
goto/1 18.7s 16.1s 14.2s 13.7s 16.1s 14.67s
numpy_MAN_atlas/2 12.0s 11.6s 10.2s 9.2s 9.0s
goto/2 9.5s 8.1s 7.1s 7.3s 8.1s 7.4s
goto/4 4.9s 4.4s 3.7s - 4.1s 3.8s
goto/8 2.7s 2.4s 2.0s - 4.1s 3.8s
openblas/1 14.04s
openblas/2 7.16s
openblas/4 3.71s
openblas/8 3.70s
mkl 11.0.083/1 7.97s
mkl 10.2.2.025/1 13.7s
mkl 10.2.2.025/2 7.6s
mkl 10.2.2.025/4 4.0s
mkl 10.2.2.025/8 2.0s
goto2 1.13/1 14.37s
goto2 1.13/2 7.26s
goto2 1.13/4 3.70s
goto2 1.13/8 1.94s
goto2 1.13/16 3.16s
Test time in float32. There were 10 executions of gemm in
float32 with matrices of shape 5000x5000 (M=N=K=5000)
All memory layout was in C order.
cuda version 8.0 7.5 7.0
gpu
M40 0.45s 0.47s
k80 0.92s 0.96s
K6000/NOECC 0.71s 0.69s
P6000/NOECC 0.25s
Titan X (Pascal) 0.28s
GTX Titan X 0.45s 0.45s 0.47s
GTX Titan Black 0.66s 0.64s 0.64s
GTX 1080 0.35s
GTX 980 Ti 0.41s
GTX 970 0.66s
GTX 680 1.57s
GTX 750 Ti 2.01s 2.01s
GTX 750 2.46s 2.37s
GTX 660 2.32s 2.32s
GTX 580 2.42s
GTX 480 2.87s
TX1 7.6s (float32 storage and computation)
GT 610 33.5s
Some PyTensor flags:
blas__ldflags=
compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.7 (main, Oct 11 2024, 01:24:59) [Clang 16.0.0 (clang-1600.0.26.3)]
sys.prefix= /Users/daniel/.pyenv/versions/3.12.7
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
Build Dependencies:
blas:
detection method: system
found: true
include directory: unknown
lib directory: unknown
name: accelerate
openblas configuration: unknown
pc file directory: unknown
version: unknown
lapack:
detection method: internal
found: true
include directory: unknown
lib directory: unknown
name: dep4409437856
openblas configuration: unknown
pc file directory: unknown
version: 1.26.4
Compilers:
c:
commands: cc
linker: ld64
name: clang
version: 16.0.0
c++:
commands: c++
linker: ld64
name: clang
version: 16.0.0
cython:
commands: cython
linker: cython
name: cython
version: 3.0.11
Machine Information:
build:
cpu: aarch64
endian: little
family: aarch64
system: darwin
host:
cpu: aarch64
endian: little
family: aarch64
system: darwin
Python Information:
path: /Users/daniel/.pyenv/versions/3.12.7/bin/python3.12
version: '3.12'
SIMD Extensions:
baseline:
- NEON
- NEON_FP16
- NEON_VFPV4
- ASIMD
found:
- ASIMDHP
not found:
- ASIMDFHM
Numpy dot module: numpy
Numpy location: /Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 13.11s on CPU (with direct PyTensor binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
I am now on MacOS 15.1
@danieltomasz, the phrase Numpy config: (used when the PyTensor flag "blas__ldflags" is empty) is from old theano and we haven't updated it. Currently, numpy's config information is somewhat outdated in light of the newer build chain that they use. For that reason, we had to rely on something different. To get a better picture of what's going on, please check out the branch from this PR and run the following:
import logging
logger = logging.getLogger("pytensor.link.c.cmodule")
logger.setLevel(logging.DEBUG)
import pytensor
After the last import, you should see all of the detailed logs from cmodule. I would like to ask you to paste all the output you get here.
I would like to see what errors pytensor is running into when it tries to determine the default_blas_flags. You'll see that pytensor first tries to link against MKL (which will obviously fail on M* chips) and it should log some information about not finding the libraries. The important thing to me is what happens when it tries to find blas and cblas. Both of these should be importable from Mac's provided accelerate framework, via clang++'s search directories.
The environment wasn't completely clean, but I uninstalled pytensor and numpy and then installed them again via
pip install --no-binary :all: numpy git+https://github.com/pymc-devs/pytensor.git@b314ca67e841b6fc0aac5ea7b5bcc11700565b1e
Output from pytensor
DEBUG (pytensor.link.c.cmodule): Will search for BLAS libraries in the following directories:
/Library/Developer/CommandLineTools/usr/lib/clang/16
/Users/daniel/.pyenv/versions/3.12.7/lib
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with intel threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with GNU OpenMP threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking Lapack + blas
DEBUG (pytensor.link.c.cmodule): Required file 'lapack' not found
DEBUG (pytensor.link.c.cmodule): Required file lapack not found
DEBUG (pytensor.link.c.cmodule): Checking blas alone
DEBUG (pytensor.link.c.cmodule): Required file 'blas' not found
DEBUG (pytensor.link.c.cmodule): Required file blas not found
DEBUG (pytensor.link.c.cmodule): Checking openblas
DEBUG (pytensor.link.c.cmodule): Required file 'openblas' not found
DEBUG (pytensor.link.c.cmodule): Required file openblas not found
DEBUG (pytensor.link.c.cmodule): Failed to identify blas ldflags. Will leave them empty.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
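A log like the one above can be condensed into "which candidate was probed, which file was missing". The sketch below assumes only the two line shapes visible in this thread ("Checking ..." and "Required file ... not found"); the function name summarize_blas_probe is hypothetical and is not part of pytensor.

```python
import re

# Line shapes taken from the DEBUG output pasted in this thread.
CHECK_RE = re.compile(r"\(pytensor\.link\.c\.cmodule\): Checking (.+)$")
MISSING_RE = re.compile(r"Required file '?([\w.+-]+?)'? not found$")


def summarize_blas_probe(log_text):
    """Map each probed BLAS candidate to the list of files reported missing."""
    summary = {}
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        check = CHECK_RE.search(line)
        if check:
            current = check.group(1)
            summary[current] = []
            continue
        missing = MISSING_RE.search(line)
        # The log prints each missing file twice (quoted and unquoted), so deduplicate.
        if missing and current is not None and missing.group(1) not in summary[current]:
            summary[current].append(missing.group(1))
    return summary


example_log = "\n".join(
    [
        "DEBUG (pytensor.link.c.cmodule): Checking Lapack + blas",
        "DEBUG (pytensor.link.c.cmodule): Required file 'lapack' not found",
        "DEBUG (pytensor.link.c.cmodule): Required file lapack not found",
        "DEBUG (pytensor.link.c.cmodule): Checking openblas",
        "DEBUG (pytensor.link.c.cmodule): Required file 'openblas' not found",
    ]
)
print(summarize_blas_probe(example_log))
```

A candidate that maps to an empty list would be one where no required file was reported missing.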
And in the same session
>>> import numpy as np
>>> np.show_config()
Build Dependencies:
blas:
detection method: system
found: true
include directory: unknown
lib directory: unknown
name: accelerate
openblas configuration: unknown
pc file directory: unknown
version: unknown
lapack:
detection method: internal
found: true
include directory: unknown
lib directory: unknown
name: dep4409437856
openblas configuration: unknown
pc file directory: unknown
version: 1.26.4
Compilers:
c:
commands: cc
linker: ld64
name: clang
version: 16.0.0
c++:
commands: c++
linker: ld64
name: clang
version: 16.0.0
cython:
commands: cython
linker: cython
name: cython
version: 3.0.11
Machine Information:
build:
cpu: aarch64
endian: little
family: aarch64
system: darwin
host:
cpu: aarch64
endian: little
family: aarch64
system: darwin
Python Information:
path: /Users/daniel/.pyenv/versions/3.12.7/bin/python3.12
version: '3.12'
SIMD Extensions:
baseline:
- NEON
- NEON_FP16
- NEON_VFPV4
- ASIMD
found:
- ASIMDHP
not found:
- ASIMDFHM
Thanks @danieltomasz, the logs say that we couldn't find a blas library in the search directories. I can think of a couple of dumb causes, but I'll have to ask you to run a couple of other tests.
1) What is the value of pytensor.config.cxx? Is it the system clang or is it the conda clang?
2) What is the output of cxx -print-search-dirs? What directories do you get in the libraries entry? Is the conda env lib path included?
3) Is there any file with blas in its name in the conda env lib directory? If there is, what's the file name extension?
4) Can you run pytensor.link.c.cmodule.try_blas_flag(["-framework", "Accelerate"]) and see if you get something?
Hi @lucianopaz, thanks for all the comments!
I installed Python in the above case via pyenv (CPython 3.12.7).
The result of 1 points to the pyenv shim /Users/daniel/.pyenv/shims/clang++
❯ /Users/daniel/.pyenv/shims/clang++ --version
Apple clang version 16.0.0 (clang-1600.0.26.4)
Target: arm64-apple-darwin24.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
❯ /Users/daniel/.pyenv/shims/clang++ -print-search-dirs
programs: =/Library/Developer/CommandLineTools/usr/bin
libraries: =/Library/Developer/CommandLineTools/usr/lib/clang/16
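For comparing search directories across environments, the "libraries:" entry can be pulled out programmatically. This is only a sketch based on the output format shown above (a "libraries: =" prefix followed by colon-separated POSIX paths); parse_library_dirs is a hypothetical helper.

```python
def parse_library_dirs(output):
    """Extract the library search directories from `-print-search-dirs` output."""
    for line in output.splitlines():
        if line.startswith("libraries:"):
            value = line.split(":", 1)[1].strip()
            # The compiler prefixes the path list with "=".
            if value.startswith("="):
                value = value[1:]
            return [path for path in value.split(":") if path]
    return []


sample_output = (
    "programs: =/Library/Developer/CommandLineTools/usr/bin\n"
    "libraries: =/Library/Developer/CommandLineTools/usr/lib/clang/16\n"
)
print(parse_library_dirs(sample_output))
```

To feed it live output, the text could come from subprocess.run([compiler, "-print-search-dirs"], capture_output=True, text=True).stdout, where compiler is whatever pytensor.config.cxx reports.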
Regarding 2 and 3
Earlier in this thread I was trying to install pytensor via miniconda (also managed via pyenv). It was either installing openblas or, when I was trying to force accelerate via
~/.pyenv/versions/miniconda3-3.12-24.7.1-0/bin/conda create -n voxel-bayes-3.12 -c conda-forge pytensor-base "libblas=*=*accelerate"
accelerate was installed, but the following error happens: https://github.com/pymc-devs/pytensor/issues/1005#issuecomment-2380841854. This also happens with the newer version of miniconda. I checked whether the reason might be my setup, but with a pixi conda install I was getting similar errors.
Regarding 4, in the pyenv installed cpython:
>>> pytensor.link.c.cmodule.try_blas_flag(["-framework", "Accelerate"])
'-framework Accelerate'
>>>
It would be great if anyone else on an Apple processor could confirm whether this is peculiar to my setup or something more general (I started to have this problem after the update to macOS 15; macOS 15 ships accelerate with BLAS 3.11, and I wonder if this might be a problem).
That last thing that you tried means that we could add those flags as a check and Mac would link to Accelerate. I'll open a small patch PR so that you can try it out.
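The idea of "checking a set of flags" can be illustrated independently of pytensor: compile and link a tiny C program that references a CBLAS symbol and see whether the linker accepts the flags. This is a rough sketch under stated assumptions, not pytensor's implementation; can_link_with, TEST_PROGRAM, and the default compiler name "clang" are all hypothetical choices.

```python
import os
import shutil
import subprocess
import tempfile

# Tiny program that only needs a CBLAS symbol to resolve at link time.
TEST_PROGRAM = r"""
extern double cblas_ddot(int n, const double *x, int incx,
                         const double *y, int incy);
int main(void) {
    double x[2] = {1.0, 2.0};
    double y[2] = {3.0, 4.0};
    return cblas_ddot(2, x, 1, y, 1) == 11.0 ? 0 : 1;
}
"""


def can_link_with(flags, compiler="clang"):
    """Return True if `compiler` links the test program with the given flags."""
    exe = shutil.which(compiler)
    if exe is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "check.c")
        out = os.path.join(tmp, "check")
        with open(src, "w") as handle:
            handle.write(TEST_PROGRAM)
        try:
            proc = subprocess.run(
                [exe, src, "-o", out, *flags],
                capture_output=True,
                text=True,
                timeout=60,
            )
        except (OSError, subprocess.TimeoutExpired):
            return False
        return proc.returncode == 0


if __name__ == "__main__":
    # On macOS this is expected to succeed when the Accelerate framework is usable.
    print(can_link_with(["-framework", "Accelerate"]))
```

The sketch only checks that linking succeeds; it does not run the resulting binary or verify numerical results.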
@danieltomasz, try this PR out. It should set blas__ldflags to the Accelerate framework.
@lucianopaz seems promising
Python 3.12.7 (main, Oct 31 2024, 00:25:36) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logger = logging.getLogger("pytensor.link.c.cmodule")
>>> logger.setLevel(logging.DEBUG)
>>> import pytensor
DEBUG (pytensor.link.c.cmodule): Will search for BLAS libraries in the following directories:
/Library/Developer/CommandLineTools/usr/lib/clang/16
/Users/daniel/.pyenv/versions/3.12.7/lib
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with intel threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with GNU OpenMP threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking Accelerate framework
INFO (pytensor.link.c.cmodule): g++ -march=native selected lines: ['"/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple arm64-apple-macosx15.0.0 -Wundef-prefix=TARGET_OS_ -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -Werror=implicit-function-declaration -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -mframe-pointer=non-leaf -fno-strict-return -ffp-contract=on -fno-rounding-math -funwind-tables=1 -fobjc-msgsend-selector-stubs -target-sdk-version=15.1 -fvisibility-inlines-hidden-static-local-var -fno-modulemap-allow-subdirectory-search -target-cpu apple-m1 -target-feature +neon -target-feature +v8.5a -target-feature +zcm -target-feature +zcz -target-abi darwinpcs -debugger-tuning=lldb -target-linker-version 1115.7.3 -v -fcoverage-compilation-dir=/Users/daniel -resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/16 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/usr/lib/clang/16/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -internal-externc-isystem /Library/Developer/CommandLineTools/usr/include -Wno-reorder-init-list -Wno-implicit-int-float-conversion -Wno-c99-designator -Wno-final-dtor-non-final-class -Wno-extra-semi-stmt -Wno-misleading-indentation -Wno-quoted-include-in-framework-header -Wno-implicit-fallthrough -Wno-enum-enum-conversion -Wno-enum-float-conversion -Wno-elaborated-enum-base -Wno-reserved-identifier -Wno-gnu-folding-constant -fdebug-compilation-dir=/Users/daniel -ferror-limit 19 -stack-protector 1 -fstack-check -mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fmax-type-align=16 -fcommon 
-clang-vendor-feature=+disableNonDependentMemberExprInCurrentInstantiation -fno-odr-hash-protocols -clang-vendor-feature=+enableAggressiveVLAFolding -clang-vendor-feature=+revert09abecef7bbf -clang-vendor-feature=+thisNoAlignAttr -clang-vendor-feature=+thisNoNullAttr -clang-vendor-feature=+disableAtImportPrivateFrameworkInImplementationError -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ default lines: ['"/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple arm64-apple-macosx15.0.0 -Wundef-prefix=TARGET_OS_ -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -Werror=implicit-function-declaration -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -mframe-pointer=non-leaf -fno-strict-return -ffp-contract=on -fno-rounding-math -funwind-tables=1 -fobjc-msgsend-selector-stubs -target-sdk-version=15.1 -fvisibility-inlines-hidden-static-local-var -fno-modulemap-allow-subdirectory-search -target-cpu apple-m1 -target-feature +v8.5a -target-feature +aes -target-feature +crc -target-feature +dotprod -target-feature +fp-armv8 -target-feature +fp16fml -target-feature +lse -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +sha2 -target-feature +sha3 -target-feature +neon -target-feature +zcm -target-feature +zcz -target-feature +fullfp16 -target-abi darwinpcs -debugger-tuning=lldb -target-linker-version 1115.7.3 -v -fcoverage-compilation-dir=/Users/daniel -resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/16 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/usr/lib/clang/16/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -internal-externc-isystem /Library/Developer/CommandLineTools/usr/include -Wno-reorder-init-list -Wno-implicit-int-float-conversion -Wno-c99-designator -Wno-final-dtor-non-final-class -Wno-extra-semi-stmt -Wno-misleading-indentation -Wno-quoted-include-in-framework-header -Wno-implicit-fallthrough -Wno-enum-enum-conversion -Wno-enum-float-conversion -Wno-elaborated-enum-base -Wno-reserved-identifier -Wno-gnu-folding-constant 
-fdebug-compilation-dir=/Users/daniel -ferror-limit 19 -stack-protector 1 -fstack-check -mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fmax-type-align=16 -fcommon -clang-vendor-feature=+disableNonDependentMemberExprInCurrentInstantiation -fno-odr-hash-protocols -clang-vendor-feature=+enableAggressiveVLAFolding -clang-vendor-feature=+revert09abecef7bbf -clang-vendor-feature=+thisNoAlignAttr -clang-vendor-feature=+thisNoNullAttr -clang-vendor-feature=+disableAtImportPrivateFrameworkInImplementationError -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ -march=native equivalent flags: ['-march=apple-m1']
However, with the above flags, running
python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
falls back to the error I posted above:
Some PyTensor flags:
blas__ldflags= -framework Accelerate -rpath /Users/daniel/.pyenv/versions/3.12.7/lib
compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.7 (main, Oct 31 2024, 00:25:36) [Clang 16.0.0 (clang-1600.0.26.4)]
sys.prefix= /Users/daniel/.pyenv/versions/3.12.7
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
"Compilers": {
"c": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "cc",
"args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
},
"cython": {
"name": "cython",
"linker": "cython",
"version": "3.0.8",
"commands": "cython"
},
"c++": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "c++",
"args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
}
},
"Machine Information": {
"host": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"build": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
}
},
"Build Dependencies": {
"blas": {
"name": "openblas64",
"found": true,
"version": "0.3.23.dev",
"detection method": "pkgconfig",
"include directory": "/opt/arm64-builds/include",
"lib directory": "/opt/arm64-builds/lib",
"openblas configuration": "USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3",
"pc file directory": "/usr/local/lib/pkgconfig"
},
"lapack": {
"name": "dep4335021056",
"found": true,
"version": "1.26.4",
"detection method": "internal",
"include directory": "unknown",
"lib directory": "unknown",
"openblas configuration": "unknown",
"pc file directory": "unknown"
}
},
"Python Information": {
"path": "/private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-q69bfk1p/cp312-macosx_arm64/build/venv/bin/python",
"version": "3.12"
},
"SIMD Extensions": {
"baseline": [
"NEON",
"NEON_FP16",
"NEON_VFPV4",
"ASIMD"
],
"found": [
"ASIMDHP"
],
"not found": [
"ASIMDFHM"
]
}
}
Numpy dot module: numpy
Numpy location: /Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
Traceback (most recent call last):
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/tensor/blas.py", line 428, in _ldflags
assert t0 == "-"
^^^^^^^^^
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/vm.py", line 1227, in make_all
node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1614, in cthunk_factory
key = self.cmodule_key()
^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1266, in cmodule_key
compile_args=self.compile_args(),
^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 947, in compile_args
ret += x.c_compile_args(c_compiler=c_compiler)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/tensor/blas.py", line 496, in c_compile_args
return ldflags(libs=False, flags=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/tensor/blas.py", line 359, in ldflags
return _ldflags(
^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/tensor/blas.py", line 430, in _ldflags
raise ValueError(f'invalid token "{t}" in ldflags_str: "{ldflags_str}"')
ValueError: invalid token "Accelerate" in ldflags_str: "-framework Accelerate -rpath /Users/daniel/.pyenv/versions/3.12.7/lib"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/misc/check_blas.py", line 274, in <module>
t, impl = execute(
^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/misc/check_blas.py", line 57, in execute
f = pytensor.function([], updates=[(c, 0.4 * c + 0.8 * dot(a, b))])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/compile/function/__init__.py", line 318, in function
fn = pfunc(
^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/compile/function/pfunc.py", line 465, in pfunc
return orig_function(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/compile/function/types.py", line 1757, in orig_function
fn = m.create(defaults)
^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/compile/function/types.py", line 1649, in create
_fn, _i, _o = self.linker.make_thunk(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/basic.py", line 245, in make_thunk
return self.make_all(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/vm.py", line 1236, in make_all
raise_with_op(fgraph, node)
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/utils.py", line 524, in raise_with_op
raise exc_value.with_traceback(exc_trace)
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/vm.py", line 1227, in make_all
node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1614, in cthunk_factory
key = self.cmodule_key()
^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1266, in cmodule_key
compile_args=self.compile_args(),
^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 947, in compile_args
ret += x.c_compile_args(c_compiler=c_compiler)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/tensor/blas.py", line 496, in c_compile_args
return ldflags(libs=False, flags=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/tensor/blas.py", line 359, in ldflags
return _ldflags(
^^^^^^^^^
File "/Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/pytensor/tensor/blas.py", line 430, in _ldflags
raise ValueError(f'invalid token "{t}" in ldflags_str: "{ldflags_str}"')
ValueError: invalid token "Accelerate" in ldflags_str: "-framework Accelerate -rpath /Users/daniel/.pyenv/versions/3.12.7/lib"
Apply node that caused the error: Gemm{inplace}(<Matrix(float64, shape=(?, ?))>, 0.8, <Matrix(float64, shape=(?, ?))>, <Matrix(float64, shape=(?, ?))>, 0.4)
Toposort index: 0
Inputs types: [TensorType(float64, shape=(None, None)), TensorType(float64, shape=()), TensorType(float64, shape=(None, None)), TensorType(float64, shape=(None, None)), TensorType(float64, shape=())]
HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
Results of
PYTENSOR_FLAGS='optimizer=None,exception_verbosity=high' python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 11.61s on ERROR, unable to tell if PyTensor used the cpu:
[dot(<Matrix(float64, shape=(?, ?))>, <Matrix(float64, shape=(?, ?))>), ExpandDims{axes=[0, 1]}(0.8), Mul(ExpandDims{axes=[0, 1]}.0, dot.0), ExpandDims{axes=[0, 1]}(0.4), Mul(ExpandDims{axes=[0, 1]}.0, <Matrix(float64, shape=(?, ?))>), Add(Mul.0, Mul.0)].
Awesome @danieltomasz! I can reproduce that problem locally now. The latest commit to the PR I had mentioned before should have fixed it. Let me know if it works for you. If it does, I'll try to set up a test on Mac ARM in our CI matrix so that this can be verified.
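The traceback above fails because the flag string is split on whitespace and every token is expected to start with "-", which "-framework Accelerate" and "-rpath <path>" violate. A sketch of one way to tolerate such two-token flags follows. It is not the actual patch from the PR; split_ldflags and the TWO_TOKEN_FLAGS set are assumptions made for illustration.

```python
import shlex

# Flags assumed (for this sketch) to take the next token as their argument.
TWO_TOKEN_FLAGS = {"-framework", "-rpath"}


def split_ldflags(ldflags_str):
    """Group an ldflags string into tuples, pairing two-token flags with their argument."""
    tokens = shlex.split(ldflags_str)
    grouped = []
    index = 0
    while index < len(tokens):
        token = tokens[index]
        if not token.startswith("-"):
            raise ValueError(f'invalid token "{token}" in ldflags_str: "{ldflags_str}"')
        if token in TWO_TOKEN_FLAGS:
            if index + 1 >= len(tokens):
                raise ValueError(f'flag "{token}" is missing its argument')
            grouped.append((token, tokens[index + 1]))
            index += 2
        else:
            grouped.append((token,))
            index += 1
    return grouped


flags = "-framework Accelerate -rpath /Users/daniel/.pyenv/versions/3.12.7/lib"
print(split_ldflags(flags))
```

With this grouping, "Accelerate" is consumed as the argument of "-framework" rather than being validated as a flag on its own, while a stray bare token still raises the same kind of ValueError.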
Thanks @lucianopaz, everything seems to work ok now with cpython and pip install!
Python 3.12.7 (main, Oct 31 2024, 00:49:16) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logger = logging.getLogger("pytensor.link.c.cmodule")
>>> logger.setLevel(logging.DEBUG)
>>> import pytensor
DEBUG (pytensor.link.c.cmodule): Will search for BLAS libraries in the following directories:
/Library/Developer/CommandLineTools/usr/lib/clang/16
/Users/daniel/.pyenv/versions/3.12.7/lib
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with intel threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with GNU OpenMP threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking Accelerate framework
INFO (pytensor.link.c.cmodule): g++ -march=native selected lines: ['"/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple arm64-apple-macosx15.0.0 -Wundef-prefix=TARGET_OS_ -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -Werror=implicit-function-declaration -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -mframe-pointer=non-leaf -fno-strict-return -ffp-contract=on -fno-rounding-math -funwind-tables=1 -fobjc-msgsend-selector-stubs -target-sdk-version=15.1 -fvisibility-inlines-hidden-static-local-var -fno-modulemap-allow-subdirectory-search -target-cpu apple-m1 -target-feature +neon -target-feature +v8.5a -target-feature +zcm -target-feature +zcz -target-abi darwinpcs -debugger-tuning=lldb -target-linker-version 1115.7.3 -v -fcoverage-compilation-dir=/Users/daniel/blogspot-downloader -resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/16 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/usr/lib/clang/16/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -internal-externc-isystem /Library/Developer/CommandLineTools/usr/include -Wno-reorder-init-list -Wno-implicit-int-float-conversion -Wno-c99-designator -Wno-final-dtor-non-final-class -Wno-extra-semi-stmt -Wno-misleading-indentation -Wno-quoted-include-in-framework-header -Wno-implicit-fallthrough -Wno-enum-enum-conversion -Wno-enum-float-conversion -Wno-elaborated-enum-base -Wno-reserved-identifier -Wno-gnu-folding-constant -fdebug-compilation-dir=/Users/daniel/blogspot-downloader -ferror-limit 19 -stack-protector 1 -fstack-check -mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 
-fmax-type-align=16 -fcommon -clang-vendor-feature=+disableNonDependentMemberExprInCurrentInstantiation -fno-odr-hash-protocols -clang-vendor-feature=+enableAggressiveVLAFolding -clang-vendor-feature=+revert09abecef7bbf -clang-vendor-feature=+thisNoAlignAttr -clang-vendor-feature=+thisNoNullAttr -clang-vendor-feature=+disableAtImportPrivateFrameworkInImplementationError -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ default lines: ['"/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple arm64-apple-macosx15.0.0 -Wundef-prefix=TARGET_OS_ -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -Werror=implicit-function-declaration -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -mframe-pointer=non-leaf -fno-strict-return -ffp-contract=on -fno-rounding-math -funwind-tables=1 -fobjc-msgsend-selector-stubs -target-sdk-version=15.1 -fvisibility-inlines-hidden-static-local-var -fno-modulemap-allow-subdirectory-search -target-cpu apple-m1 -target-feature +v8.5a -target-feature +aes -target-feature +crc -target-feature +dotprod -target-feature +fp-armv8 -target-feature +fp16fml -target-feature +lse -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +sha2 -target-feature +sha3 -target-feature +neon -target-feature +zcm -target-feature +zcz -target-feature +fullfp16 -target-abi darwinpcs -debugger-tuning=lldb -target-linker-version 1115.7.3 -v -fcoverage-compilation-dir=/Users/daniel/blogspot-downloader -resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/16 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/usr/lib/clang/16/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -internal-externc-isystem /Library/Developer/CommandLineTools/usr/include -Wno-reorder-init-list -Wno-implicit-int-float-conversion -Wno-c99-designator -Wno-final-dtor-non-final-class -Wno-extra-semi-stmt -Wno-misleading-indentation -Wno-quoted-include-in-framework-header -Wno-implicit-fallthrough -Wno-enum-enum-conversion -Wno-enum-float-conversion -Wno-elaborated-enum-base -Wno-reserved-identifier 
-Wno-gnu-folding-constant -fdebug-compilation-dir=/Users/daniel/blogspot-downloader -ferror-limit 19 -stack-protector 1 -fstack-check -mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fmax-type-align=16 -fcommon -clang-vendor-feature=+disableNonDependentMemberExprInCurrentInstantiation -fno-odr-hash-protocols -clang-vendor-feature=+enableAggressiveVLAFolding -clang-vendor-feature=+revert09abecef7bbf -clang-vendor-feature=+thisNoAlignAttr -clang-vendor-feature=+thisNoNullAttr -clang-vendor-feature=+disableAtImportPrivateFrameworkInImplementationError -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ -march=native equivalent flags: ['-march=apple-m1']
and
python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
Some results that you can compare against. They were 10 executions
of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
All memory layout was in C order.
CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
Core i7 950(3.07GHz, hyper-threads enabled)
Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)
Libraries tested:
* numpy with ATLAS from distribution (FC9) package (1 thread)
* manually compiled numpy and ATLAS with 2 threads
* goto 1.26 with 1, 2, 4 and 8 threads
* goto2 1.13 compiled with multiple threads enabled
Xeon Xeon Xeon Core2 i7 i7 Xeon Xeon
lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560 X5550
numpy 1.3.0 blas 775.92s
numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s
goto/1 18.7s 16.1s 14.2s 13.7s 16.1s 14.67s
numpy_MAN_atlas/2 12.0s 11.6s 10.2s 9.2s 9.0s
goto/2 9.5s 8.1s 7.1s 7.3s 8.1s 7.4s
goto/4 4.9s 4.4s 3.7s - 4.1s 3.8s
goto/8 2.7s 2.4s 2.0s - 4.1s 3.8s
openblas/1 14.04s
openblas/2 7.16s
openblas/4 3.71s
openblas/8 3.70s
mkl 11.0.083/1 7.97s
mkl 10.2.2.025/1 13.7s
mkl 10.2.2.025/2 7.6s
mkl 10.2.2.025/4 4.0s
mkl 10.2.2.025/8 2.0s
goto2 1.13/1 14.37s
goto2 1.13/2 7.26s
goto2 1.13/4 3.70s
goto2 1.13/8 1.94s
goto2 1.13/16 3.16s
Test time in float32. There were 10 executions of gemm in
float32 with matrices of shape 5000x5000 (M=N=K=5000)
All memory layout was in C order.
cuda version 8.0 7.5 7.0
gpu
M40 0.45s 0.47s
k80 0.92s 0.96s
K6000/NOECC 0.71s 0.69s
P6000/NOECC 0.25s
Titan X (Pascal) 0.28s
GTX Titan X 0.45s 0.45s 0.47s
GTX Titan Black 0.66s 0.64s 0.64s
GTX 1080 0.35s
GTX 980 Ti 0.41s
GTX 970 0.66s
GTX 680 1.57s
GTX 750 Ti 2.01s 2.01s
GTX 750 2.46s 2.37s
GTX 660 2.32s 2.32s
GTX 580 2.42s
GTX 480 2.87s
TX1 7.6s (float32 storage and computation)
GT 610 33.5s
Some PyTensor flags:
blas__ldflags= -framework Accelerate -rpath /Users/daniel/.pyenv/versions/3.12.7/lib
compiledir= /Users/daniel/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.7 (main, Oct 31 2024, 00:49:16) [Clang 16.0.0 (clang-1600.0.26.4)]
sys.prefix= /Users/daniel/.pyenv/versions/3.12.7
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
Build Dependencies:
blas:
detection method: system
found: true
include directory: unknown
lib directory: unknown
name: accelerate
openblas configuration: unknown
pc file directory: unknown
version: unknown
lapack:
detection method: internal
found: true
include directory: unknown
lib directory: unknown
name: dep4405705904
openblas configuration: unknown
pc file directory: unknown
version: 1.26.4
Compilers:
c:
args: -I/opt/homebrew/opt/openblas/include
commands: gcc
linker: ld64
linker args: -L/opt/homebrew/opt/openblas/lib, -I/opt/homebrew/opt/openblas/include
name: clang
version: 16.0.0
c++:
commands: c++
linker: ld64
linker args: -L/opt/homebrew/opt/openblas/lib
name: clang
version: 16.0.0
cython:
commands: cython
linker: cython
name: cython
version: 3.0.11
Machine Information:
build:
cpu: aarch64
endian: little
family: aarch64
system: darwin
host:
cpu: aarch64
endian: little
family: aarch64
system: darwin
Python Information:
path: /Users/daniel/.pyenv/versions/3.12.7/bin/python3.12
version: '3.12'
SIMD Extensions:
baseline:
- NEON
- NEON_FP16
- NEON_VFPV4
- ASIMD
found:
- ASIMDHP
not found:
- ASIMDFHM
Numpy dot module: numpy
Numpy location: /Users/daniel/.pyenv/versions/3.12.7/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 16.22s on CPU (with direct PyTensor binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
@danieltomasz, this should now be fixed with #1056. If you want, you can try to run from the current pytensor main branch and check if it works. I had to do a bunch of extra changes to ensure compilation actually used blas symbols.
Nice! After running
python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
the flags are different now:
blas__ldflags= -framework Accelerate -Wl,-rpath,/Users/daniel/.pyenv/versions/3.12.7/lib
And the run time is shorter (down to around 10-13s from 14-16s):
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 10.02s on CPU (with direct PyTensor binding to blas).
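For a rough sense of what those two timings mean, here is a small sketch that converts them to throughput. It assumes the usual estimate of about 2·N³ floating-point operations per square gemm call; the helper name `gemm_gflops` is made up for illustration and is not part of check_blas.py.

```python
def gemm_gflops(n: int, calls: int, seconds: float) -> float:
    """Approximate throughput of `calls` square n x n gemm calls, in GFLOP/s.

    Assumes roughly 2 * n**3 floating-point operations per call
    (about n**3 multiplications and n**3 additions).
    """
    flops = 2.0 * n**3 * calls
    return flops / seconds / 1e9


# Timings reported in this thread: 10 calls with 5000x5000 matrices.
before = gemm_gflops(5000, 10, 16.22)  # run before the fix
after = gemm_gflops(5000, 10, 10.02)   # run from the main branch after the fix

print(f"before: {before:.1f} GFLOP/s")
print(f"after:  {after:.1f} GFLOP/s")
print(f"speedup: {after / before:.2f}x")
```

These are single runs, so the ratio is only indicative; the script itself suggests running it a few times.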
Yes, I changed the flags to align them with the other BLAS flag specs that we use. And the execution time should be shorter because it’s actually linking to Accelerate now. Before, it was failing to do so because of things that were happening downstream.
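To make the difference between the two flag spellings in this thread concrete, here is a minimal sketch. It is not the code from #1056; `normalize_rpath_flags` is a hypothetical helper that only illustrates rewriting a bare `-rpath <dir>` pair into the `-Wl,-rpath,<dir>` form, where the `-Wl,` prefix asks the compiler driver to forward the comma-separated arguments to the linker.

```python
def normalize_rpath_flags(flags: list[str]) -> list[str]:
    """Rewrite each bare `-rpath <dir>` pair as the single flag `-Wl,-rpath,<dir>`."""
    out = []
    i = 0
    while i < len(flags):
        if flags[i] == "-rpath" and i + 1 < len(flags):
            # Fold the flag and its directory argument into one driver flag.
            out.append(f"-Wl,-rpath,{flags[i + 1]}")
            i += 2
        else:
            out.append(flags[i])
            i += 1
    return out


# Old flag string reported earlier in the thread.
old = "-framework Accelerate -rpath /Users/daniel/.pyenv/versions/3.12.7/lib".split()
new = normalize_rpath_flags(old)
print(" ".join(new))
```

Applied to the old flags, this produces the same string that check_blas.py now reports for blas__ldflags.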
This is great news! Is there a way to have my conda environment use this version of PyTensor? Or alternatively, when is the next release going to be such that this code is available to be installed normally via conda?
@Edderic just did: https://github.com/pymc-devs/pytensor/releases/tag/rel-2.26.0
However, if you are using PyTensor for PyMC, that will also need a bump in the dependency due to major changes.
I've been following this thread as I recently got an M1 Mac, too, and I'm still not getting Accelerate to work with 2.26.0 :/
Here's the output with logging enabled in a fresh conda environment created with conda create -n pt -c conda-forge pytensor:
Python 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 15:57:01) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logger = logging.getLogger("pytensor.link.c.cmodule")
>>> logger.setLevel(logging.DEBUG)
>>> import pytensor
DEBUG (pytensor.link.c.cmodule): Will search for BLAS libraries in the following directories:
/Users/aurimas.racas/micromamba/envs/pt/lib/clang/18
/Users/aurimas.racas/micromamba/envs/pt/lib
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with intel threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with GNU OpenMP threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking Accelerate framework
INFO (pytensor.link.c.cmodule): g++ -march=native selected lines: ['"/Users/aurimas.racas/micromamba/envs/pt/bin/clang-18" -cc1 -triple arm64-apple-macosx11.0.0 -Wundef-prefix=TARGET_OS_ -Werror=undef-prefix -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -funwind-tables=1 -target-sdk-version=15.1 -fcompatibility-qualified-id-block-type-checking -fvisibility-inlines-hidden-static-local-var -fbuiltin-headers-in-system-modules -fdefine-target-os-macros -target-cpu apple-m1 -target-feature +zcm -target-feature +zcz -target-feature +v8.5a -target-feature +crc -target-feature +dotprod -target-feature +complxnum -target-feature +fp-armv8 -target-feature +jsconv -target-feature +lse -target-feature +pauth -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +neon -target-abi darwinpcs -debugger-tuning=lldb -fdebug-compilation-dir=/Users/aurimas.racas -target-linker-version 711 -v -fcoverage-compilation-dir=/Users/aurimas.racas -resource-dir /Users/aurimas.racas/micromamba/envs/pt/lib/clang/18 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Users/aurimas.racas/micromamba/envs/pt/lib/clang/18/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -ferror-limit 19 -stack-protector 1 -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fmax-type-align=16 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ default lines: ['"/Users/aurimas.racas/micromamba/envs/pt/bin/clang-18" -cc1 -triple arm64-apple-macosx11.0.0 -Wundef-prefix=TARGET_OS_ -Werror=undef-prefix -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -funwind-tables=1 -target-sdk-version=15.1 -fcompatibility-qualified-id-block-type-checking -fvisibility-inlines-hidden-static-local-var -fbuiltin-headers-in-system-modules -fdefine-target-os-macros -target-cpu apple-m1 -target-feature +zcm -target-feature +zcz -target-feature +v8.5a -target-feature +aes -target-feature +crc -target-feature +dotprod -target-feature +complxnum -target-feature +fp-armv8 -target-feature +fullfp16 -target-feature +fp16fml -target-feature +jsconv -target-feature +lse -target-feature +pauth -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +sha2 -target-feature +sha3 -target-feature +neon -target-abi darwinpcs -debugger-tuning=lldb -fdebug-compilation-dir=/Users/aurimas.racas -target-linker-version 711 -v -fcoverage-compilation-dir=/Users/aurimas.racas -resource-dir /Users/aurimas.racas/micromamba/envs/pt/lib/clang/18 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Users/aurimas.racas/micromamba/envs/pt/lib/clang/18/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -ferror-limit 19 -stack-protector 1 -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fmax-type-align=16 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ -march=native equivalent flags: ['-march=apple-m1']
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-framework', 'Accelerate', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[16338]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aurimas.racas/micromamba/envs/pt/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Accelerate framework flag failed
DEBUG (pytensor.link.c.cmodule): Checking Lapack + blas
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-L/Users/aurimas.racas/micromamba/envs/pt/lib', '-llapack', '-lblas', '-lcblas', '-lm', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[16342]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aurimas.racas/micromamba/envs/pt/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aurimas.racas/micromamba/envs/pt/lib', '-llapack', '-lblas', '-lcblas', '-lm', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Checking blas alone
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-L/Users/aurimas.racas/micromamba/envs/pt/lib', '-lblas', '-lcblas', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[16346]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aurimas.racas/micromamba/envs/pt/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aurimas.racas/micromamba/envs/pt/lib', '-lblas', '-lcblas', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Checking openblas
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-L/Users/aurimas.racas/micromamba/envs/pt/lib', '-lopenblas', '-lgfortran', '-lgomp', '-lm', '-fopenmp', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[16352]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aurimas.racas/micromamba/envs/pt/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aurimas.racas/micromamba/envs/pt/lib', '-lopenblas', '-lgfortran', '-lgomp', '-lm', '-fopenmp', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Failed to identify blas ldflags. Will leave them empty.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
>>>
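Logs like the one above are long, so a short summary can help when comparing environments. The following is a sketch using only the standard library; it assumes the message wording shown in this thread ("Checking ...", "Failed to identify blas ldflags", "Using NumPy C-API based implementation"), and `summarize_blas_search` is a made-up helper, not a PyTensor function.

```python
import re

# Matches the "Checking <candidate>" debug lines shown in the log above.
CHECK_RE = re.compile(r"^DEBUG \(pytensor\.link\.c\.cmodule\): Checking (.+)$")


def summarize_blas_search(log_text: str) -> dict:
    """List the BLAS candidates that were tried and whether the search failed."""
    candidates = []
    for line in log_text.splitlines():
        match = CHECK_RE.match(line.strip())
        if match:
            candidates.append(match.group(1))
    return {
        "candidates": candidates,
        "no_blas_found": "Failed to identify blas ldflags" in log_text,
        "numpy_fallback": "Using NumPy C-API based implementation" in log_text,
    }


# A few lines copied from the log above.
sample_log = """\
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with intel threading
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with GNU OpenMP threading
DEBUG (pytensor.link.c.cmodule): Checking Accelerate framework
DEBUG (pytensor.link.c.cmodule): Accelerate framework flag failed
DEBUG (pytensor.link.c.cmodule): Checking Lapack + blas
DEBUG (pytensor.link.c.cmodule): Checking blas alone
DEBUG (pytensor.link.c.cmodule): Checking openblas
DEBUG (pytensor.link.c.cmodule): Failed to identify blas ldflags. Will leave them empty.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
"""

summary = summarize_blas_search(sample_log)
print(summary["candidates"])
print("no BLAS found:", summary["no_blas_found"])
```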
Replicating some of the other tests that @lucianopaz asked for above:
Can you check what path to an executable you get as pytensor.config.cxx? Is it the system clang or is it the conda clang?
It seems to be the conda one:
>>> pytensor.config.cxx
'/Users/aurimas.racas/micromamba/envs/pt/bin/clang++'
Can you try to run that cxx executable in a terminal as cxx -print-search-dirs? What directories do you get in the libraries entry? Is the conda env lib path included?
Yes it is.
> /Users/aurimas.racas/micromamba/envs/pt/bin/clang++ -print-search-dirs
programs: =/Users/aurimas.racas/micromamba/envs/pt/bin
libraries: =/Users/aurimas.racas/micromamba/envs/pt/lib/clang/18
Can you verify if there is any file that has the name blas in the conda env lib directory? If there is, what’s the file name extension?
➜ ~ ls /Users/aurimas.racas/micromamba/envs/pt/lib | grep blas
libblas.3.dylib
libblas.dylib
libcblas.3.dylib
libcblas.dylib
libopenblas.0.dylib
libopenblas.a
libopenblas.dylib
libopenblas_armv8p-r0.3.28.dylib
libopenblas_vortexp-r0.3.28.a
libopenblas_vortexp-r0.3.28.dylib
libopenblasp-r0.3.28.dylib
Can you try to run pytensor.link.c.cmodule.try_blas_flag(["-framework", "Accelerate"]) and see if you get something?
>>> pytensor.link.c.cmodule.try_blas_flag(["-framework", "Accelerate"])
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-framework', 'Accelerate']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[17129]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aurimas.racas/micromamba/envs/pt/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
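The same dyld message repeats for every candidate, and the interesting part is which two files it names. Here is a small sketch that pulls those out of the error bytes; `parse_dyld_symbol_error` is a hypothetical helper written for this thread, and the regular expressions assume the message layout shown above.

```python
import re
from typing import Optional


def parse_dyld_symbol_error(stderr: bytes) -> Optional[dict]:
    """Extract the missing symbol and the two files named in a dyld error."""
    text = stderr.decode("utf-8", errors="replace")
    symbol = re.search(r"Symbol not found: (\S+)", text)
    # The path is preceded by an optional "<UUID> " token in this message format.
    referenced = re.search(r"Referenced from: (?:<[^>]+> )?(\S+)", text)
    expected = re.search(r"Expected in: (?:<[^>]+> )?(\S+)", text)
    if not (symbol and referenced and expected):
        return None
    return {
        "symbol": symbol.group(1),
        "referenced_from": referenced.group(1),
        "expected_in": expected.group(1),
    }


# Error message copied from the try_blas_flag output above.
sample = (
    b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, "
    b"this behavior is deprecated [-Wdeprecated]\n"
    b"dyld[17129]: Symbol not found: "
    b"__ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n"
    b" Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> "
    b"/Library/Developer/CommandLineTools/usr/bin/ld\n"
    b" Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> "
    b"/Users/aurimas.racas/micromamba/envs/pt/lib/libtapi.dylib\n"
    b"clang++: error: unable to execute command: Abort trap: 6\n"
)

info = parse_dyld_symbol_error(sample)
print(info["referenced_from"])
print(info["expected_in"])
```

Read this way, the message says that the system linker under /Library/Developer/CommandLineTools looked for a symbol in the libtapi.dylib that lives in the conda environment, so the failure happens at link time before any BLAS library is actually tested.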
If I try the same commands in an environment with pytensor=2.25.5:
Python 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 15:57:01) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logger = logging.getLogger("pytensor.link.c.cmodule")
>>> logger.setLevel(logging.DEBUG)
>>> import pytensor
DEBUG (pytensor.link.c.cmodule): Will search for BLAS libraries in the following directories:
/Users/aurimas.racas/micromamba/envs/pt225/lib/clang/17
/Users/aurimas.racas/micromamba/envs/pt225/lib
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with intel threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with GNU OpenMP threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking Lapack + blas
INFO (pytensor.link.c.cmodule): g++ -march=native selected lines: ['"/Users/aurimas.racas/micromamba/envs/pt225/bin/clang-17" -cc1 -triple arm64-apple-macosx11.0.0 -Wundef-prefix=TARGET_OS_ -Werror=undef-prefix -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -funwind-tables=1 -target-sdk-version=15.1 -fcompatibility-qualified-id-block-type-checking -fvisibility-inlines-hidden-static-local-var -target-cpu apple-m1 -target-feature +neon -target-feature +v8.5a -target-feature +zcm -target-feature +zcz -target-abi darwinpcs -debugger-tuning=lldb -target-linker-version 711 -v -fcoverage-compilation-dir=/Users/aurimas.racas -resource-dir /Users/aurimas.racas/micromamba/envs/pt225/lib/clang/17 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Users/aurimas.racas/micromamba/envs/pt225/lib/clang/17/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -fdebug-compilation-dir=/Users/aurimas.racas -ferror-limit 19 -stack-protector 1 -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fmax-type-align=16 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ default lines: ['"/Users/aurimas.racas/micromamba/envs/pt225/bin/clang-17" -cc1 -triple arm64-apple-macosx11.0.0 -Wundef-prefix=TARGET_OS_ -Werror=undef-prefix -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -funwind-tables=1 -target-sdk-version=15.1 -fcompatibility-qualified-id-block-type-checking -fvisibility-inlines-hidden-static-local-var -target-cpu apple-m1 -target-feature +v8.5a -target-feature +aes -target-feature +crc -target-feature +dotprod -target-feature +fp-armv8 -target-feature +fp16fml -target-feature +lse -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +sha2 -target-feature +sha3 -target-feature +neon -target-feature +zcm -target-feature +zcz -target-feature +fullfp16 -target-abi darwinpcs -debugger-tuning=lldb -target-linker-version 711 -v -fcoverage-compilation-dir=/Users/aurimas.racas -resource-dir /Users/aurimas.racas/micromamba/envs/pt225/lib/clang/17 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Users/aurimas.racas/micromamba/envs/pt225/lib/clang/17/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -fdebug-compilation-dir=/Users/aurimas.racas -ferror-limit 19 -stack-protector 1 -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fmax-type-align=16 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ -march=native equivalent flags: ['-march=apple-m1']
DEBUG (pytensor.link.c.cmodule): Supplied flags failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aurimas.racas/micromamba/envs/pt225/lib', '-llapack', '-lblas', '-lcblas', '-lm', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt225/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Checking blas alone
DEBUG (pytensor.link.c.cmodule): Supplied flags failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aurimas.racas/micromamba/envs/pt225/lib', '-lblas', '-lcblas', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt225/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Checking openblas
DEBUG (pytensor.link.c.cmodule): Supplied flags failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aurimas.racas/micromamba/envs/pt225/lib', '-lopenblas', '-lgfortran', '-lgomp', '-lm', '-fopenmp', '-Wl,-rpath,/Users/aurimas.racas/micromamba/envs/pt225/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Failed to identify blas ldflags. Will leave them empty.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
>>>
>>> pytensor.config.cxx
'/Users/aurimas.racas/micromamba/envs/pt225/bin/clang++'
➜ ~ /Users/aurimas.racas/micromamba/envs/pt225/bin/clang++ -print-search-dirs
programs: =/Users/aurimas.racas/micromamba/envs/pt225/bin
libraries: =/Users/aurimas.racas/micromamba/envs/pt225/lib/clang/17
➜ ~ ls /Users/aurimas.racas/micromamba/envs/pt225/lib | grep blas
libblas.3.dylib
libblas.dylib
libcblas.3.dylib
libcblas.dylib
libopenblas.0.dylib
libopenblas.a
libopenblas.dylib
libopenblas_armv8p-r0.3.28.dylib
libopenblas_vortexp-r0.3.28.a
libopenblas_vortexp-r0.3.28.dylib
libopenblasp-r0.3.28.dylib
>>> pytensor.link.c.cmodule.try_blas_flag(["-framework", "Accelerate"])
''
From what I can see, pytensor=2.26.0 has clang18 installed in the environment, and pytensor=2.25.5 has clang17. Perhaps that's the issue?
This didn't work for me on a fresh Miniforge install on macOS 15.1. My pytensor logging output is below.
I created my environment using:
mamba create -c conda-forge -c nodefaults -n pymc_macos15 pytensor
The pytensor import logs:
DEBUG (pytensor.link.c.cmodule): Will search for BLAS libraries in the following directories:
/Users/aaron/miniforge3/envs/pytensor_test/lib/clang/18
/Users/aaron/miniforge3/envs/pytensor_test/lib
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with intel threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking MKL flags with GNU OpenMP threading
DEBUG (pytensor.link.c.cmodule): Required file 'mkl_core' not found
DEBUG (pytensor.link.c.cmodule): Required file mkl_core not found
DEBUG (pytensor.link.c.cmodule): Checking Accelerate framework
INFO (pytensor.link.c.cmodule): g++ -march=native selected lines: ['"/Users/aaron/miniforge3/envs/pytensor_test/bin/clang-18" -cc1 -triple arm64-apple-macosx11.0.0 -Wundef-prefix=TARGET_OS_ -Werror=undef-prefix -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -funwind-tables=1 -target-sdk-version=15.1 -fcompatibility-qualified-id-block-type-checking -fvisibility-inlines-hidden-static-local-var -fbuiltin-headers-in-system-modules -fdefine-target-os-macros -target-cpu apple-m1 -target-feature +zcm -target-feature +zcz -target-feature +v8.5a -target-feature +crc -target-feature +dotprod -target-feature +complxnum -target-feature +fp-armv8 -target-feature +jsconv -target-feature +lse -target-feature +pauth -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +neon -target-abi darwinpcs -debugger-tuning=lldb -fdebug-compilation-dir=/Users/aaron -target-linker-version 711 -v -fcoverage-compilation-dir=/Users/aaron -resource-dir /Users/aaron/miniforge3/envs/pytensor_test/lib/clang/18 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Users/aaron/miniforge3/envs/pytensor_test/lib/clang/18/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -ferror-limit 19 -stack-protector 1 -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fmax-type-align=16 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ default lines: ['"/Users/aaron/miniforge3/envs/pytensor_test/bin/clang-18" -cc1 -triple arm64-apple-macosx11.0.0 -Wundef-prefix=TARGET_OS_ -Werror=undef-prefix -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -E -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -funwind-tables=1 -target-sdk-version=15.1 -fcompatibility-qualified-id-block-type-checking -fvisibility-inlines-hidden-static-local-var -fbuiltin-headers-in-system-modules -fdefine-target-os-macros -target-cpu apple-m1 -target-feature +zcm -target-feature +zcz -target-feature +v8.5a -target-feature +aes -target-feature +crc -target-feature +dotprod -target-feature +complxnum -target-feature +fp-armv8 -target-feature +fullfp16 -target-feature +fp16fml -target-feature +jsconv -target-feature +lse -target-feature +pauth -target-feature +ras -target-feature +rcpc -target-feature +rdm -target-feature +sha2 -target-feature +sha3 -target-feature +neon -target-abi darwinpcs -debugger-tuning=lldb -fdebug-compilation-dir=/Users/aaron -target-linker-version 711 -v -fcoverage-compilation-dir=/Users/aaron -resource-dir /Users/aaron/miniforge3/envs/pytensor_test/lib/clang/18 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Users/aaron/miniforge3/envs/pytensor_test/lib/clang/18/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -ferror-limit 19 -stack-protector 1 -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fmax-type-align=16 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o - -x c -']
INFO (pytensor.link.c.cmodule): g++ -march=native equivalent flags: ['-march=apple-m1']
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-framework', 'Accelerate', '-Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[62467]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aaron/miniforge3/envs/pytensor_test/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Accelerate framework flag failed
DEBUG (pytensor.link.c.cmodule): Checking Lapack + blas
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-L/Users/aaron/miniforge3/envs/pytensor_test/lib', '-llapack', '-lblas', '-lcblas', '-lm', '-Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[62470]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aaron/miniforge3/envs/pytensor_test/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aaron/miniforge3/envs/pytensor_test/lib', '-llapack', '-lblas', '-lcblas', '-lm', '-Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Checking blas alone
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-L/Users/aaron/miniforge3/envs/pytensor_test/lib', '-lblas', '-lcblas', '-Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[62473]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aaron/miniforge3/envs/pytensor_test/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aaron/miniforge3/envs/pytensor_test/lib', '-lblas', '-lcblas', '-Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Checking openblas
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-L/Users/aaron/miniforge3/envs/pytensor_test/lib', '-lopenblas', '-lgfortran', '-lgomp', '-lm', '-fopenmp', '-Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib']
failed with error message b"clang++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]\ndyld[62476]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv\n Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld\n Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aaron/miniforge3/envs/pytensor_test/lib/libtapi.dylib\nclang++: error: unable to execute command: Abort trap: 6\nclang++: error: linker command failed due to signal (use -v to see invocation)\n"
DEBUG (pytensor.link.c.cmodule): Supplied flags '' failed to compile
DEBUG (pytensor.link.c.cmodule): Supplied flags ['-L/Users/aaron/miniforge3/envs/pytensor_test/lib', '-lopenblas', '-lgfortran', '-lgomp', '-lm', '-fopenmp', '-Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib'] failed to compile
DEBUG (pytensor.link.c.cmodule): Failed to identify blas ldflags. Will leave them empty.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
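Reading the DEBUG lines above, the detection appears to work as a fallback chain: each candidate set of link flags is tried in turn, and if none links the ldflags stay empty and the NumPy C-API implementation is used. A minimal sketch of that pattern follows. It is not pytensor's actual code: the names `CANDIDATES`, `pick_blas_flags` and the probe functions are made up for illustration, and the flag lists are shortened from the log (the -L and rpath entries are omitted).

```python
from typing import Callable, List, Sequence, Tuple

# Candidate flag sets in the order the DEBUG log reports them.
# Shortened for illustration: the real attempts also carry -L and rpath entries.
CANDIDATES: List[Tuple[str, List[str]]] = [
    ("accelerate", ["-framework", "Accelerate"]),
    ("lapack+blas", ["-llapack", "-lblas", "-lcblas", "-lm"]),
    ("blas alone", ["-lblas", "-lcblas"]),
    ("openblas", ["-lopenblas", "-lgfortran", "-lgomp", "-lm", "-fopenmp"]),
]


def pick_blas_flags(probe: Callable[[Sequence[str]], bool]) -> List[str]:
    """Return the first candidate flag list that `probe` accepts, else []."""
    for _name, flags in CANDIDATES:
        if probe(flags):
            return list(flags)
    # Nothing linked: corresponds to "Will leave them empty" in the log.
    return []


def always_fails(flags: Sequence[str]) -> bool:
    """Hypothetical probe mimicking the log above, where every link attempt aborts."""
    return False


print(pick_blas_flags(always_fails))
```

In the log above every attempt dies with the same dyld error, which suggests the linker itself is the problem rather than any particular BLAS library.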
Running
python $(python -c "import pathlib, pytensor; print(pathlib.Path(pytensor.__file__).parent / 'misc/check_blas.py')")
outputs:
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
...
Some PyTensor flags:
blas__ldflags=
compiledir= /Users/aaron/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 15:57:01) [Clang 17.0.6 ]
sys.prefix= /Users/aaron/miniforge3/envs/pymc_macos15
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
Build Dependencies:
blas:
detection method: pkgconfig
found: true
include directory: /Users/aaron/miniforge3/envs/pymc_macos15/include
lib directory: /Users/aaron/miniforge3/envs/pymc_macos15/lib
name: blas
openblas configuration: unknown
pc file directory: /Users/aaron/miniforge3/envs/pymc_macos15/lib/pkgconfig
version: 3.9.0
lapack:
detection method: internal
found: true
include directory: unknown
lib directory: unknown
name: dep4569863840
openblas configuration: unknown
pc file directory: unknown
version: 1.26.4
Compilers:
c:
args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -isystem,
/Users/aaron/miniforge3/envs/pymc_macos15/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/aaron/miniforge3/envs/pymc_macos15=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/aaron/miniforge3/envs/pymc_macos15/include,
-mmacosx-version-min=11.0
commands: arm64-apple-darwin20.0.0-clang
linker: ld64
linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/aaron/miniforge3/envs/pymc_macos15/lib,
-L/Users/aaron/miniforge3/envs/pymc_macos15/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
-O2, -pipe, -isystem, /Users/aaron/miniforge3/envs/pymc_macos15/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/aaron/miniforge3/envs/pymc_macos15=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/aaron/miniforge3/envs/pymc_macos15/include,
-mmacosx-version-min=11.0
name: clang
version: 16.0.6
c++:
args: -ftree-vectorize, -fPIC, -fstack-protector-strong, -O2, -pipe, -stdlib=libc++,
-fvisibility-inlines-hidden, -fmessage-length=0, -isystem, /Users/aaron/miniforge3/envs/pymc_macos15/include,
-fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/aaron/miniforge3/envs/pymc_macos15=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/aaron/miniforge3/envs/pymc_macos15/include,
-mmacosx-version-min=11.0
commands: arm64-apple-darwin20.0.0-clang++
linker: ld64
linker args: -Wl,-headerpad_max_install_names, -Wl,-dead_strip_dylibs, -Wl,-rpath,/Users/aaron/miniforge3/envs/pymc_macos15/lib,
-L/Users/aaron/miniforge3/envs/pymc_macos15/lib, -ftree-vectorize, -fPIC, -fstack-protector-strong,
-O2, -pipe, -stdlib=libc++, -fvisibility-inlines-hidden, -fmessage-length=0,
-isystem, /Users/aaron/miniforge3/envs/pymc_macos15/include, -fdebug-prefix-map=/Users/runner/miniforge3/conda-bld/numpy_1707225421156/work=/usr/local/src/conda/numpy-1.26.4,
-fdebug-prefix-map=/Users/aaron/miniforge3/envs/pymc_macos15=/usr/local/src/conda-prefix,
-D_FORTIFY_SOURCE=2, -isystem, /Users/aaron/miniforge3/envs/pymc_macos15/include,
-mmacosx-version-min=11.0
name: clang
version: 16.0.6
cython:
commands: cython
linker: cython
name: cython
version: 3.0.8
Machine Information:
build:
cpu: aarch64
endian: little
family: aarch64
system: darwin
cross-compiled: true
host:
cpu: arm64
endian: little
family: aarch64
system: darwin
Python Information:
path: /Users/aaron/miniforge3/envs/pymc_macos15/bin/python
version: '3.12'
SIMD Extensions:
baseline:
- NEON
- NEON_FP16
- NEON_VFPV4
- ASIMD
found:
- ASIMDHP
not found:
- ASIMDFHM
Numpy dot module: numpy
Numpy location: /Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
You can find the C code in this temporary file: /var/folders/b8/7kxz8b7579n0gp9kb3wmcf880000gn/T/pytensor_compilation_error_dsrc8_jd
ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: ExpandDims{axes=[0, 1]}(0.8)
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1909, in process_node
replacements = node_rewriter.transform(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
return self.fn(fgraph, node)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/tensor/rewriting/basic.py", line 1117, in constant_folding
thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
outputs = cl.make_thunk(
^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1182, in make_thunk
cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1103, in __compile__
thunk, module = self.cthunk_factory(
^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1627, in cthunk_factory
module = cache.module_from_key(key=key, lnk=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 1255, in module_from_key
module = lnk.compile_cmodule(location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/basic.py", line 1528, in compile_cmodule
module = c_compiler.compile_str(
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/cmodule.py", line 2677, in compile_str
raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/Users/aaron/miniforge3/envs/pymc_macos15/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -fPIC -undefined dynamic_lookup -I/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/numpy/core/include -I/Users/aaron/miniforge3/envs/pymc_macos15/include/python3.12 -I/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/pytensor/link/c/c_code -L/Users/aaron/miniforge3/envs/pymc_macos15/lib -fvisibility=hidden -o /Users/aaron/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64/tmpkj3rjfi4/mb782a9925f26f74c46a75d98e1484e89ff6c5c482e4b63d738d2bb93e667f8f6.so /Users/aaron/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64/tmpkj3rjfi4/mod.cpp
dyld[54048]: Symbol not found: __ZNK4tapi2v119LinkerInterfaceFile28getPlatformsAndMinDeploymentEv
Referenced from: <16BFD524-5ED0-3DE1-B23F-84EE5092744F> /Library/Developer/CommandLineTools/usr/bin/ld
Expected in: <15C501C6-0EF4-3E32-9C14-04EC4CD23D35> /Users/aaron/miniforge3/envs/pymc_macos15/lib/libtapi.dylib
clang++: error: unable to execute command: Abort trap: 6
clang++: error: linker command failed due to signal (use -v to see invocation)
I then ran:
mamba install "libblas=*=*accelerate"
Same result when trying the above. I tried a few combinations and orders of this, including installing pytensor-base first.
Finally I gave up and created a fresh environment with only python=3.12, then used pip to install pytensor. Now the check_blas.py output is:
Some PyTensor flags:
blas__ldflags= -framework Accelerate -Wl,-rpath,/Users/aaron/miniforge3/envs/pymc_macos15/lib
compiledir= /Users/aaron/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 15:57:01) [Clang 17.0.6 ]
sys.prefix= /Users/aaron/miniforge3/envs/pymc_macos15
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
"Compilers": {
"c": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "cc",
"args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
},
"cython": {
"name": "cython",
"linker": "cython",
"version": "3.0.8",
"commands": "cython"
},
"c++": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "c++",
"args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
}
},
"Machine Information": {
"host": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"build": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
}
},
"Build Dependencies": {
"blas": {
"name": "openblas64",
"found": true,
"version": "0.3.23.dev",
"detection method": "pkgconfig",
"include directory": "/opt/arm64-builds/include",
"lib directory": "/opt/arm64-builds/lib",
"openblas configuration": "USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3",
"pc file directory": "/usr/local/lib/pkgconfig"
},
"lapack": {
"name": "dep4335021056",
"found": true,
"version": "1.26.4",
"detection method": "internal",
"include directory": "unknown",
"lib directory": "unknown",
"openblas configuration": "unknown",
"pc file directory": "unknown"
}
},
"Python Information": {
"path": "/private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-q69bfk1p/cp312-macosx_arm64/build/venv/bin/python",
"version": "3.12"
},
"SIMD Extensions": {
"baseline": [
"NEON",
"NEON_FP16",
"NEON_VFPV4",
"ASIMD"
],
"found": [
"ASIMDHP"
],
"not found": [
"ASIMDFHM"
]
}
}
Numpy dot module: numpy
Numpy location: /Users/aaron/miniforge3/envs/pymc_macos15/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 3.68s on CPU (with direct PyTensor binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
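For anyone comparing their own output with the two reports above, the line to look at is blas__ldflags: it is empty in the failing conda run and set to -framework Accelerate in the working pip run. A small sketch for pulling that value out of a saved report is below. The function name is hypothetical, and the sample strings are trimmed stand-ins with a placeholder path, not real output.

```python
from typing import Optional


def blas_ldflags(report: str) -> Optional[str]:
    """Return the text after 'blas__ldflags=' in a check_blas.py report.

    Returns '' when the flag is present but empty, and None when the
    report contains no such line.
    """
    for line in report.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "blas__ldflags":
            return value.strip()
    return None


# Trimmed stand-ins for the two reports; /path/to/env is a placeholder.
failing = "Some PyTensor flags:\nblas__ldflags=\nfloatX= float64"
working = "blas__ldflags= -framework Accelerate -Wl,-rpath,/path/to/env/lib"

for name, report in (("failing", failing), ("working", working)):
    flags = blas_ldflags(report)
    status = "linked against BLAS" if flags else "no BLAS ldflags (NumPy C-API fallback)"
    print(f"{name}: {status}")
```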
Thanks @aurimas-ww and @areding. I also started to run into a similar problem in a different environment. I think I know the cause: it’s related to the linker that ships with Xcode 15.
Both of your logs say that a symbol wasn’t found in the tapi library, but the next lines show that the linker itself was killed by a signal (Abort trap: 6) while going through that dylib. On my machine I managed to see a segfault with signal 11. Googling around, I found this thread that might have the solution we need. The explanatory post says that Xcode 15 brought in a new linker, which seems not to like the layout of some dylibs and dies. Luckily, we can ask Xcode’s linker to behave like the good old linker by supplying some flags to the compiler. I haven’t gotten around to implementing this in pytensor yet, but maybe you could try adding a configuration flag for the extra compile flags that has -ld64 or -Wl,-ld64 (I’m not sure yet whether it’s a linker flag or a compiler flag). Maybe next week I’ll be able to sit down and test this out properly.
Not sure if that's the right way to do it, but adding these flags (either just ld64 or both) to try_blas_flag gives a different error message:
>>> pytensor.link.c.cmodule.try_blas_flag(["-Wl", "-ld64", "-framework", "Accelerate"])
DEBUG (pytensor.link.c.cmodule): try_blas_flags of flags: ['-Wl', '-ld64', '-framework', 'Accelerate']
failed with error message b"dyld[23414]: Library not loaded: @rpath/libc++.1.dylib\n Referenced from: <DE0B8C7D-A117-3BDE-8DE9-252F4D6C9054> /private/var/folders/hs/rg3ptf7571n92r08b54rz7g40000gq/T/try_blas_y1v18bkk\n Reason: no LC_RPATH's found\n"
That’s because you left out the rpath from the compilation flags. You don’t need to drop them; just add the new flag alongside them.
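To make that concrete, here is a minimal sketch of building the flag list so that the original entries (rpath included) are kept and the new flag is only appended. The helper name is made up, `<env>` is a placeholder for the conda prefix, and which of the two spellings the toolchain actually accepts is still the open question from earlier in the thread.

```python
from typing import Iterable, List


def with_extra_flags(base: Iterable[str], extra: Iterable[str]) -> List[str]:
    """Return base flags followed by the extra ones, dropping nothing."""
    return list(base) + list(extra)


# Flags as they appear in the failing DEBUG log; <env> is a placeholder.
base = ["-framework", "Accelerate", "-Wl,-rpath,<env>/lib"]

# The two spellings mentioned in the thread. Note that "-Wl,-ld64" is a
# single argument, not the two items "-Wl" and "-ld64".
for candidate in (["-ld64"], ["-Wl,-ld64"]):
    flags = with_extra_flags(base, candidate)
    print(flags)
    # Each list could then be passed to pytensor.link.c.cmodule.try_blas_flag,
    # as in the call quoted earlier (not run here).
```

With the rpath entry still in place, the test binary should be able to find @rpath/libc++.1.dylib again, so any remaining error would point at the new flag itself.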
@aurimas-ww and @areding, we just merged #1083. Could you please install pytensor from the current state of the main branch and check if your issues go away? We can't confirm the fix from the CI runs alone, because we haven't found a snippet that fails systematically, but the people we've asked so far told us that their setups are working now.
That did it. Thank you!
Some PyTensor flags:
blas__ldflags= -framework Accelerate -Wl,-rpath,/Users/aaron/miniforge3/envs/pytensor_test/lib
compiledir= /Users/aaron/.pytensor/compiledir_macOS-15.1-arm64-arm-64bit-arm-3.12.7-64
floatX= float64
device= cpu
Some OS information:
sys.platform= darwin
sys.version= 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 15:57:01) [Clang 17.0.6 ]
sys.prefix= /Users/aaron/miniforge3/envs/pytensor_test
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
/Users/aaron/miniforge3/envs/pytensor_test/lib/python3.12/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
"Compilers": {
"c": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "cc",
"args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-fno-strict-aliasing, -DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
},
"cython": {
"name": "cython",
"linker": "cython",
"version": "3.0.8",
"commands": "cython"
},
"c++": {
"name": "clang",
"linker": "ld64",
"version": "14.0.0",
"commands": "c++",
"args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64",
"linker args": "-DBLAS_SYMBOL_SUFFIX=64_, -DHAVE_BLAS_ILP64"
}
},
"Machine Information": {
"host": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
},
"build": {
"cpu": "aarch64",
"family": "aarch64",
"endian": "little",
"system": "darwin"
}
},
"Build Dependencies": {
"blas": {
"name": "openblas64",
"found": true,
"version": "0.3.23.dev",
"detection method": "pkgconfig",
"include directory": "/opt/arm64-builds/include",
"lib directory": "/opt/arm64-builds/lib",
"openblas configuration": "USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3",
"pc file directory": "/usr/local/lib/pkgconfig"
},
"lapack": {
"name": "dep4335021056",
"found": true,
"version": "1.26.4",
"detection method": "internal",
"include directory": "unknown",
"lib directory": "unknown",
"openblas configuration": "unknown",
"pc file directory": "unknown"
}
},
"Python Information": {
"path": "/private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-q69bfk1p/cp312-macosx_arm64/build/venv/bin/python",
"version": "3.12"
},
"SIMD Extensions": {
"baseline": [
"NEON",
"NEON_FP16",
"NEON_VFPV4",
"ASIMD"
],
"found": [
"ASIMDHP"
],
"not found": [
"ASIMDFHM"
]
}
}
Numpy dot module: numpy
Numpy location: /Users/aaron/miniforge3/envs/pytensor_test/lib/python3.12/site-packages/numpy/__init__.py
Numpy version: 1.26.4
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 3.65s on CPU (with direct PyTensor binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
Same here - I can confirm it works on my machine, too. Thanks!!
Hey, can you guys post the steps you used to get a successful installation? I'm able to replicate each of the steps outlined by @aurimas-ww, but when pip installing the pytensor main branch from GitHub, I still get the same results 😞.
@scroobiustrip - I just tried it in a fresh virtual environment, simple (uv) pip install - and it worked:
mkdir foo
cd foo
uv venv
source .venv/bin/activate
uv pip install "git+https://github.com/pymc-devs/pytensor.git"
@ricardoV94, we should make a release with the patch. Just a bug fix release. That way everyone can just use conda or pip as usual
@lucianopaz already did and also the latest PyMC is linked to it
I don't see #1083 commits included in the latest release. Those were the ones that made the last errors go away.
@lucianopaz my bad. I'll release now. Since it's not a major release it should become automatically compatible with PyMC
@lucianopaz can you edit the PR title to be more informative than "LD64"?
Done
Patch is in https://github.com/pymc-devs/pytensor/releases/tag/rel-2.26.3
Should be available in the common channels soon
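Once the release reaches your channel, a quick check of the installed version can tell you whether you are on 2.26.3 or later. The sketch below uses only the standard library; the helper names are made up, and it compares release numbers only, so an install from the main branch may carry the patch while reporting an earlier number.

```python
import re
from importlib import metadata
from typing import Tuple

PATCHED = (2, 26, 3)  # release named in the comment above


def version_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric components, e.g. '2.26.3+4.gabc' -> (2, 26, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def includes_patch(version: str) -> bool:
    """True when the release number is at least 2.26.3."""
    return version_tuple(version) >= PATCHED


try:
    installed = metadata.version("pytensor")
    print(installed, "patched release" if includes_patch(installed) else "older release")
except metadata.PackageNotFoundError:
    print("pytensor is not installed in this environment")
```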
Describe the issue:
Since updating to macOS 15 I have had a problem using Apple's implementation of BLAS. Installing pytensor from miniconda3-3.12-24.7.1-0 via conda create -n voxel-bayes-3.12 -c conda-forge pytensor seems to install openblas instead of accelerate.
Running the check:
Running the same command in an env with pip-installed pytensor results in this:
When I specify accelerate the old way via "libblas==accelerate" when creating the conda environment and then try to run the check, it fails. I copied the output here: https://discourse.pymc.io/t/pytensor-support-to-apple-accelerate-blas-with-conda-forge-on-macos-15/15131/2
Reproducible code example:
Error message:
No response
PyTensor version information:
conda-forge/osx-arm64::pytensor-2.25.4-py312h3f593ad_0
Context for the issue:
No response