google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0

On MacOSX, Mac M Hardware (ARM), a segmentation fault happened with YDF when pyarrow is installed #79

Open lusis-ai opened 3 months ago

lusis-ai commented 3 months ago

Setup : MacOSX 13 or 14, Mac M hardware

Prerequisite : Install miniforge3

% conda create --name ydfpandasissue
% conda activate ydfpandasissue
% conda install python=3.10
% conda install pandas
% pip install ydf-0.2.0-cp310-cp310-macosx_13_0_arm64.whl

When running this program (ydf_test.py), it works.

import ydf
import pandas as pd
import numpy as np

dataset = {
    "x1": np.array([0, 0, 0, 1, 1, 1]),
    "x2": np.array([1, 1, 0, 0, 1, 1]),
    "y": np.array([0, 0, 0, 0, 1, 1]),
}

model = ydf.CartLearner(label="y", min_examples=1, task=ydf.Task.CLASSIFICATION).train(dataset)
print(model.describe())

Now install pyarrow, from either conda or pip. The result is the same: the script fails, and only the error message differs.

% conda install pyarrow
% python ydf_test.py
zsh: segmentation fault  python ydf_test.py
% conda uninstall pyarrow
% pip install pyarrow
% python ydf_test.py
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
zsh: abort      python ydf_test.py

Note that pyarrow is mandatory when working with large tabular datasets stored in Parquet files.

rstz commented 3 months ago

Thank you for the detailed report. I will have a look.

lusis-ai commented 3 months ago

A similar issue happens with tensorflow_decision_forests.

After installing tensorflow and tensorflow_decision_forests from pip (TF-DF for ARM is not available on conda), in the same configuration as above, the following error occurs (here in a Python terminal):

Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:35:25) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_decision_forests as tfdf
>>> import ydf
[mutex.cc : 453] RAW: Lock blocking 0x600001892898   @

mowoe commented 3 months ago

I had the same issue, but I failed to make the connection to ydf. As a temporary workaround, I switched to fastparquet, the other library pandas supports for reading Parquet files. It works fine for me.
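A minimal sketch of this workaround, for illustration only. `pick_parquet_engine` and `read_parquet_without_pyarrow` are hypothetical helper names, not part of pandas or YDF; the only library call relied on is the `engine` argument of `pandas.read_parquet`. It assumes pyarrow has been removed from the environment, since pandas may still import pyarrow on its own when it is installed.

```python
import importlib.util


def pick_parquet_engine(preferred=("fastparquet", "pyarrow")):
    """Return the first name in `preferred` that is installed (hypothetical helper).

    find_spec locates a top-level module without importing it, so checking for
    pyarrow here does not load it into the process.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    raise ImportError(f"none of {preferred} is installed")


def read_parquet_without_pyarrow(path):
    """Read a Parquet file with pandas, explicitly requesting fastparquet."""
    import pandas as pd  # imported lazily so pick_parquet_engine stays stdlib-only

    return pd.read_parquet(path, engine="fastparquet")
```

Passing `engine="fastparquet"` explicitly avoids relying on the default engine selection in pandas, which tries pyarrow first.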

lusis-ai commented 3 months ago

But the issue is still there when importing tensorflow or tensorflow_decision_forests.

We have utility libraries that import tensorflow, so it crashes with:

libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
zsh: abort      python ydf_test.py

mowoe commented 3 months ago

python -c 'import pandas;import tensorflow;import tensorflow_decision_forests'

works fine in my venv, which has only fastparquet installed and not pyarrow.
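Because a segmentation fault or abort kills the interpreter, one-liners like the one above are easiest to compare when each import combination runs in its own process. The following is a rough sketch of that idea using only the standard library; `probe_imports` is a hypothetical helper name, and the module lists passed to it are up to the user.

```python
import subprocess
import sys


def probe_imports(modules):
    """Import `modules` in order in a fresh interpreter and return its exit code.

    0 means every import succeeded. On POSIX, a negative value -N means the
    child process was killed by signal N (for example SIGSEGV or SIGABRT).
    """
    code = ";".join(f"import {m}" for m in modules)
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode
```

For example, comparing `probe_imports(["pandas", "ydf"])` with `probe_imports(["pyarrow", "ydf"])` in the same environment would show whether adding pyarrow changes the outcome, without taking down the calling script.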

rstz commented 3 months ago

To give some preliminary findings from the crash logs:

lusis-ai commented 3 months ago

Nice. We manage package consistency with conda, but inside a conda env we can also install packages with pip when needed. Tomorrow I will try using pip only, to check.

Thanks for your help

lusis-ai commented 3 months ago

Hi,

For the pyarrow issue: thanks to your pointer, it is resolved just by forcing protobuf to 4.24.3. Installing that protobuf version with conda also works.

The strange thing is that, even though ydf has protobuf 24.3 statically linked, pip installs the latest version, 4.25.3. Installing ydf from pip does not pin protobuf to 4.24.3; the only requirement is protobuf>=3.14. Maybe it should be modified?

Anyway, by doing it manually, it works now.
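As a sketch of how this manual pin could be checked before importing ydf, the snippet below compares the installed protobuf version against 4.24.3, the version reported as working in this thread. The helper names and the `EXPECTED_PROTOBUF` constant are illustrative assumptions; only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata

# Version reported in this thread as working with ydf 0.2.0 (assumption: adjust for other builds).
EXPECTED_PROTOBUF = "4.24.3"


def installed_protobuf_version():
    """Return the installed protobuf distribution version, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


def protobuf_matches(installed, expected=EXPECTED_PROTOBUF):
    """Return True when the installed version string equals the expected one."""
    return installed is not None and installed.strip() == expected


if __name__ == "__main__":
    found = installed_protobuf_version()
    if not protobuf_matches(found):
        print(
            f"protobuf {found} found, expected {EXPECTED_PROTOBUF}; "
            f"consider: pip install protobuf=={EXPECTED_PROTOBUF}"
        )
```

An exact match is used rather than a range because the thread only establishes that 4.24.3 works and that 4.25.3 does not.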

It is not the same for TF-DF: it still crashes, so I cannot use the model.to_tensorflow_saved_model(path) function.