google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0

`to_tensorflow_function()` fails if added to the quickstart #87

Closed (LukeWood closed 4 months ago)

LukeWood commented 5 months ago

If you add:

!pip install ydf
!pip install tensorflow_decision_forests -q

to the intro colab, and try to call:

fn = model.to_tensorflow_function(temp_dir='./tmp')

You get the error:

TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "yggdrasil_decision_forests/dataset/data_spec.proto":
  yggdrasil_decision_forests/dataset/data_spec.proto: A file with this name is already in the pool.

rlcauvin commented 5 months ago

I get the same error when using model.to_tensorflow_saved_model.

rlcauvin commented 5 months ago

I got both model.to_tensorflow_saved_model and model.to_tensorflow_function to work with the following package versions installed:

keras==3.3.2 tensorflow==2.16.1 tensorflow-datasets==4.9.4 tensorflow-estimator==2.15.0 tensorflow-io-gcs-filesystem==0.36.0 tensorflow-metadata==1.15.0 tensorflow-ranking==0.5.5 tensorflow-recommenders==0.7.3 tensorflow-serving-api==2.15.1 tensorflow_decision_forests==1.9.0 tf_keras==2.16.0

LukeWood commented 5 months ago

So was the issue a missing package, or incorrect version? Perhaps a runtime check would help?
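A runtime check along these lines could start by reporting which versions are actually installed. The following is only a minimal sketch using the standard library's `importlib.metadata`; the function name `installed_versions` and the list of packages are illustrative assumptions, not part of ydf or TF-DF.

```python
from importlib import metadata

# Hypothetical list of packages to inspect; adjust to the environment at hand.
PACKAGES = ("ydf", "tensorflow_decision_forests", "tensorflow", "protobuf")


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This only reads package metadata, so it can run before importing ydf or tensorflow_decision_forests and will not trigger the protobuf error itself.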

achoum commented 5 months ago

Thanks @LukeWood for the alert, and then @rlcauvin for the debugging 😃.

TL;DR: The situation has been solved with ydf 0.4.2, and will be further improved with the next release of TF-DF (1.10.0).

Details:

tensorflow_decision_forests (tf-df) and ydf both depend on the ydf protobufs. The latest release of ydf introduces a new version of those protobufs, which causes a collision with tf-df: installing tf-df and ydf in a specific order causes ydf to fail.

The error reported by @LukeWood was raised with ydf 0.4.1. It has the same root cause as the one reported by @rlcauvin for ydf 0.4.0, though the message was different. ydf 0.4.2 solves the issue and displays a proper, actionable error message in case of a shared dependency collision. The next version of tf-df (1.10.0) will remove this potential dependency collision.

achoum commented 4 months ago

ydf 0.4.3 and tf-df 1.9.1 are now available on PyPI. Those releases address the dependency collision issue as tf-df now depends on ydf.

rlcauvin commented 4 months ago

Having installed those packages using:

!pip install -U ydf
!pip install -U tensorflow_decision_forests

I now am unable to import the ydf package:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[13], line 10
      8 import tensorflow_recommenders as tfrs
      9 # import tensorflow_ranking as tfr
---> 10 import ydf
     11 import matplotlib.pyplot as plt
     12 import matplotlib as mpl

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/__init__.py:29
     21   if not hasattr(data_spec_pb2, "DType"):
     22     raise ValueError("""\
     23 Collision between YDF and TensorFlow Decision Forests protobuf shared dependencies.
     24 Please, reinstall YDF with the "--force" argument, restart the notebook runtime (if using a notebook), and try again:
     25 
     26 !pip install ydf --force""")
---> 29 _check_install()
     32 # pylint: disable=g-importing-member,g-import-not-at-top,g-bad-import-order,reimported
     33 
     34 # Version
     35 from ydf.version import version as __version__

File ~/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages/ydf/__init__.py:19, in _check_install()
     18 def _check_install():
---> 19   from yggdrasil_decision_forests.dataset import data_spec_pb2
     21   if not hasattr(data_spec_pb2, "DType"):
     22     raise ValueError("""\
     23 Collision between YDF and TensorFlow Decision Forests protobuf shared dependencies.
     24 Please, reinstall YDF with the "--force" argument, restart the notebook runtime (if using a notebook), and try again:
     25 
     26 !pip install ydf --force""")

ImportError: cannot import name 'data_spec_pb2' from 'yggdrasil_decision_forests.dataset' (unknown location)
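The "(unknown location)" part of this message often indicates that Python resolved `yggdrasil_decision_forests.dataset` to a directory without an `__init__.py` (a namespace package), for example one left behind by a partial install. A minimal diagnostic sketch, assuming only the standard library, is shown below; `describe_module` is a hypothetical helper, not a ydf function.

```python
import importlib.util


def describe_module(name):
    """Return a short description of where `name` would be imported from."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec imports parent packages; this branch covers a missing or broken parent.
        return f"{name}: parent package cannot be imported"
    if spec is None:
        return f"{name}: not found"
    if spec.origin is None:
        # Namespace packages have no __init__.py and therefore no origin file.
        locations = list(spec.submodule_search_locations or [])
        return f"{name}: namespace package at {locations}"
    return f"{name}: {spec.origin}"


if __name__ == "__main__":
    for module in (
        "yggdrasil_decision_forests",
        "yggdrasil_decision_forests.dataset",
        "yggdrasil_decision_forests.dataset.data_spec_pb2",
    ):
        print(describe_module(module))
```

Comparing the printed paths shows which site-packages directory each name resolves to and whether `data_spec_pb2` is present there at all.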

Relevant installed package versions:

keras==3.3.3
tensorflow==2.16.1
tensorflow-datasets==4.9.4
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.36.0
tensorflow-metadata==1.15.0
tensorflow-ranking==0.5.5
tensorflow-recommenders==0.7.3
tensorflow-serving-api==2.15.1
tensorflow_decision_forests==1.9.1
tf_keras==2.16.0
ydf==0.4.3

achoum commented 4 months ago

This is annoying :). Do you mind trying a full fresh install?

pip uninstall ydf
pip uninstall tensorflow_decision_forests
pip install ydf --force
pip install tensorflow_decision_forests --force

rlcauvin commented 4 months ago

After following @achoum's instructions for a full fresh install, I am able to import the ydf package, and to_tensorflow_function() and to_tensorflow_saved_model() seem to be working. I am still having a separate issue with tree_plot.html(), but I've reported it in another ticket.
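For reference, the fresh-install sequence from this thread can be sketched as a small script. This is an illustration under stated assumptions rather than an official procedure: the helper names are hypothetical, `-y` is added so the uninstall step does not prompt, and `--force-reinstall` is the full spelling of the pip flag abbreviated as `--force` above.

```python
import subprocess
import sys

# Packages are handled in the same order as in the thread.
PACKAGES = ("ydf", "tensorflow_decision_forests")


def fresh_install_commands(packages=PACKAGES, python=sys.executable):
    """Build the uninstall commands, then the forced reinstall commands."""
    pip = [python, "-m", "pip"]
    commands = [pip + ["uninstall", "-y", name] for name in packages]
    commands += [pip + ["install", "--force-reinstall", name] for name in packages]
    return commands


def fresh_install(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in fresh_install_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    fresh_install(dry_run=True)  # switch to dry_run=False to actually run pip
```

In a notebook, restart the runtime after the reinstall so that the previously loaded protobuf modules are dropped.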