tensorflow / decision-forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Apache License 2.0

tfdf 1.9.0 only compatible with tf 2.16.1 which ships Keras 3 #214

Closed (mowoe closed this issue 8 months ago)

mowoe commented 8 months ago

Hi,

I need the fix for the issue described in https://github.com/google/yggdrasil-decision-forests/issues/78, which was released with TF-DF 1.9.0. However, the compatibility table in known_issues.md states that this version is only compatible with TensorFlow 2.16.1, and that TensorFlow version ships Keras 3, which is incompatible with TF-DF. Force-installing an older version of TensorFlow to get the legacy Keras API produces this error:

RuntimeError: Op type not registered 'SimpleMLLoadModelFromPathWithHandle' in binary running on hostname. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib (e.g. `tf.contrib.resampler`), accessing should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

I suspect this is caused by the ABI incompatibility between TF versions, but I don't see a way out. I need the newer TF-DF version because PYDF hasn't had a newer release yet.

Thanks for your help!

rstz commented 8 months ago

Hi,

Unfortunately it's not possible to deploy a TF-DF version that's compatible with multiple TensorFlow versions since TensorFlow's ABI constantly changes.
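Since each TF-DF release is tied to one TensorFlow version, checking the installed pair at startup can surface a mismatch before the missing-op error does. The following is a minimal sketch, not part of TF-DF: `KNOWN_PAIRS`, `check_pair` and `installed_version` are hypothetical names, and the table holds only the 1.9.0 / 2.16.1 pairing quoted in this thread (the full table is in known_issues.md).

```python
from importlib import metadata

# Assumption: only the pairing quoted in this thread; extend it from known_issues.md.
KNOWN_PAIRS = {"1.9.0": "2.16.1"}


def check_pair(tfdf_version, tf_version, known_pairs=KNOWN_PAIRS):
    """Return None if the pair matches the table, else a human-readable message."""
    expected = known_pairs.get(tfdf_version)
    if expected is None:
        return f"TF-DF {tfdf_version} is not in the local table; check known_issues.md."
    if tf_version != expected:
        return f"TF-DF {tfdf_version} expects TensorFlow {expected}, found {tf_version}."
    return None


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A call such as `check_pair(installed_version("tensorflow-decision-forests"), installed_version("tensorflow"))` would then report a mismatch as text. The exact-match comparison is deliberately strict; whether other patch releases also work is not something this sketch knows.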

We are very, very close to releasing a new PYDF version which includes the fix you need. No promises (unexpected things can always happen), but I'd suggest you wait a couple of days and check back then.

If you really can't wait, I'd suggest you try compiling PYDF yourself; there are some instructions on how to do this on GitHub.

mowoe commented 8 months ago

Alright, thanks for the info. I actually tried to build PYDF myself, but it always fails at the stage where the manylinux wheel is repaired, because my toolchain is too new. The proper image for building manylinux2014 wheels is CentOS 7 based, which doesn't have a recent gcc-9 available (the one provided has a strange bug), so I'm unable to build the manylinux wheel. Maybe I will just try to build a non-manylinux wheel, but then I'd have to dig a bit deeper into the provided script.

rstz commented 8 months ago

Ok, I also plan to update the install instructions (and scripts) with the 0.3.0 release, so this will also get easier hopefully :)

mowoe commented 8 months ago

Thanks! Then I will just wait a bit, I think. But the issue I mentioned in the first comment still remains, right? Currently there is no way to load any model when using the most recent TF-DF version, right? Or am I missing something here?

rstz commented 8 months ago

Sorry, I overlooked this question.

No, you can still use old models with TF-DF, but you need to instruct TensorFlow / Keras to use Keras 2:

  1. Make sure tf_keras is installed (it's a dependency of TF-DF, so this should already be the case).
  2. At the top of your program, set
    import os
    # Keep using Keras 2
    os.environ['TF_USE_LEGACY_KERAS'] = '1'
    import tf_keras

    Then replace tf.keras with tf_keras in your pipeline (this might not even be necessary after setting the environment variable).

If there are any issues after this step, we would consider it a bug. Our tutorials (https://www.tensorflow.org/decision_forests/tutorials/beginner_colab) have been adapted to these changes and seem to work fine.

mowoe commented 8 months ago

Okay, then I think it is a bug. Doing

pip install tensorflow-decision-forests==1.9.0 tf_keras

in a fresh environment and then running

import os
# Keep using Keras 2
os.environ['TF_USE_LEGACY_KERAS'] = '1'
import tf_keras
import sys

model = tf_keras.models.load_model(sys.argv[1])

in it yields

RuntimeError: Op type not registered 'SimpleMLLoadModelFromPathWithHandle' in binary running on de2lxl-520977. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib (e.g. `tf.contrib.resampler`), accessing should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

As described in the first comment.

rstz commented 8 months ago

You also need to import TensorFlow Decision Forests (sorry for not being clear), i.e. the full code is

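import os
# Keep using Keras 2
os.environ['TF_USE_LEGACY_KERAS'] = '1'
# Imported for its side effect: this registers the TF-DF custom ops
import tensorflow_decision_forests
import tf_keras
import sys

# The path to the saved model is passed as the first command-line argument
model = tf_keras.models.load_model(sys.argv[1])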

(Technical background: TF-DF defines custom TensorFlow ops for saving / loading / training Decision Tree models. TensorFlow can only use these ops after TF-DF has been imported.)
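The two preconditions from this thread (the legacy Keras switch and the TF-DF import) can be checked just before loading a model. This is a minimal sketch under stated assumptions: `tfdf_load_problems` is a hypothetical helper, it uses only the standard library, and it treats `'1'` as the only accepted value of `TF_USE_LEGACY_KERAS` because that is the value used above.

```python
import os
import sys


def tfdf_load_problems(modules=None, environ=None):
    """Return a list of reasons why loading a TF-DF model is likely to fail."""
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ
    problems = []
    # Assumption: '1' is the value used in this thread; other values are not checked.
    if environ.get("TF_USE_LEGACY_KERAS") != "1":
        problems.append("TF_USE_LEGACY_KERAS is not set to '1'")
    # The custom ops are only registered once the package has been imported.
    if "tensorflow_decision_forests" not in modules:
        problems.append("tensorflow_decision_forests has not been imported")
    return problems
```

Calling `tfdf_load_problems()` right before `load_model` returns an empty list when both conditions hold. It cannot tell whether the environment variable was set before TensorFlow was first imported, so the ordering shown in the snippet above still matters.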

mowoe commented 8 months ago

Okay, I should have figured this out myself 🤦 I was somehow expecting that the custom ops get registered "system-wide" as soon as you install TF-DF. Thank you so much!
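For anyone who hits the same message, the failure can be recognized from the error text alone. The sketch below is a hypothetical helper (`missing_op_hint` is not a TF-DF function), and it assumes that op names starting with `SimpleML`, like the one in this thread, belong to TF-DF.

```python
import re

# Matches the op name quoted in TensorFlow's "not registered" error message.
_MISSING_OP = re.compile(r"Op type not registered '([^']+)'")


def missing_op_hint(message):
    """Return a hint for a missing-op error, or None if the message is something else."""
    match = _MISSING_OP.search(message)
    if match is None:
        return None
    op_name = match.group(1)
    # Assumption: ops prefixed with "SimpleML" are the custom ops defined by TF-DF.
    if op_name.startswith("SimpleML"):
        return (
            f"{op_name} looks like a TF-DF op; "
            "import tensorflow_decision_forests before loading the model."
        )
    return f"{op_name} is not registered; import the package that defines it first."
```

One possible use is wrapping the `load_model` call in `try` / `except RuntimeError as err` and printing `missing_op_hint(str(err))` when it is not `None`.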