tensorflow / decision-forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Apache License 2.0

ANE support through coremltools #202

Closed: conradkun closed this issue 11 months ago

conradkun commented 11 months ago

Hi,

I have been playing around with this library on my Mac and had some questions regarding performance:

  1. Is it possible to convert a TF-DF model to a Core ML model (Apple's format) so that it can run on the Apple Neural Engine (ANE)? Apple provides coremltools to convert TF models seamlessly, but I haven't been able to make it work for TF-DF models (I don't know how to deal with one of the inputs, usually labelled something like random_forest_model_1/16566, though the name changes across runs).
  2. Are there any plans to add support for this, if it is currently unsupported?
  3. What kind of speedup would you expect the ANE to provide? Keep in mind that it only works for inference, so only that should be considered.

Thank you!

janpfeifer commented 11 months ago

Hi @conradkun, let me try to answer:

1 and 3. We don't have such a conversion tool, though it is likely doable. That said, I wouldn't necessarily expect much gain, since decision forest (DF) inference (and learning) generally doesn't require any matrix multiplication, and the parallelization is trivial. Accelerators often have higher memory bandwidth, so there may be some benefit, but not a gain like what one gets with NNs.
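A minimal sketch of why this is so, assuming a toy flat-array tree layout (the `Tree` class and the function names are hypothetical illustrations, not YDF's actual model format or API). Inference is a chain of comparisons and index lookups per tree, and each example is scored independently of the others:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tree:
    """Toy flat-array tree (hypothetical layout, not the YDF format).

    Node i is a leaf when feature[i] == -1; its prediction is value[i].
    """
    feature: List[int]
    threshold: List[float]
    left: List[int]
    right: List[int]
    value: List[float]


def predict_tree(tree: Tree, x: Sequence[float]) -> float:
    """Walk from the root to a leaf using only comparisons and lookups."""
    node = 0
    while tree.feature[node] != -1:
        if x[tree.feature[node]] < tree.threshold[node]:
            node = tree.left[node]
        else:
            node = tree.right[node]
    return tree.value[node]


def predict_forest(trees: Sequence[Tree], x: Sequence[float]) -> float:
    """Average the per-tree outputs; no matrix multiplication involved."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


# Two depth-1 trees, each splitting on a different feature.
tree_a = Tree(feature=[0, -1, -1], threshold=[0.5, 0.0, 0.0],
              left=[1, -1, -1], right=[2, -1, -1], value=[0.0, 1.0, 3.0])
tree_b = Tree(feature=[1, -1, -1], threshold=[2.0, 0.0, 0.0],
              left=[1, -1, -1], right=[2, -1, -1], value=[0.0, 2.0, 4.0])
forest = [tree_a, tree_b]

# Each example (and each tree) can be evaluated independently,
# which is what makes the parallelization trivial.
batch = [[0.2, 1.0], [0.9, 3.0]]
predictions = [predict_forest(forest, x) for x in batch]
print(predictions)
```

The work here is dominated by data-dependent branching and memory access rather than dense arithmetic, which is why hardware built around matrix units may offer limited benefit.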

Have you tried Yggdrasil Decision Forests (YDF) for inference? It is the underlying pure C++ implementation (TF-DF uses it), and inference with it is already very fast: in some of our use cases, it runs in hundreds of nanoseconds per inference on a normal Intel CPU (on a model with ~70 trees), and likely faster on a Mac M1/M2. Note that TF-DF models work out-of-the-box in YDF, with no conversion needed.

  2. No plans right now, I'm sorry. But the format of TF-DF (and YDF) models is public, so one could create such a converter as a separate project, or as a contribution to YDF (our underlying C++ library). Note that there are some subtleties that may make it non-trivial if you want fast inference. If you are interested and serious about it, we are happy to help; let's schedule a chat.

Let me know if that helps, or if you have any other questions.

cheers

rstz commented 11 months ago

I'd be interested to learn what performance you ultimately want to achieve for (roughly) which types of model. Feel free to post here or drop an email to the team.

conradkun commented 11 months ago

Hi, and thank you both for your helpful and quick replies.

I'm not sure how much I am allowed to say publicly, since this is industry-related, but I may drop you an email if we decide it's worth pursuing. Unfortunately, I assume the ANE is optimized mostly for NNs (although Core ML does support tree ensembles from XGBoost and scikit-learn), so there is a considerable chance the speedup won't be much.

On a personal note, I may be interested in volunteering to implement the conversion; it seems like a fun exercise. I will try to familiarize myself better with both YDF and coremltools, and I'll contact you if I think I have the time and understanding to do it.

rstz commented 11 months ago

Hi, that's great, we'd be very happy to help you and give pointers!

Some initial pointers to start with:

Closing this, but feel free to reopen if there are additional questions.