tensorflow / decision-forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Apache License 2.0

Plot very large decision tree #140

Open Arnold1 opened 2 years ago

Arnold1 commented 2 years ago

Hi,

I have a decision tree with 20k nodes. How can I plot it?

I checked the d3.js code, but with SVG it's pretty slow to render 20k nodes and zoom around in them.

Is there a way to generate a Graphviz file too and convert it to a huge PNG so I can view it with https://leafletjs.com/? Or is there a way to draw the decision tree with d3 and canvas instead of SVG?

achoum commented 2 years ago

Hi,

There is currently no integrated export to Graphviz. However, this should be easy to put in place. The model inspector gives you access to the tree structures. The inspector (and its related data structures) is used by both the tree plotter and the tree printer (which prints the tree as text). What about calling the model inspector manually and populating a Graphviz graph accordingly? For example:

inspector = model.make_inspector()
for tree in inspector.extract_all_trees():
  add_tree_to_graphviz_plot(tree)  # helper you would write yourself

If you get something polished, don't hesitate to add it to TF-DF contribs.
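A minimal sketch of such a helper, assuming the trees returned by inspector.extract_all_trees() have a root whose split nodes expose condition, pos_child and neg_child and whose leaves expose value (this matches TF-DF's tfdf.py_tree.node.NonLeafNode and LeafNode as far as we know, but check your version). The function name tree_to_dot and the Leaf/Split classes are hypothetical stand-ins used only to keep the example self-contained. It walks the tree iteratively and emits Graphviz DOT text:

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """Hypothetical stand-in for a leaf node (TF-DF's LeafNode has `value`)."""
    value: object


@dataclass
class Split:
    """Hypothetical stand-in for a split node (mirrors NonLeafNode's fields)."""
    condition: object
    pos_child: object
    neg_child: object


def tree_to_dot(root) -> str:
    """Serialize a binary decision tree to Graphviz DOT text.

    Uses an explicit stack instead of recursion so very deep trees
    do not hit Python's recursion limit.
    """
    lines = ["digraph tree {", "  node [shape=box];"]
    next_id = 0
    stack = [(root, None, "")]  # (node, parent id, edge label)
    while stack:
        node, parent_id, edge_label = stack.pop()
        node_id = next_id
        next_id += 1
        if hasattr(node, "pos_child"):  # split node (duck-typed)
            label = str(node.condition)
            # Push the negative branch first so the positive branch is visited first.
            stack.append((node.neg_child, node_id, "false"))
            stack.append((node.pos_child, node_id, "true"))
        else:  # leaf node
            label = f"value={node.value}"
        # Escape backslashes and quotes so the label is a valid DOT string.
        label = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  n{node_id} [label="{label}"];')
        if parent_id is not None:
            lines.append(f'  n{parent_id} -> n{node_id} [label="{edge_label}"];')
    lines.append("}")
    return "\n".join(lines)
```

With a real model, writing tree_to_dot(inspector.extract_all_trees()[0].root) to a tree.dot file and running `dot -Tsvg tree.dot -o tree.svg` (or -Tpng) should produce an image; for ~20k nodes the Graphviz layout step itself may take a while.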

Arnold1 commented 2 years ago

Is it possible to display 20k nodes with Graphviz and also add some zooming functionality in an HTML environment?

rstz commented 2 years ago

The great dtreeviz decision tree plotting library very recently got support for TF-DF. They have an IPython notebook demonstrating how to use it. Let us know if this works for you.

Pinging @tlapusan who has been working on this.

Arnold1 commented 2 years ago

Hi @tlapusan, does dtreeviz also work for tfdf.keras.RandomForestModel(task=tfdf.keras.Task.REGRESSION) with 25k nodes?

tlapusan commented 2 years ago

Hi @Arnold1, it will definitely be a challenge :) I assume that for such a big tree you also have a big training set. One possible solution would be to use the parameter 'depth_range_to_display' and choose which tree levels you want to display, e.g. depth_range_to_display = (0, 10).

I'm just curious: what insights would you like to get from such a big tree? IMO it is not very effective to look at a tree structure with so many nodes.
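Before choosing a depth_range_to_display, it can help to know how many nodes each level of the tree holds. The sketch below is not dtreeviz code; it uses hypothetical helpers (nodes_per_depth, nodes_in_range and a stand-in Node class) that count nodes per depth in any binary tree whose internal nodes expose pos_child and neg_child, as TF-DF's py_tree nodes appear to:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; a leaf has no children."""
    pos_child: Optional["Node"] = None
    neg_child: Optional["Node"] = None


def nodes_per_depth(root) -> Counter:
    """Count nodes at each depth (root = depth 0) using an explicit stack."""
    counts: Counter = Counter()
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        counts[depth] += 1
        # Leaves either lack these attributes or hold None; both are skipped.
        for child in (getattr(node, "pos_child", None), getattr(node, "neg_child", None)):
            if child is not None:
                stack.append((child, depth + 1))
    return counts


def nodes_in_range(counts: Counter, lo: int, hi: int) -> int:
    """Total number of nodes whose depth d satisfies lo <= d <= hi."""
    return sum(n for d, n in counts.items() if lo <= d <= hi)
```

Summing the counts over a candidate range with nodes_in_range gives a rough idea of how many nodes would have to be drawn for that range (check the dtreeviz documentation for whether its upper bound is inclusive).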

achoum commented 2 years ago

@tlapusan has a good point. It would be interesting to know more about the use case.

In the meantime, you could try some generic graph visualization software (e.g., Gephi). Looking at the raw trees might be of limited interest, though (individual Random Forest trees are overfitted, and GBT trees cannot be understood individually). It is likely more interesting to look at some projections of those trees (some basic examples: feature interactions, proximity plots, cross-tree agreement, etc.).
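As one example of such a projection, the classic Random Forest proximity between two examples is the fraction of trees in which they reach the same leaf. The sketch below computes it from an integer matrix of leaf indices (one row per example, one column per tree). In TF-DF we believe such a matrix can be obtained with model.predict_get_leaves(...), but treat that as an assumption to verify against your version; proximity_matrix is a hypothetical name:

```python
import numpy as np


def proximity_matrix(leaves: np.ndarray) -> np.ndarray:
    """Pairwise proximities from leaf assignments.

    leaves[i, t] is the index of the leaf that example i reaches in tree t.
    Returns P where P[i, j] is the fraction of trees in which examples
    i and j land in the same leaf (so the diagonal is 1).
    """
    num_examples, num_trees = leaves.shape
    prox = np.zeros((num_examples, num_examples))
    # Loop over trees to avoid materializing an (n, n, num_trees) array.
    for t in range(num_trees):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]
    return prox / num_trees
```

1 - P can then be fed to a 2D embedding such as MDS or t-SNE to draw a proximity plot. Looping over trees keeps memory at O(n^2) for n examples.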

Arnold1 commented 1 year ago

Hi team, I see an update here: https://github.com/google/yggdrasil-decision-forests/releases/tag/1.3.0

Improve the display of decision tree structures.

How can I use that from the Python side with this repo?

rstz commented 1 year ago

Hi, this refers to a change in how decision trees are displayed as ASCII text, see here, so it is probably not relevant to you.

Arnold1 commented 1 year ago

Hi, is there currently a way to generate a layered violin plot with TensorFlow Decision Forests? Example: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Scatter%20Density%20vs.%20Violin%20Plot%20Comparison.html#Layered-violin-plot