tensorflow / decision-forests

A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.
Apache License 2.0

builder.close() failing when using GradientBoostedTreeBuilder #45

Closed bannisterhayley closed 3 years ago

bannisterhayley commented 3 years ago

I've been following the example posted here to obtain predictions from individual trees within a GradientBoostedTreesModel, i.e.:

import tensorflow_decision_forests as tfdf

# Train model (train_ds is a dataset prepared beforehand)
model = tfdf.keras.GradientBoostedTreesModel()
model.compile(metrics=["accuracy"])
model.fit(train_ds)

# Extract trees; keep the inspector so its objective can be reused below
inspector = model.make_inspector()
trees = inspector.extract_all_trees()

# Build model with one tree
builder = tfdf.builder.GradientBoostedTreeBuilder(
    path="model",
    objective=inspector.objective()
)
builder.add_tree(trees[0])
builder.close()

However, it fails when calling builder.close() with the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-f4a8f4f498e3> in <module>
      7 # Add first tree
      8 builder_bt.add_tree(trees_bt[0])
----> 9 builder_bt.close()

/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/builder/builder.py in close(self)
    737 
    738     # Should be called last.
--> 739     super(GradientBoostedTreeBuilder, self).close()
    740 
    741   def specialized_header(self) -> Any:

/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/builder/builder.py in close(self)
    500 
    501     for tree in self._trees:
--> 502       self._write_branch(tree.root)
    503     self._trees = []
    504 

/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/builder/builder.py in _write_branch(self, node)
    586 
    587     # Converts the node into a proto node.
--> 588     core_node = py_tree.node.node_to_core_node(node, self.dataspec)
    589 
    590     # Write the node to disk.

/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/py_tree/node.py in node_to_core_node(node, dataspec)
    153     condition_lib.set_core_node(node.condition, dataspec, core_node)
    154     if node.value is not None:
--> 155       value_lib.set_core_node(node.value, core_node)
    156 
    157   elif isinstance(node, LeafNode):

/usr/local/lib/python3.6/site-packages/tensorflow_decision_forests/component/py_tree/value.py in set_core_node(value, core_node)
    154     core_node.regressor.top_value = value.value
    155     if value.standard_deviation is not None:
--> 156       dist = core_node.regressor.dist
    157       dist.count = value.num_examples
    158       dist.sum = 0

AttributeError: dist

I've tested a possible fix: changing this line (line 156 above) to dist = core_node.regressor.distribution, as used elsewhere in the codebase (see here). It seems to work, but I'd appreciate the eyes of someone who is more familiar with the code than I am.

It's possible that this hasn't been caught previously as none of the tests here seem to include the standard deviation in the RegressionValue.
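
To illustrate why only values carrying a standard deviation hit the error, here is a minimal sketch that does not import the library. The classes CoreNode, Regressor, NormalDistribution and RegressionValue are hypothetical stand-ins written for this example (they only mimic the attribute names visible in the traceback), and the two set_core_node_* functions reproduce just the lines shown there, not the full library function.

```python
from dataclasses import dataclass
from typing import Optional


class NormalDistribution:
    """Stand-in for the message that stores the distribution statistics."""
    __slots__ = ("count", "sum")

    def __init__(self) -> None:
        self.count = 0.0
        self.sum = 0.0


class Regressor:
    """Stand-in for the regressor message: it exposes `distribution`, not `dist`."""
    __slots__ = ("top_value", "distribution")

    def __init__(self) -> None:
        self.top_value = 0.0
        self.distribution = NormalDistribution()


class CoreNode:
    """Stand-in for the core (proto) node written to disk by the builder."""
    __slots__ = ("regressor",)

    def __init__(self) -> None:
        self.regressor = Regressor()


@dataclass
class RegressionValue:
    """Stand-in for the Python-side regression value of a tree node."""
    value: float
    num_examples: float = 1.0
    standard_deviation: Optional[float] = None


def set_core_node_buggy(value: RegressionValue, core_node: CoreNode) -> None:
    core_node.regressor.top_value = value.value
    if value.standard_deviation is not None:
        # Only reached when a standard deviation is present.
        dist = core_node.regressor.dist  # AttributeError: no field `dist`
        dist.count = value.num_examples
        dist.sum = 0


def set_core_node_fixed(value: RegressionValue, core_node: CoreNode) -> None:
    core_node.regressor.top_value = value.value
    if value.standard_deviation is not None:
        dist = core_node.regressor.distribution  # the proposed one-word fix
        dist.count = value.num_examples
        dist.sum = 0


if __name__ == "__main__":
    with_std = RegressionValue(value=1.5, num_examples=10.0, standard_deviation=0.5)
    without_std = RegressionValue(value=2.0)

    # A value without a standard deviation never touches the broken line.
    set_core_node_buggy(without_std, CoreNode())

    # A value with a standard deviation triggers the error.
    try:
        set_core_node_buggy(with_std, CoreNode())
    except AttributeError as err:
        print("buggy version failed:", err)

    node = CoreNode()
    set_core_node_fixed(with_std, node)
    print("fixed version wrote count =", node.regressor.distribution.count)
```

A regression test along the same lines, but using the library's real value and node types, would need at least one case where standard_deviation is set; otherwise the affected branch is never executed.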

arvnds commented 3 years ago

This seems reasonable, thanks for catching this and including the fix! I'll test it and then integrate it into the next release. I'll close this issue when the next release goes out.

achoum commented 3 years ago

The fix was included in v0.1.9.