Realvincentyuan opened this issue 1 year ago
I believe that there could be a fundamental misunderstanding.
CART trees are not generated the same way neural networks are; in fact, the two algorithms are fundamentally different.
For neural networks, the (very simplified) process is:
- Decide the layout of the network
- Feed the examples one-by-one (or in batches) to determine the weights in the network using (some variant of) gradient descent.
- Done
For CART trees, the process is (see Wikipedia for many more details):
- Feed the entire dataset to the CART algorithm at once.
- The algorithm builds the "best" tree for the dataset, i.e., the algorithm decides the layout of the tree. There are no weights in the tree, and the algorithm does not use gradient descent.
This means that your assumption that the CARTBuilder is used to build a shell is incorrect. The CARTBuilder is an advanced tool to manually build the full model, which is no longer changed by further training. This can be useful, e.g., for experimenting with expert-designed trees and for research on decision trees. However, it is not useful for fitting a model to a dataset.
If you want to fit a model to a dataset, just ignore the CARTBuilder, create a model and fit it directly as explained in the beginner tutorial.
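For reference, a minimal sketch of that direct-fit path, assuming a pandas DataFrame df with a "species" label column as in the penguins example used later in this thread (df itself is not part of this thread):
import tensorflow_decision_forests as tfdf

# Convert a pandas DataFrame to the TF dataset format expected by TF-DF.
dataset_tf = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="species")

# Fit directly: the CART algorithm decides the tree layout from the data.
model = tfdf.keras.CartModel()
model.fit(x=dataset_tf)

# Inspect the learned tree, including per-leaf probabilities and example counts.
print(model.make_inspector().extract_tree(tree_idx=0))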
Hi @rstz ,
Thank you very much for the explanation, I should have put it this way:
I am aligned with you that the builder is perfect for expert-designed trees, and this is what I am looking for: I expect to build my tree from features and thresholds under my control, which helps me build intuitive and compliant rules (without using features that violate regulations and laws).
To this end, the CARTBuilder does address part of the need, but what I do not follow is that the builder requires adding the probability and number of samples before running any data through the model. How could we know the probability and number of samples before running the model on some dataset?
Also, I do not expect the tree structure to be updated after fitting/prediction, but the probability and samples (which are the results) of each node do not get updated after running the model on some data either; this is what I am confused about. Is using the CARTBuilder in fact just a way to design the expert tree structure, without specifying the probability and number of samples?
Am I missing something?
Thank you for the clarification!
Probability and number of examples are used by the decision tree to output confidence estimates along with the predictions, and also by, e.g., tree pruning algorithms. This can be useful for certain applications, but does not impact "raw" model inference. If the tree is fit using the CART algorithm, these values are set automatically based on the statistics of the training dataset.
If you're building the tree manually and have no way of computing these values, you may just set them to arbitrary values (e.g. always probability 1 for the class you want, and number_of_samples=10).
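For illustration, such a placeholder leaf might look like the following (a sketch using the tfdf.py_tree classes that appear later in this thread; the three-class probability vector is just the penguins example):
import tensorflow_decision_forests as tfdf

LeafNode = tfdf.py_tree.node.LeafNode
ProbabilityValue = tfdf.py_tree.value.ProbabilityValue

# Placeholder leaf: probability 1 for the first class, arbitrary example count.
placeholder_leaf = LeafNode(
    value=ProbabilityValue(probability=[1.0, 0.0, 0.0], num_examples=10))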
Hi @rstz,
Thanks for the reply, I want to seek more clarity on the APIs.
Given a dataset, we mostly do not know the prob and number of samples before running the model on any data. Like you said, I can set them to arbitrary values, but could those values be updated after running the model on some data by any chance? If yes, that would make a lot of sense and be very helpful!
Unfortunately, TF-DF does not offer a specific API for this :(
You're probably able to bootstrap this with a bit of code. Say you have n leaves in your hand-made tree:
- Build the tree with the CARTBuilder. For each leaf, assign a different n-dimensional unit vector as its probability.
- Load the tree you built in Step 1 as a model
- Run predict() on your dataset. This gives you a d x n matrix that tells you exactly which example is mapped to which leaf.
- Use the matrix to manually compute the correct leaf probabilities and num_example values
- If needed, re-build the tree with the correct leaf probabilities
If you're able to get it to work, we'd be very happy to include this in our examples - just submit a PR!
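A rough numpy sketch of that recipe, under the assumption that manual_model was built with one unit-vector "class" per leaf, dataset_tf is the dataset, and labels is an integer array of the true class ids (these names are illustrative, not TF-DF API):
import numpy as np

# Each prediction row is the unit vector of the leaf the example fell into,
# so argmax recovers the leaf index for every example.
predictions = manual_model.predict(dataset_tf)
leaf_idx = np.argmax(predictions, axis=1)

n_leaves = predictions.shape[1]
n_classes = 3  # e.g. Adelie / Gentoo / Chinstrap

num_examples_per_leaf = np.zeros(n_leaves, dtype=int)
class_counts = np.zeros((n_leaves, n_classes))
for leaf, label in zip(leaf_idx, labels):
    num_examples_per_leaf[leaf] += 1
    class_counts[leaf, label] += 1

# Correct per-leaf probabilities: class frequencies among the examples
# routed to that leaf (guarding against empty leaves).
leaf_probabilities = class_counts / np.maximum(num_examples_per_leaf[:, None], 1)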
@rstz
I think what you proposed makes sense for real business use cases; I am working on this.
On top of that, these past few days I have been thinking about the best practice for this builder. The workflow below would make sense:
- Fit a model with a tfdf.keras.model, say tfdf.keras.CartModel; it returns the tree structure and the real number of samples and probability.
- Extract a tree from the fitted model:
sample_tree = inspector.extract_tree(tree_idx=0)
- Rebuild it with the builder:
# Create the model builder
model_trial_idx = 1
model_trial_idx += 1
model_path = f"/tmp/manual_model/{model_trial_idx}"
builder = tfdf.builder.CARTBuilder(
path=model_path,
objective=tfdf.py_tree.objective.ClassificationObjective(
label="species", classes=["Adelie", "Gentoo", "Chinstrap"])
)
builder.add_tree(sample_tree)
builder.close()
- Update the prob and number of samples to reflect the literal performance of each node.
This makes more sense to me than building with the builder from scratch, as it generally works better on top of a result produced by an ML model.
Nonetheless, I went through the APIs, and it looks like some classes such as Tree do not have setters or similar functions; the code snippet below threw an error:
sample_tree.root = NonLeafNode(
condition=NumericalHigherThanCondition(
feature=SimpleColumnSpec(name="bill_length_mm", type=ColumnType.NUMERICAL),
threshold=40.0,
missing_evaluation=False),
pos_child=NonLeafNode(
condition=CategoricalIsInCondition(
feature=SimpleColumnSpec(name="island",type=ColumnType.CATEGORICAL),
mask=["Dream", "Torgersen"],
missing_evaluation=False)
,pos_child=LeafNode(value=ProbabilityValue(probability=[0.8, 0.2], num_examples=10))
,neg_child=LeafNode(value=ProbabilityValue(probability=[0.1, 0.9], num_examples=20))
),
neg_child=LeafNode(value=ProbabilityValue(probability=[0.2, 0.8], num_examples=30))
)
error is:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-35-242d44038491> in <cell line: 1>()
----> 1 sample_tree.root = NonLeafNode(
2 condition=NumericalHigherThanCondition(
3 feature=SimpleColumnSpec(name="bill_length_mm", type=ColumnType.NUMERICAL),
4 threshold=40.0,
5 missing_evaluation=False),
AttributeError: can't set attribute 'root'
Also, the builder class does not have APIs to tweak each node; seemingly, something like the snippet below would also be helpful when working with the builder.
builder.get_node[node_idx] = NonLeafNode(
condition=CategoricalIsInCondition(
feature=SimpleColumnSpec(name="island",type=ColumnType.CATEGORICAL),
mask=["Dream", "Torgersen"],
missing_evaluation=False)
,pos_child=LeafNode(value=ProbabilityValue(probability=[0.8, 0.2], num_examples=10))
,neg_child=LeafNode(value=ProbabilityValue(probability=[0.1, 0.9], num_examples=20))
)
Perhaps I have missed some APIs, as it seems the Python API doc is not exactly aligned with the code. Please advise on my proposed workflow and the questions regarding those setter methods. Appreciate it!
Hi,
thank you for providing additional details. I looked into the tree implementation and it seems like it's just the tree root that does not have a setter method - other properties (e.g. pos_child, neg_child, value, etc.) can be modified through Python:
sample_tree.root.pos_child = LeafNode(value=ProbabilityValue(probability=[0.2, 0.8], num_examples=30))
We might make the root modifiable for the next TF-DF version, but I don't believe that this will be blocking - since each attribute of the root can be modified.
TF-DF does not offer a tree traversal API such as get_node[node_idx]; you would have to implement that on your own.
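For completeness, a small traversal helper along those lines could look like this (a sketch built on the py_tree node classes used elsewhere in this thread; the depth-first ordering is my own convention, not a TF-DF one, and sample_tree is assumed to come from the inspector as above):
import tensorflow_decision_forests as tfdf

LeafNode = tfdf.py_tree.node.LeafNode

def collect_nodes(node, nodes=None):
    """Depth-first traversal returning every node of a py_tree."""
    if nodes is None:
        nodes = []
    nodes.append(node)
    if not isinstance(node, LeafNode):
        # NonLeafNode: recurse into both children.
        collect_nodes(node.neg_child, nodes)
        collect_nodes(node.pos_child, nodes)
    return nodes

all_nodes = collect_nodes(sample_tree.root)
leaves = [n for n in all_nodes if isinstance(n, LeafNode)]
print(f"{len(all_nodes)} nodes, {len(leaves)} of them leaves")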
Hi @rstz ,
Thanks for the comments; adding a setter method for the root is necessary in some cases. Also, I tried your suggested way to adjust nodes, and it worked, too.
Nonetheless, I ran into an error that other people have also met when trying to call a SavedModel; I added a comment in issue #136, too. I copied the comment here:
In my case, I am building a model via the builder: add an existing tree to the builder to build a new model, and tweak nodes if necessary. The sample code for my workflow is below:
model = tfdf.keras.CartModel()
model.fit(x=dataset_tf)
inspector = model.make_inspector()
sample_tree = inspector.extract_tree(tree_idx=0)
# Create some alias
Tree = tfdf.py_tree.tree.Tree
SimpleColumnSpec = tfdf.py_tree.dataspec.SimpleColumnSpec
ColumnType = tfdf.py_tree.dataspec.ColumnType
# Nodes
NonLeafNode = tfdf.py_tree.node.NonLeafNode
LeafNode = tfdf.py_tree.node.LeafNode
# Conditions
NumericalHigherThanCondition = tfdf.py_tree.condition.NumericalHigherThanCondition
CategoricalIsInCondition = tfdf.py_tree.condition.CategoricalIsInCondition
# Leaf values
ProbabilityValue = tfdf.py_tree.value.ProbabilityValue
sample_tree = inspector.extract_tree(tree_idx=0)
print(sample_tree)
# Build a model
model_trial_idx = 1
# Create the model builder
model_trial_idx += 1
model_path = f"/tmp/manual_model/{model_trial_idx}"
builder = tfdf.builder.CARTBuilder(
path=model_path,
objective=tfdf.py_tree.objective.ClassificationObjective(
label="species", classes=["Adelie", "Gentoo", "Chinstrap"])
)
builder.add_tree(sample_tree)
builder.close()
manual_model = tf.keras.models.load_model(model_path)
manual_model.predict(dataset_tf)
The error is as below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-47-d9c72aa7e6da> in <cell line: 1>()
----> 1 manual_model.predict(dataset_tf)
1 frames
/usr/local/lib/python3.10/dist-packages/keras/engine/training.py in tf__predict_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
ValueError: in user code:
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2169, in predict_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2155, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2143, in run_step **
outputs = model.predict_step(data)
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2111, in predict_step
return self(x, training=False)
File "/usr/local/lib/python3.10/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
Positional arguments (2 total):
* {'bill_depth_mm': <tf.Tensor 'inputs_2:0' shape=(None,) dtype=float32>,
'bill_length_mm': <tf.Tensor 'inputs_1:0' shape=(None,) dtype=float32>,
'body_mass_g': <tf.Tensor 'inputs_4:0' shape=(None,) dtype=float32>,
'flipper_length_mm': <tf.Tensor 'inputs_3:0' shape=(None,) dtype=float32>,
'island': <tf.Tensor 'inputs:0' shape=(None,) dtype=string>,
'sex': <tf.Tensor 'inputs_5:0' shape=(None,) dtype=string>,
'year': <tf.Tensor 'inputs_6:0' shape=(None,) dtype=int64>}
* False
Keyword arguments: {}
Expected these arguments to match one of the following 4 option(s):
Option 1:
Positional arguments (2 total):
* {'bill_depth_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='bill_depth_mm'),
'bill_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='bill_length_mm'),
'flipper_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='flipper_length_mm'),
'island': TensorSpec(shape=(None,), dtype=tf.string, name='island')}
* True
Keyword arguments: {}
Option 2:
Positional arguments (2 total):
* {'bill_depth_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='bill_depth_mm'),
'bill_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='bill_length_mm'),
'flipper_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='flipper_length_mm'),
'island': TensorSpec(shape=(None,), dtype=tf.string, name='island')}
* False
Keyword arguments: {}
Option 3:
Positional arguments (2 total):
* {'bill_depth_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='inputs_bill_depth_mm'),
'bill_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='inputs_bill_length_mm'),
'flipper_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='inputs_flipper_length_mm'),
'island': TensorSpec(shape=(None,), dtype=tf.string, name='inputs_island')}
* True
Keyword arguments: {}
Option 4:
Positional arguments (2 total):
* {'bill_depth_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='inputs_bill_depth_mm'),
'bill_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='inputs_bill_length_mm'),
'flipper_length_mm': TensorSpec(shape=(None,), dtype=tf.float32, name='inputs_flipper_length_mm'),
'island': TensorSpec(shape=(None,), dtype=tf.string, name='inputs_island')}
* False
Keyword arguments: {}
Note: the dataset is identical to the one used to train the initial model, so it is very strange that it still ran into an error when using the manual model to predict on that dataset. This seems to be a TensorFlow issue; people using other TF packages have also met it, and the suggested solution is to install tf-nightly: https://github.com/tensorflow/tensorflow/issues/35446#issuecomment-569864556
If this is the case, my package versions are:
They are compatible, so why does this issue still remain?
Tl;Dr: You need to manually fix the input signature:
def copy_model_sig(m):
    """
    Copy the model signature to a new model.
    """
    spec = m.save_spec()[0][0]
    return lambda insp: spec
...
builder = tfdf.builder.CARTBuilder(
...
input_signature_example_fn=copy_model_sig(model)
)
Hi,
For some background, a TF-DF model is actually an Yggdrasil Decision Forests (YDF) model that is wrapped in a Keras model. The builder and the inspector both operate on the YDF model, since Keras itself does not have a concept of trees, forests, etc. Unfortunately, some information gets lost between the two formats. We are actively working on improving this, but it requires a major rewrite of parts of TF-DF / YDF.
Here, the TF-DF model does not know what the correct input signature is. Instead, it just asks the YDF model for its signature and tries to guess a signature from there. Since YDF generally does not distinguish between different integer representations (everything is a float32) and may choose to ignore features that the model does not use, the signature it guesses often does not match what you fed to it through TF-DF.
Looking at this code, it seems useful to integrate it somewhere in the library or, at least, document it more explicitly - I'll think about that.
The full, fixed code is:
model = tfdf.keras.CartModel()
model.fit(x=dataset_tf)
inspector = model.make_inspector()
def copy_model_sig(m):
    """
    Copy the model signature to a new model.
    """
    spec = m.save_spec()[0][0]
    return lambda inspector: spec
sample_tree = inspector.extract_tree(tree_idx=0)
# Create some alias
Tree = tfdf.py_tree.tree.Tree
SimpleColumnSpec = tfdf.py_tree.dataspec.SimpleColumnSpec
ColumnType = tfdf.py_tree.dataspec.ColumnType
# Nodes
NonLeafNode = tfdf.py_tree.node.NonLeafNode
LeafNode = tfdf.py_tree.node.LeafNode
# Conditions
NumericalHigherThanCondition = tfdf.py_tree.condition.NumericalHigherThanCondition
CategoricalIsInCondition = tfdf.py_tree.condition.CategoricalIsInCondition
# Leaf values
ProbabilityValue = tfdf.py_tree.value.ProbabilityValue
sample_tree = inspector.extract_tree(tree_idx=0)
print(sample_tree)
# Build a model
model_trial_idx = 1
# Create the model builder
model_trial_idx += 1
model_path = f"/tmp/manual_model/{model_trial_idx}"
builder = tfdf.builder.CARTBuilder(
path=model_path,
objective=tfdf.py_tree.objective.ClassificationObjective(
label="species", classes=["Adelie", "Gentoo", "Chinstrap"]),
input_signature_example_fn=copy_model_sig(model)
)
builder.add_tree(sample_tree)
builder.close()
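With the copied signature, loading the rebuilt model and predicting should then work with the same calls that failed earlier (my reading of the fix above, shown for completeness):
manual_model = tf.keras.models.load_model(model_path)
predictions = manual_model.predict(dataset_tf)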
Note that you can also manually modify the signature if necessary (copied from one of the tests):
def custom_model_input_signature(
    inspector: inspector_lib.AbstractInspector,
) -> Any:
    input_spec = keras.build_default_input_model_signature(inspector)
    # Those features are stored as int64 in the dataset.
    for feature_name in [
        "age",
        "capital_gain",
        "capital_loss",
        "education_num",
        "fnlwgt",
        "hours_per_week",
    ]:
        input_spec[feature_name] = tf.TensorSpec(shape=[None], dtype=tf.int64)
    return input_spec
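Presumably such a custom signature function is then plugged into the builder the same way as copy_model_sig above (my reading of the earlier snippet, not something stated explicitly in this thread):
builder = tfdf.builder.CARTBuilder(
    path=model_path,
    objective=...,  # same ClassificationObjective as before
    input_signature_example_fn=custom_model_input_signature,
)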
Hi @rstz ,
Thanks for the explanation and workaround; I appreciate your prompt replies these past few days!
I tested it and it worked! This thread is a great discussion. There is some work left on my side to address my needs:
- Make the number of samples and prob reflect the literal info after running the model on some dataset.
- Make a pretty function that flattens the tree structure and presents the rules one by one (roughly sketched after the examples below), converting the printout
(flipper_length_mm >= 206.5; miss=False, score=0.5436033606529236)
├─(pos)─ (bill_depth_mm >= 17.649999618530273; miss=False, score=0.2061920464038849)
│ ├─(pos)─ ProbabilityValue([0.3333333333333333, 0.6666666666666666, 0.0],n=6.0) (idx=4)
│ └─(neg)─ ProbabilityValue([0.0, 0.0, 1.0],n=108.0) (idx=3)
└─(neg)─ (island in ['Biscoe', 'Torgersen']; miss=True, score=0.23399487137794495)
├─(pos)─ ProbabilityValue([0.9767441860465116, 0.0, 0.023255813953488372],n=86.0) (idx=2)
└─(neg)─ (bill_length_mm >= 42.349998474121094; miss=True, score=0.5646106004714966)
├─(pos)─ ProbabilityValue([0.03278688524590164, 0.9672131147540983, 0.0],n=61.0) (idx=1)
└─(neg)─ ProbabilityValue([0.9795918367346939, 0.02040816326530612, 0.0],n=49.0) (idx=0)
to the format below, along with some stats of the nodes:
flipper_length_mm >= 206.5 and bill_depth_mm >= 17.64999961853027;
flipper_length_mm >= 206.5 and bill_depth_mm < 17.64999961853027
....
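A rough sketch of such a flattening, using the same py_tree classes as earlier in the thread (it leans on the string form of each condition rather than condition-specific attributes, the helper name is mine, and sample_tree is assumed from the earlier snippets):
import tensorflow_decision_forests as tfdf

LeafNode = tfdf.py_tree.node.LeafNode

def flatten_rules(node, path=None, rules=None):
    """Turn a py_tree into a list of (rule string, leaf value) pairs."""
    if path is None:
        path = []
    if rules is None:
        rules = []
    if isinstance(node, LeafNode):
        rules.append((" and ".join(path) if path else "<root>", node.value))
    else:
        cond = str(node.condition)
        flatten_rules(node.pos_child, path + [cond], rules)
        flatten_rules(node.neg_child, path + ["not (" + cond + ")"], rules)
    return rules

for rule, value in flatten_rules(sample_tree.root):
    print(rule, "->", value)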
Possible follow-up on the TF-DF side like you mentioned:
Again, thanks for your help, great discussion! Long live TF-DF! 🫡🫡🫡
Update on your prior point, @rstz
Run predict() on your dataset. This gives you a d x n matrix that tells you exactly which example is mapped to which leaf.
After applying input_signature_example_fn=copy_model_sig(model), the predict method runs well, but it only outputs prob, and in fact I have little way to see which node each instance in the dataset goes to with this.
I looked through the API; it seems predict_get_leaves or call_get_leaves are the ones to call. predict_get_leaves works well with the native CartModel.
Nonetheless, the manual model does not seem able to call the predict_get_leaves or call_get_leaves methods.
For the predict_get_leaves() method, the error is as below. There were not even warnings when creating the builder, adding the sample tree, and closing the builder, though:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-38-f1aae9bea4bf> in <cell line: 1>()
----> 1 manual_model.predict_get_leaves(dataset_tf)
AttributeError: 'InferenceCoreModel' object has no attribute 'predict_get_leaves'
For call_get_leaves, the error is as below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-39-de7dfe55a161> in <cell line: 1>()
----> 1 manual_model.call_get_leaves(dataset_tf)
1 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/function_deserialization.py in restored_function_body(*args, **kwargs)
259 """Calls a restored function or raises an error if no matching function."""
260 if not saved_function.concrete_functions:
--> 261 raise ValueError("Found zero restored functions for caller function.")
262 # This is the format of function.graph.structured_input_signature. At this
263 # point, the args and kwargs have already been canonicalized.
ValueError: Found zero restored functions for caller function.
Both errors seem to be due to the model building, saving, and loading process?
Hi,
Unfortunately, TF-DF cannot serialize all of its methods within the Keras model format. This means that some functions, including predict_get_leaves, are not available in saved models, and this includes all models created with the builder. So there is no way to use these methods here (we really tried to make it work, but couldn't find a satisfactory way for now).
However, my initial proposal was to first create n artificial output classes (one per leaf) with n-dimensional unit vectors as the probabilities in the builder. This allows a direct mapping from probability to leaf index. Does this work?
Hi @rstz ,
I managed to traverse the tree and get the node assignment. Now I am trying to automate the process with methods that update nodes and refresh the true prob and number of samples programmatically. This method likely leads to issues if the prediction probabilities of some nodes are identical (say the nodes are pure), which might need an additional prob reset. I will open pull requests once testing is complete.
One thing I noticed is that non-leaf nodes do not have an id while leaf nodes do; I do not quite follow why that inconsistency exists. I need to add additional code to identify non-leaf nodes or reset the index. This is not a huge effort, but I am just curious.
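For what it's worth, one way to give non-leaf nodes an index is simply to number every node during a traversal; this is a local bookkeeping convention of my own, not the leaf idx that TF-DF itself assigns:
import tensorflow_decision_forests as tfdf

NonLeafNode = tfdf.py_tree.node.NonLeafNode

def index_nodes(root):
    """Assign a depth-first index to every node (leaf and non-leaf)."""
    index_to_node = {}
    stack = [root]
    while stack:
        node = stack.pop()
        index_to_node[len(index_to_node)] = node
        if isinstance(node, NonLeafNode):
            stack.extend([node.pos_child, node.neg_child])
    return index_to_node

index_to_node = index_nodes(sample_tree.root)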
Hi @Realvincentyuan,
looking forward to your PRs! I don't remember a specific reason for not including an id for non-leaf nodes - IIRC, there just wasn't a use case for them...
Hi @rstz ,
I have created functions for tree traversal, probability & number of samples reset, and probability & number of samples refresh after getting the true results. I did not modify any of the native classes that you built; these are mostly independent helper functions.
I am not sure where the functions should go for the PR: is it contrib or tools? I will document my functions and examples later and create a PR.
├── configure: Project configuration.
├── documentation: User and developer documentation. Contains the colabs.
├── examples: Collection of usage examples.
├── tensorflow_decision_forests: The library
│ ├── component: Utilities.
│ │ ├── builder: Create models "by hand".
│ │ ├── inspector: Inspection of structure and meta-data of models.
│ │ ├── model_plotter: Plotting of model tree structure.
│ │ ├── py_tree: Representation of a decision tree as a python object.
│ │ └── tuner: TF-DF's own hyper-parameter tuner.
│ ├── contrib: Additional functionality outside the project's main scope.
│ ├── keras: Keras logic. Depends on tensorflow logic.
│ │ └── wrapper: Python code generator for Keras models.
│ ├── tensorflow: TensorFlow logic.
│ │ └── ops: Custom C++ ops.
│ │ ├── inference: ... for inference.
│ │ └── training: ... for training.
│ └── test_data: Datasets for unit tests and benchmarks.
├── third_party: Bazel configuration for dependencies.
└── tools: Tools for the management of the project and code.
Please note that I did not get a chance to refresh the values of non-leaf nodes, as the prediction of a model only looks at leaf nodes and I can only get the leaf assignment for each instance of the input dataset. So, so far I could not update the non-leaf nodes of models built by hand.
I feel like, in order to update each non-leaf node, it would be necessary to simulate each split of the tree from the root down to each leaf. In my case this is not necessary, as I only need the literal performance of each leaf, so I will not include this feature in the PR. Any thoughts on this item?
Hi, a subdirectory of contrib is probably best for this. Let's discuss the other question on the PR directly.
Hi @rstz , I first added a post about those helper functions to my blog: https://vincentyuan.us/build-a-decision-tree-by-hand-with-tensorflow/
Please review it when you get a chance. As you can see, it looks like adding a test script is encouraged, but I am not familiar with tf.test.TestCase; is that mandatory? Give me some time to go through some tutorials on tf.test.TestCase if necessary. Otherwise, I might consider contributing an enhancement to the example post: https://www.tensorflow.org/decision_forests/tutorials/advanced_colab
Hi, sorry for not getting back to you for a while. The article looks good! I wonder if the best strategy would be to just create an "External examples" file that links directly to your (and other) articles?
Hi,
No worries, I have not had a chance to write an exhaustive unit test so far. I am also thinking that, if possible, I would rather contribute to the tutorials notebook that you made, instead of adding an external link, as the URL might get outdated. What do you think?
Expectation
Use the tfdf.builder.CARTBuilder to build a decision tree structure, train it on the literal dataset, and optimize the tree structure based on its performance. The process is like manually replicating the training process of tfdf.keras.CartModel, but the benefit is that I can adjust the tree structure as needed instead of focusing only on model performance, which is helpful when intuitive rules are needed.
Sample code
I tried to use the tfdf.builder.CARTBuilder to build the structure and fit/predict, but the results are not as expected: the fitting process does not lead to any change in the predictions of the leaves. Below is some sample code with a sample dataset, running in Colab.
Questions
manual_model is a completely static model. I assumed that the tfdf.builder.CARTBuilder is used to build a shell and that the prediction of each node would be reflected after fitting/prediction. I am very confused why it requires me to define the value in each leaf, and why the value remains the same after fitting/prediction; did I miss anything?
What is the best practice for using tfdf.builder.CARTBuilder to build a decision tree by hand?
Reference: