Closed ablaom closed 1 year ago
It seems to me that the renaming of the named parameter for feature names of `wrap`, which @sylvaticus introduced with commit 86a003f, is causing some confusion. In DecisionTree.jl the named parameter for this purpose is called `featurenames`. In BetaML it somehow became `feature_names` and then, with the above-mentioned commit, `features_names` (additional s). But the documentation for `wrap` still says `featurenames`, and in the example above `feature_names` is used. I.e. the `InfoNode` created by `wrap` in the example has the list of names in an attribute called `feature_names`, but `printnode` is looking for an attribute called `features_names`.
So we have every possible combination and a bit of chaos 😀.
My suggestion is to go back to `featurenames`, in order to be consistent with DecisionTree.jl (and the documentation).
Sorry, I missed the original comment notification. I'm going to look into this tomorrow.
@roland-KA Thanks for looking into this and for the diagnosis.
Thanks @ablaom for reporting and @roland-KA for the thorough investigation of the cause of the issue. I followed your suggestion and reset the keyword to `featurenames`. This should be in the newly released v0.9.6.
As this issue shows, it is quite easy to run into trouble when using `wrap`. I'm thinking about adding a parameter check to each `wrap` implementation that verifies that only the keywords `featurenames` and `classnames` are used. It could throw an `ArgumentError` if something is wrong.
@ablaom , @sylvaticus What's your opinion about this?
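The check proposed above could look roughly like the following. This is only a minimal sketch: the helper name `check_wrap_info` and the constant `ALLOWED_WRAP_KEYS` are hypothetical and not part of BetaML or DecisionTree.jl, and it assumes the extra information is passed as a `NamedTuple`.

```julia
# Hypothetical helper, NOT part of BetaML or DecisionTree.jl.
# Assumption: the extra info handed to `wrap` arrives as a NamedTuple.
const ALLOWED_WRAP_KEYS = (:featurenames, :classnames)

function check_wrap_info(info::NamedTuple)
    # Collect every key that is not one of the two supported keywords.
    unknown = [k for k in keys(info) if !(k in ALLOWED_WRAP_KEYS)]
    if !isempty(unknown)
        bad = join(unknown, ", ")
        ok = join(ALLOWED_WRAP_KEYS, ", ")
        throw(ArgumentError("unsupported keyword(s): $bad; allowed: $ok"))
    end
    return info
end

# Accepted: only a supported keyword is used.
check_wrap_info((featurenames = ["dim1", "dim2"],))

# Rejected: a misspelled keyword raises an ArgumentError instead of
# being silently ignored.
try
    check_wrap_info((feature_names = ["dim1", "dim2"],))
catch err
    println(err isa ArgumentError)
end
```

Failing early like this would turn a silently missing feature name into an explicit error at the call site.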
Sounds like a good idea.
Mmm. I'm still pretty confused. Now I don't get any nice print out at all, just this:
julia> wrapped_tree = Trees.wrap(raw_tree, (featurenames=DF.names(X),))
A wrapped Decision Tree
Same if I use `feature_names`.
I understood that the `wrap` function was intended for plotting only, not for printing.
The decision tree is already shown in full (but without feature names) when the `DecisionTreeEstimator` is explicitly printed, but I may have misunderstood the needs. If there is a need to get the tree printed rather than plotted, perhaps at this point it is better if I add another parameter `featurenames` directly in the estimator constructor... what do you think?
@sylvaticus you are right, the `wrap` function was intended for plotting only. But the plot recipe also uses `AbstractTrees.printnode` (which is implemented together with each `wrap` version). And the `AbstractTrees.print_tree` function is based on `printnode`. So it is also possible to print a text-based version of the tree using `print_tree`.
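The relationship between `printnode` and `print_tree` can be illustrated with a toy tree. This is a sketch only: `ToyNode` is a hypothetical type standing in for the node type that `wrap` returns, and it assumes the AbstractTrees.jl package is installed.

```julia
using AbstractTrees  # assumes the AbstractTrees.jl package is installed

# Hypothetical toy node type, standing in for the node returned by `wrap`.
struct ToyNode
    label::String
    children::Vector{ToyNode}
end
ToyNode(label) = ToyNode(label, ToyNode[])  # convenience constructor for leaves

# The two methods that print_tree relies on:
AbstractTrees.children(n::ToyNode) = n.children
AbstractTrees.printnode(io::IO, n::ToyNode) = print(io, n.label)

tree = ToyNode("dim2 >= 18.0?", [ToyNode("-27.2"), ToyNode("3.4")])

# print_tree walks the tree via `children` and renders each node via `printnode`.
print_tree(tree)

# Plain printing goes through `show`, which does not call `printnode`.
println(tree)
```

Only the `print_tree` call goes through the custom `printnode` method, which is why plain printing of a wrapped tree does not show the feature names.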
> Mmm. I'm still pretty confused. Now I don't get any nice print out at all, just this:
> julia> wrapped_tree = Trees.wrap(raw_tree, (featurenames=DF.names(X),))
> A wrapped Decision Tree
> Same if I use `feature_names`.
@ablaom How did you print the text-based version? Using `AbstractTrees.print_tree`?
`show` doesn't use the `wrap` logic, so just `print`ing the `wrapped_tree` won't show the feature names.
To extend the answer of @roland-KA , this works:
julia> using BetaML
julia> X = [1.8 2.5; 0.5 20.5; 0.6 18; 0.7 22.8; 0.4 31; 1.7 3.7];
julia> y = 2 .* X[:,1] .- X[:,2] .+ 3;
julia> mod = DecisionTreeEstimator(max_depth=10)
DecisionTreeEstimator - A Decision Tree model (unfitted)
julia> ŷ = fit!(mod,X,y);
julia> hcat(y,ŷ)
6×2 Matrix{Float64}:
4.1 3.4
-16.5 -17.45
-13.8 -13.8
-18.4 -17.45
-27.2 -27.2
2.7 3.4
julia> println(mod)
DecisionTreeEstimator - A Decision Tree regressor (fitted on 6 records)
Dict{String, Any}("job_is_regression" => 1, "fitted_records" => 6, "max_reached_depth" => 4, "avg_depth" => 3.25, "xndims" => 2)
*** Printing Decision Tree: ***
1. BetaML.Trees.Question{Float64}(2, 18.0)
--> True :
1.2. BetaML.Trees.Question{Float64}(2, 31.0)
--> True : -27.2
--> False:
1.2.3. BetaML.Trees.Question{Float64}(2, 20.5)
--> True : -17.450000000000003
--> False: -13.8
--> False: 3.3999999999999995
julia> wmod = wrap(mod,featurenames=["dim1","dim2"])
A wrapped Decision Tree
julia> import AbstractTrees:print_tree
julia> print_tree(wmod)
dim2 >= 18.0?
├─ dim2 >= 31.0?
│  ├─ -27.2
│  │
│  └─ dim2 >= 20.5?
│     ├─ -17.450000000000003
│     │
│     └─ -13.8
│
└─ 3.3999999999999995
(I modified the docstring to mention `print_tree`.)
@sylvaticus @roland-KA Thanks for the detailed explanations. I must have been sloppy with my first post and dropped the `print_tree`. I apologise for not checking this more carefully - very bad form.
No problem, we are here to clarify and explain things 🤓
What am I missing here?
cc @roland-KA