Closed by parrt 1 year ago
@mepland could you check my work? I believe it all comes down to the tessellate function. This code looks right: it recurses through the tree, ignoring any nodes that do not test one of the two features:
    if t.feature() == featidx[0]:
        walk(t.left, (bbox[0], bbox[1], s, bbox[3]))
        walk(t.right, (s, bbox[1], bbox[2], bbox[3]))
    elif t.feature() == featidx[1]:
        walk(t.left, (bbox[0], bbox[1], bbox[2], s))
        walk(t.right, (bbox[0], s, bbox[2], bbox[3]))
    else:
        walk(t.left, bbox)
        walk(t.right, bbox)
Ah. It might be related to the fact that when I reach a leaf, I record whatever bounding box is in effect for that recursive invocation. But what if no features of interest were ever tested on the way to that leaf? That would mean we are adding bounding boxes for regions that are not associated with the two features of interest.
    if t.isleaf():
        bboxes.append((t, bbox))
        return
Anyway, somehow we are adding too many bounding box regions.
Here's what the tree looks like:
Ah. It is more obvious with a shorter tree. There are four leaves in the tree:
And we are seeing four regions:
We are asking for features WGT and CYL, but the entire right side of the tree tests neither WGT nor CYL. When we reach a leaf on that side, its region still gets added, even though it is not relevant to this two-dimensional feature space.
Fixed by 808dbf4. @mepland I realized that we simply have to skip leaves that are reached without testing any feature of interest. A leaf reached after testing only one feature of interest must still be represented in the partitioning. And it is entirely possible for regions to overlap, just as in any marginal plot. The other variables are what would disambiguate the overlap.
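For readers following along, here is a minimal sketch of that rule. It is not the actual change in 808dbf4; it only illustrates the idea of carrying a flag down the recursion that records whether any feature of interest has been tested. The `Node` class, the `skip_untested` parameter, the feature indices (0 standing in for WGT, 1 for CYL), and the split values are all hypothetical, and `feature` and `split` are plain attributes here rather than the methods used in the snippets above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Node:
    """Hypothetical stand-in for a decision tree node (not the library's class)."""
    feature: Optional[int] = None  # index of the feature tested; None for a leaf
    split: float = 0.0             # split value; unused for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    name: str = ""

    def isleaf(self) -> bool:
        return self.left is None and self.right is None


def tessellate(root: Node, featidx: Tuple[int, int], bbox: BBox,
               skip_untested: bool = True) -> List[Tuple[Node, BBox]]:
    """Collect (leaf, bbox) pairs for the two features in featidx.

    With skip_untested=True, a leaf is recorded only if at least one
    feature of interest was tested on the path from the root to it.
    """
    bboxes: List[Tuple[Node, BBox]] = []

    def walk(t: Node, bbox: BBox, tested: bool) -> None:
        if t.isleaf():
            if tested or not skip_untested:
                bboxes.append((t, bbox))
            return
        s = t.split
        if t.feature == featidx[0]:    # split along x
            walk(t.left, (bbox[0], bbox[1], s, bbox[3]), True)
            walk(t.right, (s, bbox[1], bbox[2], bbox[3]), True)
        elif t.feature == featidx[1]:  # split along y
            walk(t.left, (bbox[0], bbox[1], bbox[2], s), True)
            walk(t.right, (bbox[0], s, bbox[2], bbox[3]), True)
        else:                          # unrelated feature: bbox and flag unchanged
            walk(t.left, bbox, tested)
            walk(t.right, bbox, tested)

    walk(root, bbox, False)
    return bboxes


# Toy tree with four leaves. Only the left side tests a feature of interest.
root = Node(feature=2, split=0.5,
            left=Node(feature=0, split=3000,
                      left=Node(name="A"), right=Node(name="B")),
            right=Node(feature=3, split=0.5,
                       left=Node(name="C"), right=Node(name="D")))
bbox = (1500, 3, 5000, 8)

before = tessellate(root, (0, 1), bbox, skip_untested=False)  # original behavior
after = tessellate(root, (0, 1), bbox)                        # untested leaves skipped
print(len(before), [leaf.name for leaf, _ in after])
```

On this toy tree the original behavior records all four leaves, with C and D each covering the entire bounding box, while the flagged version keeps only A and B. Leaves reached after testing just one of the two features are still kept, so regions can overlap as described above.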
Something's not right with my implementation that displays the feature space partitioning for two features from a fully populated model. Previously we required the user to strip down to a model that was trained only on those two features. I tried to make the tessellate() function ignore split nodes that are not associated with one of the two variables, but I think something's not right. Looking down from above, nothing should overlap, because otherwise the same x,y coordinate predicts more than one z (regressor target) value. E.g.,