snad-space / coniferest

https://coniferest.snad.space
MIT License

Per-tree `decision_path`/`apply` #134

Closed matwey closed 1 month ago

matwey commented 9 months ago

We need to document (or even implement) a way to do per-tree operations such as decision path extraction or leaf index extraction (`apply`). These are required for testing and for developing new techniques.

matwey commented 2 months ago

@hombit I believe we need this functionality for the AAD optimization you like (I mean reducing the problem size by excluding inactive leaves): #84

I.e., to detect the 'active' leaves (those that need to be optimized), I would do something like `estimator.apply(known_features).reshape(-1)` and then build an index map for the optimization problem.
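The index map mentioned above could look roughly like the following. This is only a minimal NumPy sketch, assuming a forest-wide `apply` returns an `(n_samples, n_trees)` array of global leaf indices. `active_leaf_map` and the toy `leaves` array are hypothetical and not part of coniferest.

```python
import numpy as np


def active_leaf_map(leaves):
    """Hypothetical helper: map global leaf ids to compact ids 0..n_active-1.

    Returns (active, compact), where `active` holds the sorted unique leaf ids
    and `compact` has the same shape as `leaves`, with active[compact] == leaves.
    """
    leaves = np.asarray(leaves)
    # Flatten first so the inverse is 1-D regardless of the NumPy version
    active, inverse = np.unique(leaves.reshape(-1), return_inverse=True)
    compact = inverse.reshape(leaves.shape)
    return active, compact


# Assumed output of apply() for 3 labelled samples and 2 trees
leaves = np.array([[3, 9], [4, 9], [3, 11]])
active, compact = active_leaf_map(leaves)
```

Here `active` lists the leaves that would enter the reduced optimization problem, and `compact` gives each sample's leaves in the reduced numbering.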

matwey commented 2 months ago

I guess that `ForestEvaluator` is a good place to introduce a forest-wide `apply` returning an array of shape `(n_samples, n_trees)`.

matwey commented 2 months ago

The algorithm is something like:


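```cython
# Sketch: `paths` is an (n_samples, n_trees) output array of leaf indices
for x_index in prange(data.shape[0], schedule='static'):
    for tree_index in range(trees):
        tree_offset = indices[tree_index]  # start of this tree in the flat selectors array
        i = 0  # begin at the root
        while True:
            selector = selectors[tree_offset + i]
            feature = selector.feature
            if feature < 0:  # a negative feature marks a leaf
                break

            if data[x_index, feature] <= selector.value:
                i = selector.left
            else:
                i = selector.right

        paths[x_index, tree_index] = tree_offset + i  # global index of the reached leaf
```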
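For testing, the same traversal can be written as a slow pure-Python reference. The sketch below assumes the layout implied by the pseudocode: a flat list of selectors with `feature`, `left`, `right` and `value` fields, leaves marked by a negative `feature`, child positions given relative to the start of the tree, and `indices` holding one start offset per tree. The `Selector` tuple, `forest_apply` and the toy forest are illustrative only and are not the actual coniferest data structures.

```python
from typing import NamedTuple

import numpy as np


class Selector(NamedTuple):
    feature: int  # split feature index, negative for a leaf (assumed convention)
    left: int     # node position within the tree if x[feature] <= value
    right: int    # node position within the tree otherwise
    value: float  # split threshold (ignored for leaves in this sketch)


def forest_apply(selectors, indices, data):
    """Return an (n_samples, n_trees) array of global leaf indices."""
    data = np.asarray(data, dtype=float)
    n_trees = len(indices)
    leaves = np.empty((data.shape[0], n_trees), dtype=np.int64)
    for x_index in range(data.shape[0]):
        for tree_index in range(n_trees):
            tree_offset = indices[tree_index]
            i = 0  # start at the root of this tree
            while selectors[tree_offset + i].feature >= 0:
                node = selectors[tree_offset + i]
                if data[x_index, node.feature] <= node.value:
                    i = node.left
                else:
                    i = node.right
            leaves[x_index, tree_index] = tree_offset + i
    return leaves


# Toy forest of two stumps: each tree is a root followed by two leaves
leaf = Selector(-1, -1, -1, 0.0)
selectors = [Selector(0, 1, 2, 0.5), leaf, leaf,
             Selector(1, 1, 2, 0.0), leaf, leaf]
indices = [0, 3]
data = [[0.2, 1.0], [0.9, -1.0]]
leaves = forest_apply(selectors, indices, data)
```

A compiled implementation could then be checked against this reference on small random forests.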