jinlow / forust

A lightweight gradient boosted decision tree package.
https://jinlow.github.io/forust/
Apache License 2.0
56 stars 6 forks source link

pivot_on_split O(n) #100

Open deadsoul44 opened 5 months ago

deadsoul44 commented 5 months ago

I guess the following implementation has O(n) complexity. Correct me if I am wrong.

It passes all the tests for pivot_on_split. It became hacky to handle edge cases. If you like, feel free to copy, modify, and merge it.

Link to the playground.

pub fn pivot_with_missing(index: &mut [usize], feature: &[u16], split_value: u16, missing_right: bool) -> usize {
    let length = index.len();
    let mut last_idx = length - 1;
    let mut rv = None;
    for i in 0..length {
        loop {
            match missing_compare(&split_value, feature[index[i]], missing_right) {
                Ordering::Less | Ordering::Equal => {
                    if last_idx <= i {rv = Some(i); break;}
                    index.swap(i, last_idx);
                    if last_idx == 0 {rv = Some(0); break;}
                    last_idx -= 1;
                  },
                Ordering::Greater => break,
            }
        }
        if i >= last_idx {
            break;
        }
    }
    match rv {
        Some(r) => r,
        None => last_idx + 1,
    }
}
jinlow commented 5 months ago

Thanks! How does this have O(n) complexity when there is an inner loop?

deadsoul44 commented 5 months ago

The last_idx is decremented at every swap. The outer loop stops when i >= last_idx. Inspired from: https://stackoverflow.com/questions/69764803/how-to-sort-reorder-a-vec-by-indices

jinlow commented 5 months ago

Ah, that makes sense! I’ll test it out, I believe the current implementation is also O(n), but this seems a little more concise.