willcrichton / corrset-benchmark

A repository to test different performance optimizations in a sparse matrix computation.
https://willcrichton.net/notes/k-corrset/
42 stars 8 forks source link

possible performance improvement #1

Open toolslive opened 9 months ago

toolslive commented 9 months ago

your current performance bottleneck seems to be iterating bitsets. Maybe try to short cut the iteration via an efficient calculation of the number of trailing zeroes. I just pasted the 64 bit version below (the table needs to be initialized at program start, but the ntz routine is in O(1) ;)

#define DEBRUIJN 0x22fdd63cc95386dULL
static uint32_t table [64] ;
void inittable (){
    uint64_t db = DEBRUIJN ;
    int i ;
    for (i=0 ; i < 64;++ i ) {
        table [db >> 58] = i ;
        db = db <<1;
    }
}

uint32_t ntz ( uint64_t n){
    n &= −n ;
    n ∗= DEBRUIJN ;
    n >>= 58;
    return table [n] ;
}

The explanation is presented here. Of course you can do it for 128 or 256 bits or whatever. It will speed up the iterator a lot if the set is sparse.

Have fun!

willcrichton commented 9 months ago

Hi, thanks for the recommendation. I am already using the number of trailing zeros as an optimization for the bit-set iterator, as you can see here: https://github.com/willcrichton/indexical/blob/main/src/bitset/simd.rs#L210-L219

This is just implemented with the trailing_zeros primitive provided by the Rust standard library: https://doc.rust-lang.org/stable/std/primitive.usize.html#method.trailing_zeros

xmvlad commented 9 months ago

Thanks for you article about performance optimization with Rust! From pure data science viewpoint it can be solved following way: for each user create feature vector F_X -> {0,1} if X_question was answered correctly, then apply linear/logistic regression with heavy L1 regularization. For linear decision functions L1 regularization have property that it's "zero out", not powerful/useful features, depending on value of regularization parameter(so you can easily select k-most powerful). It will be interesting how much it coincide with brute-force approach.