Open toolslive opened 9 months ago
Hi, thanks for the recommendation. I am already using the number of trailing zeros as an optimization for the bit-set iterator, as you can see here: https://github.com/willcrichton/indexical/blob/main/src/bitset/simd.rs#L210-L219
This is just implemented with the trailing_zeros
primitive provided by the Rust standard library: https://doc.rust-lang.org/stable/std/primitive.usize.html#method.trailing_zeros
Thanks for you article about performance optimization with Rust! From pure data science viewpoint it can be solved following way: for each user create feature vector F_X -> {0,1} if X_question was answered correctly, then apply linear/logistic regression with heavy L1 regularization. For linear decision functions L1 regularization have property that it's "zero out", not powerful/useful features, depending on value of regularization parameter(so you can easily select k-most powerful). It will be interesting how much it coincide with brute-force approach.
your current performance bottleneck seems to be iterating bitsets. Maybe try to short cut the iteration via an efficient calculation of the number of trailing zeroes. I just pasted the 64 bit version below (the table needs to be initialized at program start, but the
ntz
routine is in O(1) ;)The explanation is presented here. Of course you can do it for 128 or 256 bits or whatever. It will speed up the iterator a lot if the set is sparse.
Have fun!