Rare-first sampling of combinations

outbrain / outrank

A Python library for efficient feature ranking and selection on sparse data sets.

https://dl.acm.org/doi/10.1145/3604915.3610636

BSD 3-Clause "New" or "Revised" License


Rare-first sampling of combinations #39

Closed SkBlaz closed 1 year ago

SkBlaz commented 1 year ago

By default, a random subspace was considered in each batch. A more efficient algorithm considers the least-sampled combinations in each batch, which increases the overall efficiency of sampling: |F| / k samples (|F| = number of features, k = number of batches) are required to cover all features. The guarantee for uniform sampling is much worse and can in fact be derived from the harmonic series ->

with
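To illustrate the coverage argument, here is a minimal sketch that counts how many batches each strategy needs before every feature has been sampled at least once. It is not outrank's implementation: the function names (`rare_first_batch`, `uniform_batch`, `batches_to_cover`) are hypothetical, and it samples single features rather than feature combinations to keep the example short. With a batch size of b, rare-first selection covers all features in ceil(|F| / b) batches. Uniform sampling behaves like the classical coupon collector problem, where the expected number of single draws grows as |F| times the |F|-th harmonic number. That may or may not be the exact expression the comment refers to.

```python
import random


def rare_first_batch(counts, batch_size, rng):
    """Pick the batch_size least-sampled features, breaking ties randomly."""
    features = list(counts)
    rng.shuffle(features)  # random order among equally rare features
    features.sort(key=lambda f: counts[f])  # stable sort keeps the shuffled tie order
    return features[:batch_size]


def uniform_batch(counts, batch_size, rng):
    """Pick batch_size distinct features uniformly at random."""
    return rng.sample(list(counts), batch_size)


def batches_to_cover(num_features, batch_size, sampler, seed=0):
    """Count batches until every feature has been sampled at least once."""
    rng = random.Random(seed)
    counts = {f: 0 for f in range(num_features)}
    batches = 0
    while min(counts.values()) == 0:
        for f in sampler(counts, batch_size, rng):
            counts[f] += 1
        batches += 1
    return batches


if __name__ == "__main__":
    num_features, batch_size = 100, 10
    # Rare-first needs exactly ceil(100 / 10) = 10 batches.
    print("rare-first:", batches_to_cover(num_features, batch_size, rare_first_batch))
    # Uniform sampling needs at least as many batches, typically far more.
    print("uniform:   ", batches_to_cover(num_features, batch_size, uniform_batch))
```

Shuffling before the stable sort is a design choice in this sketch: it keeps the selection among equally rare features random, so the sampler does not always favour low feature indices.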