jsoftware / jsource

J engine source mirror
Other
645 stars 91 forks source link

investigate linear and polynomial interpolation, b-trees for I. #151

Open moon-chilled opened 1 year ago

moon-chilled commented 1 year ago

also eytzinger https://arxiv.org/pdf/1509.05053.pdf https://algorithmica.org/en/eytzinger cf

moon-chilled commented 1 year ago

eytzinger probably wastes bandwidth if we can do multiple searches in parallel; otoh, if we are bottlenecked by latency, then eytzinger is good because prefetches don't take up rob space

moon-chilled commented 1 year ago

b-tree budgets one gpr per key. Vs 2 with the current scheme, or 2 vectors w/gather (which I'm warming to considering the comparatively better performance on intel & with avx512). Gather exposes more parallelism, but b-tree is more memory-efficient. Not sure which wins, especially in the face of potential partitioning of y. b-tree requires preprocessing though, which is a win for gather at some size classes.