Open moon-chilled opened 1 year ago
eytzinger probably wastes bandwidth if we can do multiple searches in parallel; otoh, if we are bottlenecked by latency, then eytzinger is good because prefetches don't take up rob space
b-tree budgets one gpr per key. Vs 2 with the current scheme, or 2 vectors w/gather (which I'm warming to considering the comparatively better performance on intel & with avx512). Gather exposes more parallelism, but b-tree is more memory-efficient. Not sure which wins, especially in the face of potential partitioning of y. b-tree requires preprocessing though, which is a win for gather at some size classes.
also eytzinger https://arxiv.org/pdf/1509.05053.pdf https://algorithmica.org/en/eytzinger cf