Closed westonpace closed 2 years ago
Ok, I just pushed a few changes that will make this backwards compatible (i.e. it won't change any of our benchmarks unless a parameter of some sort is changed).
Though as I was doing this, I wonder if num_groups is actually the right approach here: all of our writers have chunk_size where you can tell what size of chunk to make but not how many. Would it make sense to match that instead and have a chunk_size argument that percolates down to our writers? Would having more chunks than cores (but at least as many chunks as cores) be just as good as having exactly as many chunks as cores?
I guess for the immediate need we have of trying to optimize numbers for queries, we could hardcode chunk sizes that make the number of chunks equal to the number of cores (of course, we would also want to factor in the scale factor there...)
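To make the arithmetic concrete, here is a rough sketch of how a chunk size could be derived from a row count and a core count. The helper names (chunk_size_for_cores, num_chunks) are hypothetical and not part of this PR or of any writer API; the row count would have to come from the table and scale factor in question.

```python
import os


def chunk_size_for_cores(num_rows, num_cores=None):
    """Return a chunk size (in rows) that yields at most num_cores chunks.

    Hypothetical helper for illustration only. Uses ceiling division so
    the last chunk absorbs any remainder instead of creating an extra chunk.
    """
    if num_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        num_cores = os.cpu_count() or 1
    if num_rows <= 0 or num_cores <= 0:
        raise ValueError("num_rows and num_cores must be positive")
    # Integer ceiling division avoids floating point rounding on large tables.
    return -(-num_rows // num_cores)


def num_chunks(num_rows, chunk_size):
    """Number of chunks a writer would produce for a given chunk size."""
    if num_rows <= 0 or chunk_size <= 0:
        raise ValueError("num_rows and chunk_size must be positive")
    return -(-num_rows // chunk_size)


if __name__ == "__main__":
    rows = 6_000_000  # assumed row count, purely illustrative
    size = chunk_size_for_cores(rows, num_cores=8)
    print(size, num_chunks(rows, size))
```

Note that for small tables this can produce fewer chunks than cores (10 rows on 8 cores gives chunks of 2 rows, so 5 chunks), which is one reason a plain chunk_size argument may be simpler to reason about than an exact chunk count.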
You are absolutely right wrt chunk_size. I've converted this PR from num_groups to chunk_size.
Thanks for this, this is amazing! I have a few thoughts that I'll push to this branch if you don't mind