Perf on a 128-core node (about 2 minutes for 1 billion rows):
2024-12-02 20:32:51 - INFO - collecting
2024-12-02 20:33:56 - INFO - collected
2024-12-02 20:33:56 - INFO - concatenated
[[ 19 16 15 ... -44 -45 -46]
[ 19 16 15 ... -44 -45 -46]
[ 19 16 15 ... -44 -45 -46]
...
[ 19 16 15 ... -44 -45 -46]
[ 19 16 15 ... -44 -45 -46]
[ 19 16 15 ... -44 -45 -46]]
2024-12-02 20:34:31 - INFO - complete
Idea for optimizing the big upstream bottleneck in the downstream package's index lookup: right now we're using numpy for the operations, with numba for parallelism (which is doing a bad job of it and has a big JIT cost).
Here's an example of the focal code; it's just a bunch of operations on/between 1d arrays: https://github.com/mmore500/downstream/blob/python/downstream/dstream/steady_algo/_steady_lookup_ingest_times_batched.py
The clever alternative plan is to replace all of these numpy vector operations with operations on/between columns in a polars lazy frame.
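For instance, an elementwise computation written as numpy vector arithmetic can be restated as column expressions on a lazy frame. Below is a minimal, hypothetical stand-in (the `S`/`epsilon` arithmetic is a placeholder, not the actual steady_algo lookup logic):

```python
import numpy as np
import polars as pl

S = np.arange(1_000, dtype=np.int64)  # hypothetical 1d input array

# current style: numpy vector arithmetic on/between 1d arrays
epsilon_np = np.where(S % 2 == 0, S // 2, S + 1)

# proposed style: the same computation as polars column expressions on a lazy frame
lf = pl.LazyFrame({"S": S}).with_columns(
    epsilon=pl.when(pl.col("S") % 2 == 0)
    .then(pl.col("S") // 2)
    .otherwise(pl.col("S") + 1),
)
assert (lf.collect()["epsilon"].to_numpy() == epsilon_np).all()
```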
We can use the following pattern to force polars to parallelize our computation on that one big lazy frame row-wise. Essentially, we pass in a list of chunks and polars will collect all the chunks in parallel.
Rough sketch:
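Something along these lines, as a minimal sketch; `lookup_batched`, `num_chunks`, and the per-chunk expression are hypothetical placeholders for the real steady_algo arithmetic, with `pl.collect_all` doing the parallel evaluation of the chunked lazy frames:

```python
import numpy as np
import polars as pl


def lookup_batched(S: np.ndarray, num_chunks: int = 128) -> pl.DataFrame:
    """Hypothetical sketch: evaluate the lookup chunk-wise, in parallel, via polars."""
    lazy_chunks = [
        pl.LazyFrame({"S": chunk}).with_columns(
            # placeholder for the real per-row lookup arithmetic
            epsilon=pl.when(pl.col("S") % 2 == 0)
            .then(pl.col("S") // 2)
            .otherwise(pl.col("S") + 1),
        )
        for chunk in np.array_split(S, num_chunks)
    ]
    # collect_all runs each chunk's query plan on polars' thread pool,
    # so the chunks are evaluated concurrently
    return pl.concat(pl.collect_all(lazy_chunks))


if __name__ == "__main__":
    print(lookup_batched(np.arange(1_000_000, dtype=np.int64)))
```

Whether this beats collecting a single big lazy frame depends on how well the query engine parallelizes the expressions on its own; the chunked `collect_all` pattern just guarantees at least chunk-level parallelism.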