lenskit / lkpy

Python recommendation toolkit
https://lkpy.lenskit.org
MIT License

High-performance recommender output storage #495

Open mdekstrand opened 3 weeks ago

mdekstrand commented 3 weeks ago

In experiments I have been running, retrieving and saving results is a significant bottleneck in parallel batch inference. It is substantially hindering throughput: each worker only runs at 30-40% of a CPU on my large data-crunching rig.

It is possible that item lists will speed this up, but if not, I would like to look at a more efficient way to collect batch-inference results for saving and/or measurement.

One potential solution is to save each worker's results in a separate Parquet file.

Another promising direction is Arrow Flight, an RPC framework built on top of Arrow IPC. An ItemList can be trivially converted to an Arrow Table, which can then be sent over Flight. We could implement a Flight server, in either Python or Rust, that receives item lists and incorporates them into the results.

Some open questions:

mdekstrand commented 3 weeks ago

I have done a quick benchmark, and serializing an item list to PyArrow IPC is not more efficient than pickling it.

That was with short lists; as the lists get longer, the gap widens.