man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Fully parallelise `batch_read` when 1 or more of the symbols requested is recursively normalised #1968

Open alexowens90 opened 3 weeks ago

Currently, the code flow for both `read` and `batch_read` needlessly round-trips through the Python layer in the middle of the call, before calling `batch_read_keys`. This is not too inefficient in the non-batch case, as `batch_read_keys` is itself parallelised. In `batch_read`, however, `batch_read_keys` is called sequentially, once for each recursively normalised symbol in the batch, so this will be very inefficient if there are many such small symbols.

There is no conceptual need to pivot off Python mid-call; this could all be handled in the C++ layer, with appropriate structures returned to Python.
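
To make the cost difference concrete, here is a toy Python sketch of the two call patterns. It is not ArcticDB code: `SYMBOL_KEYS`, `fetch_key`, `read_sequential`, `read_flattened` and this stand-in `batch_read_keys` are all hypothetical names for illustration, and the real change would live in the C++ layer. It only shows that issuing one parallel batch per symbol costs one dispatch per symbol, whereas flattening all keys into a single batch costs one dispatch in total.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for storage: each recursively normalised symbol
# maps to the keys of its sub-components.
SYMBOL_KEYS = {
    "sym_a": ["a/0", "a/1"],
    "sym_b": ["b/0"],
    "sym_c": ["c/0", "c/1", "c/2"],
}

calls = []  # records the size of every batch that gets dispatched


def fetch_key(key):
    """Toy single-key read; a real read would go to storage."""
    return f"data({key})"


def batch_read_keys(keys):
    """Toy parallel read of a list of keys (assumed shape, not the real function)."""
    calls.append(len(keys))
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map preserves input order, so results line up with keys.
        return list(pool.map(fetch_key, keys))


def read_sequential(symbols):
    """Current pattern: one parallel batch per symbol, issued one after another."""
    return {s: batch_read_keys(SYMBOL_KEYS[s]) for s in symbols}


def read_flattened(symbols):
    """Proposed pattern: one parallel batch over all keys, regrouped by symbol."""
    flat = [(s, k) for s in symbols for k in SYMBOL_KEYS[s]]
    values = batch_read_keys([k for _, k in flat])
    out = {s: [] for s in symbols}
    for (s, _), value in zip(flat, values):
        out[s].append(value)
    return out
```

With many small symbols, the sequential pattern pays the per-batch overhead once per symbol and never overlaps work across symbols; the flattened pattern returns the same data from a single batch.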