Open dominiklohmann opened 10 months ago
In a meeting between @rdettai, @tobim and me, @rdettai brought up the point that making the API return an AsyncGenerator
by default:
generator: AsyncGenerator[Tuple[str, pyarrow.Table], None] = pipeline.exec(endpoint="localhost:5000")
would likely make the implementation much more complicated, since pyarrow has no support for async operations. A regular Generator would already suffice for working with the pipeline results in a notebook.
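As a rough illustration of that trade-off, here is a minimal sketch and not the actual bindings. `fake_exec` is a hypothetical stand-in for `pipeline.exec`, plain lists stand in for `pyarrow.Table`, and the schema names are made up. It shows a regular generator consumed with an ordinary `for` loop, plus one possible way to adapt it for async callers later using `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
from typing import AsyncIterator, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

# Placeholder for pyarrow.Table so the sketch runs without pyarrow installed.
Batch = List[dict]


def fake_exec() -> Iterator[Tuple[str, Batch]]:
    """Hypothetical stand-in for pipeline.exec(): yields (schema name, batch)."""
    yield ("zeek.conn", [{"id": 1}, {"id": 2}])
    yield ("zeek.dns", [{"id": 3}])


# Sentinel that marks exhaustion of the wrapped generator.
_DONE = object()


async def as_async(gen: Iterator[T]) -> AsyncIterator[T]:
    """Pull items from a blocking generator in a worker thread, one at a time."""
    while True:
        item = await asyncio.to_thread(next, gen, _DONE)
        if item is _DONE:
            return
        yield item


async def collect_names() -> List[str]:
    """Async consumer: gathers the schema names from the wrapped generator."""
    return [name async for name, _ in as_async(fake_exec())]


# Notebook-style synchronous use needs nothing beyond a plain loop.
for name, batch in fake_exec():
    print(name, len(batch))
```

The point of the sketch is that starting with a synchronous generator does not rule out async consumers; whether such an adapter is worth shipping is a separate decision.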
We also agreed that it would probably make sense to add some minimal framing (in particular a length prefix and a schema hash) to the planned bitz wire format, so that we can transport individual table frames without having to rely on the Arrow readers.
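The framing idea can be sketched as follows. This is only an assumption about what such a frame could look like, not the bitz format itself: the header layout (8-byte big-endian payload length followed by a 32-byte SHA-256 digest of the serialized schema) and the names `write_frame` and `read_frames` are hypothetical. The payload is treated as opaque bytes, so frames can be split and routed without an Arrow reader.

```python
import hashlib
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Hypothetical header: payload length (uint64, big-endian) + SHA-256 schema hash.
HEADER = struct.Struct(">Q32s")


def write_frame(out: BinaryIO, schema: bytes, payload: bytes) -> None:
    """Write one frame: fixed-size header followed by the opaque payload."""
    schema_hash = hashlib.sha256(schema).digest()
    out.write(HEADER.pack(len(payload), schema_hash))
    out.write(payload)


def read_frames(src: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (schema_hash, payload) pairs until the stream ends.

    Assumes a buffered stream whose read(n) returns fewer than n bytes
    only at end of stream.
    """
    while True:
        header = src.read(HEADER.size)
        if not header:
            return  # clean end of stream between frames
        if len(header) < HEADER.size:
            raise EOFError("truncated frame header")
        length, schema_hash = HEADER.unpack(header)
        payload = src.read(length)
        if len(payload) < length:
            raise EOFError("truncated frame payload")
        yield schema_hash, payload


# Round trip through an in-memory buffer.
buf = io.BytesIO()
write_frame(buf, b"schema-a", b"first table bytes")
write_frame(buf, b"schema-b", b"second")
buf.seek(0)
for schema_hash, payload in read_frames(buf):
    print(schema_hash.hex()[:8], len(payload))
```

With a schema hash in every header, a reader can group frames by schema before decoding any of them; the length prefix lets it skip frames it does not need.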
To enable downstream usage of results of a pipeline in Python, we want to adjust our Python bindings to run any kind of pipeline.
Specifically, we envision three mechanisms here:
For the scope of this roadmap item, we are fine with the restriction that the Python bindings require a tenzir binary to be locally available, and we want to restrict ourselves to mechanism (3).
In a meeting between @dominiklohmann, @tobim, @mavam and @lava we agreed on the following pseudocode interface for the bindings: