Open · sam-h-bean opened 1 year ago
@sam-h-bean Not too easy, I tried that in a similar project (https://github.com/michaelfeil/infinity) - that's written in pure Python, and a bit slower (2.5x less throughput).
Some starting points to solve your issue: the codebase here is purely in Rust, using tokio for the async stuff, so you would want to launch a server on an open port, and you need a channel for the gRPC things.
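Concretely, the channel pattern looks roughly like this - a minimal sketch assuming tokio's mpsc/oneshot primitives. The `EmbedRequest` type and the dummy "embedding" are placeholders I made up for illustration; a real gRPC front end (e.g. tonic) would forward requests into the same channel:

```rust
use tokio::sync::{mpsc, oneshot};

// Hypothetical request type: the text to embed plus a oneshot sender
// that carries the reply back to the caller.
struct EmbedRequest {
    text: String,
    reply: oneshot::Sender<Vec<f32>>,
}

#[tokio::main]
async fn main() {
    // Bounded channel feeding a single background inference task.
    let (tx, mut rx) = mpsc::channel::<EmbedRequest>(256);

    // Background task standing in for the model: in the real server this
    // is where batching and the forward pass would happen.
    tokio::spawn(async move {
        while let Some(req) = rx.recv().await {
            // Placeholder "embedding" derived from the input length; real
            // code would tokenize req.text and run the model instead.
            let fake = vec![req.text.len() as f32; 4];
            let _ = req.reply.send(fake);
        }
    });

    // A caller (a gRPC handler would do the same): send a request, then
    // await the reply on the oneshot.
    let (reply_tx, reply_rx) = oneshot::channel();
    tx.send(EmbedRequest { text: "hello".into(), reply: reply_tx })
        .await
        .expect("inference task is alive");
    let embedding = reply_rx.await.expect("inference task replied");
    println!("embedding dim = {}", embedding.len());
}
```

The design point is that the model lives in one task and everything else talks to it through the channel, which is also where dynamic batching would hook in.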
Feature request
I'd like to use this library for really high-throughput ETL jobs as well as an inference server. How I imagine this working is exposing some sort of object which can operate on in-memory datasets.
I am sort of running under the assumption this would be even more performant than native BetterTransformer inference in memory.
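Something like the following, purely as an illustration of the interface shape I have in mind; `Embedder`, `from_pretrained`, and `embed` are names I am making up here, not anything that exists in this repo, and the stub just returns dummy vectors:

```rust
// Hypothetical in-memory embedding object; all names are illustrative.
struct Embedder {
    dim: usize,
}

impl Embedder {
    // A real implementation would load the model weights once here.
    fn from_pretrained(_model_id: &str) -> Self {
        Embedder { dim: 384 }
    }

    // Operates directly on a batch held in memory: no HTTP round-trip,
    // which is the point for ETL-style workloads.
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        texts
            .iter()
            .map(|t| vec![t.len() as f32; self.dim]) // dummy vectors
            .collect()
    }
}

fn main() {
    let embedder = Embedder::from_pretrained("some-embedding-model");
    let batch = ["first document", "second document"];
    let embeddings = embedder.embed(&batch);
    assert_eq!(embeddings.len(), batch.len());
}
```

The important property is that `embed` takes and returns plain in-memory values, so it can be mapped over dataset partitions without any network hop.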
Motivation
Inspired by vLLM's offering, which is great for running LLMs in a big-data or ETL setting.
Your contribution
If this is a reasonable first task, I would be happy to take a look.