Most modern networking is built either on slow and ambiguous REST APIs or unnecessarily complex gRPC. FastAPI, for example, looks very approachable. We aim to be equally or even simpler to use.
FastAPI | UCall |
---|---|
```sh pip install fastapi uvicorn ``` | ```sh pip install ucall ``` |
```python from fastapi import FastAPI import uvicorn server = FastAPI() @server.get('/sum') def sum(a: int, b: int): return a + b uvicorn.run(...) ``` | ```python from ucall.posix import Server # from ucall.uring import Server on 5.19+ server = Server() @server def sum(a: int, b: int): return a + b server.run() ``` |
It takes over a millisecond to handle a trivial FastAPI call on a recent 8-core CPU. In that time, light could have traveled 300 km through optics to the neighboring city or country, in my case. How does UCall compare to FastAPI and gRPC?
Setup | ๐ | Server | Latency w 1 client | Throughput w 32 clients |
---|---|---|---|---|
Fast API over REST | โ | ๐ | 1'203 ฮผs | 3'184 rps |
Fast API over WebSocket | โ | ๐ | 86 ฮผs | 11'356 rps ยน |
gRPC ยฒ | โ | ๐ | 164 ฮผs | 9'849 rps |
UCall with POSIX | โ | C | 62 ฮผs | 79'000 rps |
UCall with io_uring | โ | ๐ | 40 ฮผs | 210'000 rps |
UCall with io_uring | โ | C | 22 ฮผs | 231'000 rps |
How can a tiny pet-project with just a couple thousand lines of code compete with two of the most established networking libraries? UCall stands on the shoulders of Giants:
io_uring
for interrupt-less IO.
io_uring_prep_read_fixed
on 5.1+.io_uring_prep_accept_direct
on 5.19+.io_uring_register_files_sparse
on 5.19+.IORING_SETUP_COOP_TASKRUN
optional on 5.19+.IORING_SETUP_SINGLE_ISSUER
optional on 6.0+.SIMD-accelerated parsers with manual memory control.
simdjson
to parse JSON faster than gRPC can unpack ProtoBuf
.Turbo-Base64
to decode binary values from a Base64
form.picohttpparser
to navigate HTTP headers.You have already seen the latency of the round trip..., the throughput in requests per second..., want to see the bandwidth? Try yourself!
@server
def echo(data: bytes):
return data
FastAPI supports native type, while UCall supports numpy.ndarray
, PIL.Image
and other custom types.
This comes handy when you build real applications or want to deploy Multi-Modal AI, like we do with UForm.
from ucall.rich_posix import Server
import ufrom
server = Server()
model = uform.get_model('unum-cloud/uform-vl-multilingual')
@server
def vectorize(description: str, photo: PIL.Image.Image) -> numpy.ndarray:
image = model.preprocess_image(photo)
tokens = model.preprocess_text(description)
joint_embedding = model.encode_multimodal(image=image, text=tokens)
return joint_embedding.cpu().detach().numpy()
We also have our own optional Client
class that helps with those custom types.
from ucall.client import Client
client = Client()
# Explicit JSON-RPC call:
response = client({
'method': 'vectorize',
'params': {
'description': description,
'image': image,
},
'jsonrpc': '2.0',
'id': 100,
})
# Or the same with syntactic sugar:
response = client.vectorize(description=description, image=image)
Aside from the Python Client
, we provide an easy-to-use Command Line Interface, which comes with pip install ucall
.
It allow you to call a remote server, upload files, with direct support for images and NumPy arrays.
Translating previous example into a Bash script, to call the server on the same machine:
ucall vectorize description='Product description' -i image=./local/path.png
To address a remote server:
ucall vectorize description='Product description' -i image=./local/path.png --uri 0.0.0.0 -p 8545
To print the docs, use ucall -h
:
usage: ucall [-h] [--uri URI] [--port PORT] [-f [FILE ...]] [-i [IMAGE ...]] [--positional [POSITIONAL ...]] method [kwargs ...]
UCall Client CLI
positional arguments:
method method name
kwargs method arguments
options:
-h, --help show this help message and exit
--uri URI server uri
--port PORT server port
-f [FILE ...], --file [FILE ...]
method positional arguments
-i [IMAGE ...], --image [IMAGE ...]
method positional arguments
--positional [POSITIONAL ...]
method positional arguments
You can also explicitly annotate types, to distinguish integers, floats, and strings, to avoid ambiguity.
ucall auth id=256
ucall auth id:int=256
ucall auth id:str=256
We will leave bandwidth measurements to enthusiasts, but will share some more numbers.
The general logic is that you can't squeeze high performance from Free-Tier machines.
Currently AWS provides following options: t2.micro
and t4g.small
, on older Intel and newer Graviton 2 chips.
This library is so fast, that it doesn't need more than 1 core, so you can run a fast server even on a tiny Free-Tier server!
Setup | ๐ | Server | Clients | t2.micro |
t4g.small |
---|---|---|---|---|---|
Fast API over REST | โ | ๐ | 1 | 328 rps | 424 rps |
Fast API over WebSocket | โ | ๐ | 1 | 1'504 rps | 3'051 rps |
gRPC | โ | ๐ | 1 | 1'169 rps | 1'974 rps |
UCall with POSIX | โ | C | 1 | 1'082 rps | 2'438 rps |
UCall with io_uring | โ | C | 1 | - | 5'864 rps |
UCall with POSIX | โ | C | 32 | 3'399 rps | 39'877 rps |
UCall with io_uring | โ | C | 32 | - | 88'455 rps |
In this case, every server was bombarded by requests from 1 or a fleet of 32 other instances in the same availability zone.
If you want to reproduce those benchmarks, check out the sum
examples on GitHub.
For Python:
pip install ucall
For CMake projects:
include(FetchContent)
FetchContent_Declare(
ucall
GIT_REPOSITORY https://github.com/unum-cloud/ucall
GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(ucall)
include_directories(${ucall_SOURCE_DIR}/include)
The C usage example is mouthful compared to Python.
We wanted to make it as lightweight as possible and to allow optional arguments without dynamic allocations and named lookups.
So unlike the Python layer, we expect the user to manually extract the arguments from the call context with ucall_param_named_i64()
, and its siblings.
#include <cstdio.h>
#include <ucall/ucall.h>
static void sum(ucall_call_t call, ucall_callback_tag_t) {
int64_t a{}, b{};
char printed_sum[256]{};
bool got_a = ucall_param_named_i64(call, "a", 0, &a);
bool got_b = ucall_param_named_i64(call, "b", 0, &b);
if (!got_a || !got_b)
return ucall_call_reply_error_invalid_params(call);
int len = snprintf(printed_sum, 256, "%ll", a + b);
ucall_call_reply_content(call, printed_sum, len);
}
int main(int argc, char** argv) {
ucall_server_t server{};
ucall_config_t config{};
ucall_init(&config, &server);
ucall_add_procedure(server, "sum", &sum, NULL);
ucall_take_calls(server, 0);
ucall_free(server);
return 0;
}
array
and Pillow serializationWant to affect the roadmap and request a feature? Join the discussions on Discord.