Serve up owt yer fancy on t'fly.
[[https://github.com/harryaskham/owt/actions/workflows/test.yml][https://github.com/harryaskham/owt/actions/workflows/test.yml/badge.svg]] [[https://github.com/harryaskham/owt/actions/workflows/test_nix.yml][https://github.com/harryaskham/owt/actions/workflows/test_nix.yml/badge.svg]]

** tl;dr
Send the logic for a virtual endpoint along with a request:
#+begin_src bash
URL="$1"

read -r -d '' CODE << EOF
run = (pipe()
       .kwarg('path')
       .open(root_dir='./example/summat')
       .bytes()
       .f(lambda ab: ab.decode('utf-8').upper())
       .done())
EOF

curl --json $(jo \
  code_b64=$(echo "$CODE" | base64 -w 0) \
  kwargs_b64=$(echo "{'path': 'simple.sh'}" | base64 -w 0) \
) $URL
#+end_src
Allowing quick access to arbitrary Python libraries from anywhere you can make an HTTP request:
./example/summat/simple.sh http://localhost:9876/owt
Most of the examples below show ~owt~ "in the raw", without using the pipelining syntax sugar. They also predate the client implementation, but I'll mostly leave it this way so one gets a sense of how simple things are under the hood. Still, the above is equivalent to:
#+begin_src bash
read -r -d '' CODE << EOF
kwarg('path')
.open(root_dir='./example/summat')
.bytes()
.f(lambda ab: ab.decode('utf-8').upper())
EOF

echo $(python -m owt.client --code "$CODE" --arg path \"simple.sh\" --address http://localhost:9876/owt)
#+end_src
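For completeness, the request the client builds can be reproduced with nothing but the standard library. A sketch of constructing the GET URL by hand (~owt_get_url~ is a made-up helper, not part of ~owt~):

```python
import base64
import urllib.parse

def owt_get_url(base_url: str, code: str, kwargs: str) -> str:
    # base64-encode the adaptor code and kwargs, then URL-encode them
    # into the query string, mirroring the curl -G calls used elsewhere
    # in this README.
    query = urllib.parse.urlencode({
        "code_b64": base64.b64encode(code.encode()).decode(),
        "kwargs_b64": base64.b64encode(kwargs.encode()).decode(),
    })
    return f"{base_url}?{query}"

url = owt_get_url(
    "http://localhost:9876/owt",
    "def run(name=None): return f'Hello, {name}!'",
    "{'name': 'owt'}",
)
print(url)
```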
** What's This?
A server exposing an endpoint whose request-handling behaviour is configured by the request itself.
For example, start by launching ~owt~:
#+begin_src bash
$ python -m owt.server
Owt serving on 0.0.0.0:9876
#+end_src
Now you can call ~localhost:9876/arbitrary/path~ with any behaviour via a thin Python adaptor:
#+begin_src python
from big.research.lab import fancy_ai

def hard_task(prompt):
    return fancy_ai.do_something(prompt)
#+end_src
Which can be called from any language with a tiny client:
#+begin_src haskell
-- owt.hs
main :: IO ()
main = mkOwtClient "http://localhost:9876/owt"
  >>= owt @POST "f(sota.hard_task)" "solve AGI"
  >>= putBS
#+end_src
#+begin_src bash
$ runhaskell owt.hs
"AGI solved!"
#+end_src
The client-free version isn't very complex either:
#+begin_src bash
curl -G \
  --data-urlencode code_b64=$(cat adaptor.py | base64 -w 0) \
  --data-urlencode kwargs_b64=$(echo "{'prompt': 'solve AGI'}" | base64 -w 0) \
  http://localhost:9876
#+end_src
** Security

This works by evaluating arbitrary user-supplied strings as code. This is grossly insecure by its very nature; *don't expose it to the internet!*
There's support for basic auth via ~owt --auth="username:sha256(password)"~ but still, exercise caution. It would not be difficult to accidentally make an ~owt~ API call that irreversibly destroyed your own machine.

*** Why?
The primary motivation is rapid prototyping against new machine learning models from languages other than Python. So often the newest hotness drops with a checkpoint and a Python client, and using this outside of the specific ecosystem in which it is designed to run means one of:
~owt~ aims to make #4 less painful, at the cost of security and sanity, by providing a single service capable of serving arbitrary new logic without requiring changes to ~owt~ itself.

*** How?

The cost of supporting some new system is pushed to the client, who must send adaptor code (essentially a request handler) along with the request. This creates a virtual endpoint on the fly, giving the API user complete control over the serving logic.
Writing the adaptor code is a one-time cost for each new virtual endpoint, made cheaper by having access to ~owt.summat~, a collection of composable building blocks.
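To give a flavour of the idea, a pipeline is just a chain of functions collapsed into a single handler. This is a toy illustration in that spirit, not the actual ~owt.summat~ API:

```python
# Illustrative only: a minimal chain in the spirit of the tl;dr's
# pipe().f(...).done() sugar. The real owt.summat building blocks
# differ in names and capabilities.
class Pipe:
    def __init__(self):
        self.steps = []

    def f(self, fn):
        # Append a transformation step to the chain.
        self.steps.append(fn)
        return self

    def done(self):
        # Collapse the chain into a single callable handler.
        def run(x):
            for step in self.steps:
                x = step(x)
            return x
        return run

run = Pipe().f(str.strip).f(str.upper).done()
print(run("  hello from owt  "))  # HELLO FROM OWT
```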
The imports used by an adaptor have to be available on ~owt~'s ~$PYTHONPATH~. To arrange this, ~owt~ can be started inside a virtualenv, after installing a package globally, or just from within some project directory.

*** Examples
The following examples (in ~example/~) can all be run one-by-one without any need to restart or rebuild ~owt~. The first one is shown a few different ways to give a flavour of usage. Subsequent examples just show ~curl~ in tandem with an adaptor ~.py~ file, but it's easy to see how one could extend from here to call ~owt~ from any other language.

** Echo Server

By default, the function named ~run(request, kwargs)~ in the user-supplied code will be used to handle the request. Code and (optionally) arguments are supplied as ~code_b64~ and ~kwargs_b64~. ~kwargs_b64~ is ~eval~'d to get the resulting dictionary, so it can itself contain richer logic to build up arguments.

***** As a self-contained shell script
#+begin_src bash
URL="$1"

read -r -d '' CODE << EOF
def run(name: str):
    return f'Hello, {name}!'
EOF

curl --json $(jo \
  code_b64=$(echo "$CODE" | base64 -w 0) \
  kwargs_b64=$(echo "{'name': 'owt'}" | base64 -w 0) \
) $URL/hello
#+end_src
./example/echo/echo_script.sh http://localhost:9876
: Hello, owt!

***** As a Python file + script
#+begin_src python
from owt.server import Server

def run(name=None):
    return f"Hello, {name}, from {Server.sing().address}:{Server.sing().port}!"
#+end_src
Passing data via POST JSON ~kwargs~:
#+begin_src bash
URL="$1"
CODE_B64=$(cat example/echo/echo.py | base64 -w 0)
KWARGS_B64=$(echo "{'name': 'owt'}" | base64 -w 0)
curl --json $(jo code_b64=$CODE_B64 kwargs_b64=$KWARGS_B64) $URL/hello
#+end_src
./example/echo/echo_kwargs.sh http://localhost:9876
: Hello, owt, from 0.0.0.0:9876!
Passing data via GET in the path:
#+begin_src bash
URL="$1"
CODE_B64=$(cat example/echo/echo.py | base64 -w 0)
KWARGS_B64=$(echo "{'name': 'owt'}" | base64 -w 0)
curl -G --data-urlencode code_b64=$CODE_B64 --data-urlencode kwargs_b64=$KWARGS_B64 $URL
#+end_src
./example/echo/echo_request.sh http://localhost:9876
: Hello, owt, from 0.0.0.0:9876!

**** Text to Speech API

A more complex example demonstrating wrapping Suno's OSS TTS model [[https://github.com/suno-ai/bark]].
The client provides an adaptor that responds with a stream of bytes, allowing the generated audio to be streamed in chunks, sentence-by-sentence.
Responses are cached for the lifetime of the ~owt~ server for each combination of ~(text, speaker)~.
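A minimal sketch of that style of lifetime-of-the-process caching, using stdlib memoisation rather than ~owt~'s actual cache implementation:

```python
import functools

@functools.lru_cache(maxsize=None)
def synthesize(text: str, speaker: str) -> bytes:
    # Stand-in for the expensive bark call; only the caching behaviour
    # matters here. Results are keyed on the (text, speaker) arguments
    # and kept for the lifetime of the process.
    return f"wav({text},{speaker})".encode()

first = synthesize("hello", "v2/en_speaker_6")   # computed
second = synthesize("hello", "v2/en_speaker_6")  # served from the cache
assert first is second
```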
The ~preload_models()~ call makes the first call take a while as VRAM is populated, but the weights remain in memory so subsequent calls are cheaper.
To avoid this breaking other ~owt~ uses, one can spin up multiple instances of ~owt~, each handling a different kind of task and with different resource profiles.

***** Python Adaptor

The endpoint logic, to be base64-encoded as part of the request.
#+begin_src python
from typing import Literal


def run(
    text: str = "",
    speaker: str = "v2/en_speaker_6",
    sentence_template: str = "%s",
    split_type: Literal["sentence", "none"] = "sentence",
    model_size: Literal["small", "large"] = "small",
):
    import os
    import logging
    import json
    import io
    import nltk  # type: ignore
    import numpy as np
    import base64
    from scipy.io.wavfile import write as write_wav  # type: ignore

    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    os.environ["SUNO_OFFLOAD_CPU"] = "0"
    match model_size:
        case "small":
            os.environ["SUNO_USE_SMALL_MODELS"] = "1"
        case "large":
            os.environ["SUNO_USE_SMALL_MODELS"] = "0"

    from bark.generation import generate_text_semantic, preload_models, SAMPLE_RATE  # type: ignore
    from bark.api import semantic_to_waveform  # type: ignore

    preload_models()

    def base64_wav(arr):
        buf = io.BytesIO()
        write_wav(buf, SAMPLE_RATE, arr)
        wav = buf.getvalue()
        return base64.b64encode(wav).decode("utf-8")

    def generate():
        clean_text = text.replace("\n", " ").strip()
        match split_type:
            case "sentence":
                sentences = nltk.sent_tokenize(clean_text)
            case "none":
                sentences = [clean_text]
        full_wav_array: np.ndarray | None = None
        for i, raw_sentence in enumerate(sentences):
            sentence = sentence_template % raw_sentence
            logging.info(
                "Generating sentence %d/%d: %s", i + 1, len(sentences), sentence
            )
            semantic_tokens: np.ndarray = generate_text_semantic(
                sentence,
                history_prompt=speaker,
                temp=0.6,
                min_eos_p=0.05,
            )
            wav_array: np.ndarray = semantic_to_waveform(
                semantic_tokens, history_prompt=speaker
            )
            full_wav_array = (
                wav_array
                if full_wav_array is None
                else np.concatenate((full_wav_array, wav_array))
            )
            yield "data: %s\n\n" % (
                json.dumps(
                    {
                        "sentence": base64_wav(wav_array),
                        "cumulative": base64_wav(full_wav_array),
                    }
                )
            )
        yield "data: [DONE]\n\n"

    return generate(), {"Content-Type": "text/event-stream"}
#+end_src
***** Save audio via cURL

Bundle the endpoint logic with a prompt and download the resulting audio.
#+begin_src bash
URL="$1"
TEXT="$2"
OUTDIR="$3"

CODE_B64="$(cat owt/lib/bark.py | base64 -w 0)"
KWARGS_B64=$(echo "{\"text\": \"$TEXT\"}" | base64 -w 0)
JSON=$(jo \
  code_b64=$CODE_B64 \
  kwargs_b64=$KWARGS_B64 \
  use_cache="true" \
  cache_kwargs="true" \
)
CMD="curl --json $JSON $URL"

echo "Running $CMD"
I=0
for event in "$($CMD)"; do
  if [[ "$event" == "data: {"* ]]; then
    WAV="$(echo -n "$event" | sed 's/data: //g' | jq '.sentence' | base64 -d)"
    echo "$WAV" > "$OUTDIR/sentence$I.wav"
    I=$((I+1))
  fi
done
#+end_src
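The same event stream can be consumed from Python rather than bash and ~jq~. A sketch assuming the ~data: {json}~ / ~data: [DONE]~ framing used by the adaptor above; the sample event here is fabricated for illustration:

```python
import base64
import json

def wav_chunks(events):
    # Yield the decoded per-sentence wav bytes from a stream of SSE
    # event strings, stopping at the [DONE] sentinel.
    for event in events:
        if not event.startswith("data: "):
            continue
        payload = event[len("data: "):].strip()
        if payload == "[DONE]":
            return
        yield base64.b64decode(json.loads(payload)["sentence"])

stream = [
    "data: %s\n\n" % json.dumps({"sentence": base64.b64encode(b"RIFF...").decode()}),
    "data: [DONE]\n\n",
]
print(list(wav_chunks(stream)))  # [b'RIFF...']
```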
***** Stream audio via JS

Use an endpoint from a webapp - see ~example/bark/bark.html~ for usage.
#+begin_src js
function makeRequest(code, text, speaker, sentenceTemplate, splitType) {
  return {
    'code_b64': btoa(code),
    'kwargs_b64': btoa(JSON.stringify({
      'text': text.replace(/\n/g, '\n'),
      'speaker': speaker,
      'sentence_template': sentenceTemplate,
      'split_type': splitType
    })),
  };
}

function audioUrl(url, code, text, speaker, sentenceTemplate, splitType) {
  const request = makeRequest(code, text, speaker, sentenceTemplate, splitType);
  return url + '?' + $.param(request);
}

async function getAudio(url, code, text, speaker, sentenceTemplate, splitType, onChunk) {
  const source = new EventSource(
      audioUrl(url, code, text, speaker, sentenceTemplate, splitType));
  source.onmessage = function(event) {
    if (event.data.toLowerCase() == 'done') {
      source.close();
      return;
    }
    const chunk = JSON.parse(event.data);
    onChunk(chunk);
  }
}
#+end_src
***** Ad-hoc Web Server

In fact we can go one step further now and bootstrap our own webserver within ~owt~ to serve our prototype app.
We can create an ad-hoc endpoint that serves us the rendered ~bark.html~ Jinja2 template.
The ~owt~ arguments can be passed as GET query parameters as well as POST JSON data, so we can actually write a handler that embeds the entire HTML into the query with this Python-in-Python-in-Bash curiosity.
#+begin_src bash
URL="$1"
CODE=$(python <<EOF
with open('example/bark/bark.html', 'r') as html_f:
    html = html_f.read()
with open('owt/lib/bark.py', 'r') as code_f:
    code = code_f.read()
with open('example/bark/bark.js', 'r') as js_f:
    template = (html.replace('{% include "bark.py" %}', code)
                    .replace('', ''))
print('''
def run():
    return (r\'\'\'
'''+template+'''
\'\'\')''')
EOF
)
CODE_B64=$(base64 -w 0 <<< "$CODE")
echo "curl -G --data-urlencode \"code_b64=$CODE_B64\" $URL"
#+end_src
bash -c "$(./example/bark/bark_construct_curl.sh http://localhost:9876) -s -o /dev/null -w '%{url}'"
Whew. We can open that ~owt~ URL in a browser and play with the web app, which itself makes calls to ~owt~ injecting the TTS logic.

***** Going Meta
That gets painful though - for iterative development, you want to save your code and hit refresh. This won't do anything here, since all code is snapshotted into the URL itself. However...
#+begin_src bash
URL="$1"

read -r -d '' CODE << 'EOF'
def run(base_url):
    import os
    html = os.popen(f'bash -c "$(./example/bark/bark_construct_curl.sh {base_url})"').read()
    return html
EOF

KWARGS="{\"base_url\": \"$URL\"}"
CODE_B64=$(base64 -w 0 <<< "$CODE")
KWARGS_B64=$(base64 -w 0 <<< "$KWARGS")
echo "curl -G --data-urlencode code_b64=$CODE_B64 --data-urlencode kwargs_b64=$KWARGS_B64 $URL"
#+end_src
./example/bark/bark_meta_curl.sh http://localhost:9876
: curl -G --data-urlencode code_b64=ZGVmIHJ1bihiYXNlX3VybCk6CiAgaW1wb3J0IG9zCiAgaHRtbCA9IG9zLnBvcGVuKGYnYmFzaCAtYyAiJCguL2V4YW1wbGUvYmFyay9iYXJrX2NvbnN0cnVjdF9jdXJsLnNoIHtiYXNlX3VybH0pIicpLnJlYWQoKQogIHJldHVybiBodG1sCg== --data-urlencode kwargs_b64=eyJiYXNlX3VybCI6ICJodHRwOi8vbG9jYWxob3N0Ojk4NzYifQo= http://localhost:9876
Sweet - this will resolve to the meta-evaluator that always renders a fresh copy of the app each time.
bash -c "$(./example/bark/bark_meta_curl.sh http://localhost:9876) -s -o /dev/null -w '%{url}'"
So what would stop you hosting the entirety of ~owt~ on... wait, no...
#+begin_src bash
function owtInOwt() {
  URL_PORT="$1"
  PAYLOAD_CODE_B64="$2"
  PAYLOAD_KWARGS_B64="$3"

  read -r -d '' CODE << EOF
def run(**kwargs):
  payload_code_b64 = kwargs['payload_code_b64']
  payload_kwargs_b64 = kwargs['payload_kwargs_b64']
  global _SERVER
  new_port = _SERVER.port + 1
  _globals = {'__name__': __name__+'_new',
              'new_port': new_port}
  _locals = {}
  print('going one level down to port %s' % new_port)

  exec('''
print('One level deeper, importing owt')
from owt import server
from multiprocessing import Process
server_thread = Process(target=server.main, kwargs={"port": new_port})
''', _globals, _locals)

  def kill():
    import time
    time.sleep(5)
    print('Killing server on %s' % Server.sing().port)
    _locals['server_thread'].terminate()
    print('Killed server on %d' % Server.sing().port)

  from multiprocessing import Process
  from flask import request
  import requests
  import urllib

  _locals['server_thread'].start()
  port = urllib.parse.urlparse("$URL_PORT").port
  new_url = "$URL_PORT".replace(str(port), str(new_port))
  bootstrapped_url = f"{new_url}?code_b64={urllib.parse.quote_plus(payload_code_b64)}&kwargs_b64={urllib.parse.quote_plus(payload_kwargs_b64)}"
  resp = requests.get(bootstrapped_url).content
  Process(target=kill).start()
  return resp
EOF

  CODE_B64=$(base64 -w 0 <<< "$CODE")
  KWARGS_B64=$(base64 -w 0 <<< "{\"payload_code_b64\":\"$PAYLOAD_CODE_B64\", \"payload_kwargs_b64\": \"$PAYLOAD_KWARGS_B64\"}")
  CMD="curl -G --data-urlencode code_b64=$CODE_B64 --data-urlencode kwargs_b64=$KWARGS_B64 $URL_PORT"
  echo $CMD
}
#+end_src
Oh no, no...
#+begin_src bash
source "example/meta/bootstrap.sh"

CODE_B64=$(cat example/echo/echo.py | base64 -w 0)
KWARGS_B64=$(echo "{'name': 'owt-inside-owt'}" | base64 -w 0)
CMD=$(owtInOwt "http://localhost:9876/owt" "$CODE_B64" "$KWARGS_B64")
echo "Running: $CMD"
echo "Result:"
bash -c "$CMD"
#+end_src
: Running: curl -G --data-urlencode code_b64=ZGVmIHJ1bigqKmt3YXJncyk6CiAgcGF5bG9hZF9jb2RlX2I2NCA9IGt3YXJnc1sncGF5bG9hZF9jb2RlX2I2NCddCiAgcGF5bG9hZF9rd2FyZ3NfYjY0ID0ga3dhcmdzWydwYXlsb2FkX2t3YXJnc19iNjQnXQogIGdsb2JhbCBfU0VSVkVSCiAgbmV3X3BvcnQgPSBfU0VSVkVSLnBvcnQgKyAxCiAgX2dsb2JhbHMgPSB7J19fbmFtZV9fJzogX19uYW1lX18rJ19uZXcnLAogICAgICAgICAgICAgICduZXdfcG9ydCc6IG5ld19wb3J0fQogIF9sb2NhbHMgPSB7fQogIHByaW50KCdnb2luZyBvbmUgbGV2ZWwgZG93biB0byBwb3J0ICVzJyAlIG5ld19wb3J0KQoKICBleGVjKCcnJwpwcmludCgnT25lIGxldmVsIGRlZXBlciwgaW1wb3J0aW5nIG93dCcpCmZyb20gb3d0IGltcG9ydCBzZXJ2ZXIKZnJvbSBtdWx0aXByb2Nlc3NpbmcgaW1wb3J0IFByb2Nlc3MKc2VydmVyX3RocmVhZCA9IFByb2Nlc3ModGFyZ2V0PXNlcnZlci5tYWluLCBrd2FyZ3M9eyJwb3J0IjogbmV3X3BvcnR9KQonJycsIF9nbG9iYWxzLCBfbG9jYWxzKQoKICBkZWYga2lsbCgpOgogICAgaW1wb3J0IHRpbWUKICAgIHRpbWUuc2xlZXAoNSkKICAgIHByaW50KCdLaWxsaW5nIHNlcnZlciBvbiAlcycgJSBTZXJ2ZXIuc2luZygpLnBvcnQpCiAgICBfbG9jYWxzWydzZXJ2ZXJfdGhyZWFkJ10udGVybWluYXRlKCkKICAgIHByaW50KCdLaWxsZWQgc2VydmVyIG9uICVkJyAlIFNlcnZlci5zaW5nKCkucG9ydCkKCiAgZnJvbSBtdWx0aXByb2Nlc3NpbmcgaW1wb3J0IFByb2Nlc3MKICBmcm9tIGZsYXNrIGltcG9ydCByZXF1ZXN0CiAgaW1wb3J0IHJlcXVlc3RzCiAgaW1wb3J0IHVybGxpYgoKICBfbG9jYWxzWydzZXJ2ZXJfdGhyZWFkJ10uc3RhcnQoKQogIHBvcnQgPSB1cmxsaWIucGFyc2UudXJscGFyc2UoImh0dHA6Ly9sb2NhbGhvc3Q6OTg3Ni9vd3QiKS5wb3J0CiAgbmV3X3VybCA9ICJodHRwOi8vbG9jYWxob3N0Ojk4NzYvb3d0Ii5yZXBsYWNlKHN0cihwb3J0KSwgc3RyKG5ld19wb3J0KSkKICBib290c3RyYXBwZWRfdXJsID0gZiJ7bmV3X3VybH0/Y29kZV9iNjQ9e3VybGxpYi5wYXJzZS5xdW90ZV9wbHVzKHBheWxvYWRfY29kZV9iNjQpfSZrd2FyZ3NfYjY0PXt1cmxsaWIucGFyc2UucXVvdGVfcGx1cyhwYXlsb2FkX2t3YXJnc19iNjQpfSIKICByZXNwID0gcmVxdWVzdHMuZ2V0KGJvb3RzdHJhcHBlZF91cmwpLmNvbnRlbnQKICBQcm9jZXNzKHRhcmdldD1raWxsKS5zdGFydCgpCiAgcmV0dXJuIHJlc3AK --data-urlencode 
kwargs_b64=eyJwYXlsb2FkX2NvZGVfYjY0IjoiSXlCbGVHRnRjR3hsTDJWamFHOHZaV05vYnk1d2VRb0tabkp2YlNCdmQzUXVjMlZ5ZG1WeUlHbHRjRzl5ZENCVFpYSjJaWElLQ2dwa1pXWWdjblZ1S0c1aGJXVTlUbTl1WlNrNkNpQWdJQ0J5WlhSMWNtNGdaaUpJWld4c2J5d2dlMjVoYldWOUxDQm1jbTl0SUh0VFpYSjJaWEl1YzJsdVp5Z3BMbUZrWkhKbGMzTjlPbnRUWlhKMlpYSXVjMmx1WnlncExuQnZjblI5SVNJSyIsICJwYXlsb2FkX2t3YXJnc19iNjQiOiAiZXlkdVlXMWxKem9nSjI5M2RDMXBibk5wWkdVdGIzZDBKMzBLIn0K http://localhost:9876/owt
: Result:
: Hello, owt-inside-owt, from 0.0.0.0:9877!
Oh no... but that would mean you could... I wonder...
#+begin_src python
def run(**kwargs):
    from flask import request
    import os

    payload_code_b64 = kwargs["payload_code_b64"]
    payload_kwargs_b64 = kwargs["payload_kwargs_b64"]
    print(request.base_url)
    return os.popen(
        f'source ./example/meta/bootstrap.sh; $(owtInOwt "{request.base_url}" "{payload_code_b64}" "{payload_kwargs_b64}")'
    ).read()
#+end_src
Someone please call Douglas Hofstadter
#+begin_src bash
METACODE_B64=$(cat example/meta/bootstrap.py | base64 -w 0)

function wrapOwt() {
  CODE_B64="$1"
  KWARGS_B64="$2"
  METAKWARGS_B64=$(base64 -w 0 <<< "{\"payload_code_b64\":\"$CODE_B64\", \"payload_kwargs_b64\": \"$KWARGS_B64\"}")
  echo "$METAKWARGS_B64"
}
#+end_src
#+begin_src bash
N_LAYERS=5
for layer in $(seq 1 $N_LAYERS); do
  CODE_B64=$(cat example/echo/echo.py | base64 -w 0)
  NAME="owt"
  for i in $(seq 1 $layer); do
    NAME="$NAME-inside-owt"
  done
  KWARGS_B64=$(echo "{\"name\": \"$NAME\"}" | base64 -w 0)
  METAKWARGS_B64=$(wrapOwt "$CODE_B64" "$KWARGS_B64")
  for i in $(seq 2 $layer); do
    METAKWARGS_B64=$(wrapOwt "$METACODE_B64" "$METAKWARGS_B64")
  done
  echo "layer: $NAME"
  CMD="curl -G --data-urlencode code_b64=$METACODE_B64 --data-urlencode kwargs_b64=$METAKWARGS_B64 http://localhost:9876/owt"
  echo "Result: " $(bash -c "$CMD")
  sleep 1
done
#+end_src
: layer: owt-inside-owt
: Result: Hello, owt-inside-owt, from 0.0.0.0:9877!
: layer: owt-inside-owt-inside-owt
: Result: Hello, owt-inside-owt-inside-owt, from 0.0.0.0:9878!
: layer: owt-inside-owt-inside-owt-inside-owt
: Result: Hello, owt-inside-owt-inside-owt-inside-owt, from 0.0.0.0:9879!
: layer: owt-inside-owt-inside-owt-inside-owt-inside-owt
: Result: Hello, owt-inside-owt-inside-owt-inside-owt-inside-owt, from 0.0.0.0:9880!
: layer: owt-inside-owt-inside-owt-inside-owt-inside-owt-inside-owt
: Result: Hello, owt-inside-owt-inside-owt-inside-owt-inside-owt-inside-owt, from 0.0.0.0:9881!
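All the layering above boils down to repeatedly base64-wrapping the previous payload as ~payload_kwargs_b64~. A stdlib sketch of wrapping three layers and peeling them back off; ~wrap~ is a made-up helper, not part of ~owt~:

```python
import base64
import json

def wrap(code_b64: str, kwargs_b64: str) -> str:
    # One layer of nesting: the previous code/kwargs pair becomes the
    # kwargs payload of the next level up.
    inner = json.dumps({"payload_code_b64": code_b64,
                        "payload_kwargs_b64": kwargs_b64})
    return base64.b64encode(inner.encode()).decode()

kwargs_b64 = base64.b64encode(b"{'name': 'owt'}").decode()
code_b64 = base64.b64encode(b"def run(name=None): ...").decode()

wrapped = kwargs_b64
for _ in range(3):                      # three layers deep
    wrapped = wrap(code_b64, wrapped)

depth = 0
while True:                             # peel the layers back off
    try:
        obj = json.loads(base64.b64decode(wrapped))
        wrapped = obj["payload_kwargs_b64"]
        depth += 1
    except (ValueError, KeyError):
        break                           # reached the innermost kwargs
print(depth)  # 3
```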
Hoo boy. How is Python a real language.

** TODO