Closed · lilnasy closed this 1 year ago
@jeff-hykin I want to know what you think about this
I was actually thinking about this earlier this morning.
This feature seems like a good way to avoid the complexity of having a class-method based encoding/decoding.
There are a few problems I see, though, with `setTag` and `getTag`. For `getTag`: do a `getUint8`, and if the last bit of the uint8 equals 1, get another uint8, and keep repeating until the last bit isn't 1. Then it's easy to combine the other bits into a number. I've got a handy function in my deno binaryify module that does this in both directions: it can take a uint8 and return an "escaped" uint8, where for every 7 bits a "spare" bit is added for encoding reasons (e.g. `11111111_11111111` => `11111110_11111110_11000000`), as well as the "unescape" direction (`11111110_11111110_11000000` => `11111111_11111111`).

This is the revised API:
```ts
const { encode, decode } = EsCodec.createCodec([
    {
        name: "URL",
        when(x: unknown): x is URL { return x.constructor === URL },
        encode(url: URL): BaseSerializable { return url.href },
        decode(href: string): URL { return new URL(href) }
    }
])
```
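The tag encoding described earlier (read a uint8, and keep reading while its last bit is 1) is essentially a continuation-bit varint. Here is a minimal sketch of that idea — not the actual binaryify code, and the exact bit layout there may differ:

```typescript
// Each byte carries 7 payload bits shifted into the high positions;
// the lowest bit is 1 when more bytes of the tag follow.
function escapeTag(value: number): number[] {
    const bytes: number[] = []
    do {
        let byte = (value & 0b0111_1111) << 1   // 7 payload bits
        value >>>= 7
        if (value > 0) byte |= 1                // continuation bit: more bytes follow
        bytes.push(byte)
    } while (value > 0)
    return bytes
}

function readTag(bytes: Uint8Array, offset = 0): { value: number, next: number } {
    let value = 0, shift = 0, i = offset
    while (true) {
        const byte = bytes[i++]
        value |= (byte >>> 1) << shift          // recover the 7 payload bits
        shift += 7
        if ((byte & 1) === 0) break             // last bit isn't 1: tag is complete
    }
    return { value, next: i }
}
```

Because the decoder stops at the first byte whose low bit is 0, tags of any size can be read out of a stream without knowing their length in advance.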
- `name` is used as the type tag: order does not matter anymore.
- `when` is still a function: this allows an extension to add support for only a subsection of a type. For example, a symbol extension can support only registered symbols (`Symbol.for("xyz")`), leaving well-known symbols (`Symbol.toPrimitive`) up for grabs by another extension.
- `encode` returns a type that is natively supported, instead of an ArrayBuffer. The alternative would mean exposing too many implementation details.

After having used this API, I believe effectful encoding/decoding needs to be made easier. The use case I'm concerned with is where I need to "register" a virtual `ReadableStream` within a websocket.
It is trivial to create effects in the global scope, but to use a websocket connection in an encode/decode function, I had to call `createCodec` for each connection, such that it captures the websocket in scope and becomes a closure.

Inspired by the official msgpack javascript package, context for extensions might look like this:
```ts
const { encode, decode } = EsCodec.createCodec<{ socket: WebSocket }>([
    {
        name: "URL",
        when(x: unknown, context): x is URL { return x.constructor === URL },
        encode(url: URL, context): BaseSerializable { console.log(context.socket); return url.href },
        decode(href: string, context): URL { console.log(context.socket); return new URL(href) }
    }
])

const buffer = encode(data, { socket: new WebSocket(url) })
const data = decode(buffer, { socket: new WebSocket(url) })
```
Sorry I've been gone for a bit. I think the new proposed API looks good. There is still ambiguity on when the "when" clauses are run (are they run in order, top to bottom, or bottom to top?). I think supporting native types is a fantastic addition that makes it so much easier to use.

I don't quite understand the websocket issue (why do the console logs need to be in the encode/decode?)
> There is still ambiguity on when the "when" clauses are run (are they run in order, top to bottom, or bottom to top?)
@jeff-hykin They are top to bottom. If the 1st extension's `when` returns true, the 2nd extension's `when` is not executed. In practical terms, this means that more "specific" extensions should be placed near the start of the array you pass to `createCodec`.
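To illustrate the ordering rule with a hypothetical pair of extensions (not from the library itself): `TypeError` is a subclass of `Error`, so its `when` must run first, or the generic `Error` extension would claim every `TypeError`.

```typescript
// First matching `when` wins, mirroring createCodec's top-to-bottom order.
const extensions = [
    {
        name: "TypeError",  // more specific: near the start of the array
        when: (x: unknown): x is TypeError => x instanceof TypeError,
        encode: (e: TypeError) => e.message,
        decode: (message: string) => new TypeError(message),
    },
    {
        name: "Error",      // more general: after the specific one
        when: (x: unknown): x is Error => x instanceof Error,
        encode: (e: Error) => e.message,
        decode: (message: string) => new Error(message),
    },
]

const match = extensions.find(ext => ext.when(new TypeError("oops")))
// with this order, `match` is the "TypeError" extension;
// reversed, the generic "Error" extension would claim it instead
```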
> I don't quite understand the websocket issue (why do the console logs need to be in the encode/decode?)
The console.log serves as an example of how you would affect the outside world. The MessagePack readme provides a longer example where the `context` is something that keeps track of everything being encoded/decoded.

It's for cases where the object you're serializing is not an unchanging static object (a ReadableStream, for my use case). It's needed because it's easy to affect a variable declared at the top level, but it gets tricky when you need to affect something that's passed in as an argument somewhere (a WebSocket connection, for my use case).
> In practical terms, this means that more "specific" extensions should be placed near the start of the array you pass to `createCodec`.
Great, so long as this is mentioned prominently in the docs, I don't see any problem with the API.
Even better, this design makes it easy to define a codec in a standalone file, publish it on Deno.land, and then import it wherever it is needed (e.g. sending someone a serialized file along with an `import codec from "somewhere on deno.land"`).
> It's for cases where the object you're serializing is not an unchanging static object (a ReadableStream, for my use case).
~Hmm, I'm still not understanding when it would be desirable to serialize with side effects. If I were to serialize a websocket, I'd imagine it something like:~
```js
// sorry, I'm simplifying to JS, it's a bit easier for pseudocode
const { encode, decode } = EsCodec.createCodec([
    {
        name: "Websocket",
        when: (x) => x instanceof WebSocket,
        encode: (websocket) => websocket.url,
        decode: (websocketUrl) => new WebSocket(websocketUrl)
    }
])
```
~So I don't really know what kind of logic would go into a `context.track()` function. For a readable stream, I would imagine maybe saving/loading an index, but I don't really understand why there would need to be side effects.~
Actually I think I see what you mean. If encode is partially-encoding (e.g. iterably encoding) a value, it would be nice for it to pick up where it leaves off:
```js
const socket = new WebSocket("stuff")
const localFileStream = new ReadableStream("local stuff")
const { encode, decode } = /* ... */

// server.js: normal way
for await (const chunk of localFileStream) {
    socket.send(encode(chunk))
}

// client.js: normal way
let chunks = []
while (true) {
    chunks.push(
        decode(await socket.recv())
    )
}
// ^ kind of having to manually write encode/decode logic
```
I think there is a lot of value in having a pure-function encoder/decoder, so I feel like a streaming encoder/decoder is a different problem (e.g. it seems like a feature beyond es-codec@1.0.0 to me, and/or deserves its own stream-encoding API that maybe internally utilizes the pure-function encode/decode).
In terms of the streaming API, hear me out: I think iterables might cover all cases (meaning managing a context could be unnecessary). Iterables are pretty much just a streamUUID & metadata plus individually serializable chunks (even if the chunks themselves are iterables, e.g. needing recursive serialization). Let's say there are two async readable streams, `stream1` and `stream2`. The chunks of stream1 and stream2 won't necessarily be sent in a strict order (e.g. it might be `stream1.chunk1`, `stream1.chunk2`, `stream2.chunk1`, `stream1.chunk3` instead of `stream1.chunk1`, `stream2.chunk1`, `stream1.chunk2`, `stream2.chunk2`), so the deserializer will need to handle processing the ID while making sure all the chunks end up back in the right stream. Variable ordering alone seems to me like a case for having two different APIs (e.g. normal encode should be a deterministic process, while stream encoding isn't necessarily deterministic).
That encode/decode logic above (the encoder labelling chunks with a streamID, then the decoder sorting chunks using that streamID) seems very general to me, so having each dev re-implement it with their own custom Context object seems suboptimal. Maybe I'm wrong, especially for "connection" things, since they're more than just a streamID + metadata.
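That general logic could be sketched roughly like this (hypothetical names, not part of es-codec): the encoder labels each chunk with a streamID, and the decoder routes chunks back into per-stream buffers regardless of arrival order.

```typescript
type LabeledChunk = { streamId: string, chunk: unknown }

// Encoder side: attach the streamID to each outgoing chunk.
const label = (streamId: string, chunk: unknown): LabeledChunk => ({ streamId, chunk })

// Decoder side: route labeled chunks back into per-stream buffers,
// preserving per-stream order even when streams arrive interleaved.
class StreamDemux {
    private buffers = new Map<string, unknown[]>()

    push({ streamId, chunk }: LabeledChunk): void {
        if (!this.buffers.has(streamId)) this.buffers.set(streamId, [])
        this.buffers.get(streamId)!.push(chunk)
    }

    chunksFor(streamId: string): unknown[] {
        return this.buffers.get(streamId) ?? []
    }
}
```

Even with `stream1.chunk1`, `stream2.chunk1`, `stream1.chunk2` arriving interleaved, each stream's buffer ends up in the right order.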
I'll need more time to think about this though; I will be facing this problem myself soon enough.
@jeff-hykin This is the use case I made es-codec to enable. The highlighted line is a "server function" that's going to be called from a browser. https://github.com/lilnasy/astro-server-functions/blob/streams/example/src/serverfunctions.ts#L5
> Iterables are pretty much just a streamUUID & metadata plus individually serializable chunks
Exactly! That is almost verbatim how I've implemented it here (the code hasn't been updated to use context.) https://github.com/lilnasy/astro-server-functions/blob/streams/client-runtime.ts#L28
> I think there is a lot of value in having a pure-function encoder/decoder
I agree with you, although javascript doesn't have a way to enforce this, certainly not from a library. Besides, you can always choose to stick with context-free pure encoding (the URL example still works.)
## API

### Usage

`createCodec` accepts an array of extensions, and returns `encode` and `decode` functions that support the corresponding types.
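The dispatch this implies can be sketched as follows. This is an illustrative model only, not the real es-codec implementation (which produces binary output rather than JSON): each extension's `when` is tried top to bottom, the result is tagged with the extension's `name`, and natively supported values pass through untouched.

```typescript
type Extension<T, S> = {
    name: string
    when: (x: unknown) => x is T
    encode: (value: T) => S
    decode: (data: S) => T
}

function createCodec(extensions: Extension<any, any>[]) {
    function encode(value: unknown): string {
        for (const ext of extensions) {
            // first matching `when` wins
            if (ext.when(value)) return JSON.stringify({ $tag: ext.name, data: ext.encode(value) })
        }
        return JSON.stringify({ data: value })  // natively supported value
    }
    function decode(text: string): unknown {
        const { $tag, data } = JSON.parse(text)
        const ext = extensions.find(e => e.name === $tag)
        return ext ? ext.decode(data) : data
    }
    return { encode, decode }
}

// Usage, with the URL extension from earlier in the thread:
const { encode, decode } = createCodec([{
    name: "URL",
    when: (x: unknown): x is URL => x instanceof URL,
    encode: (url: URL) => url.href,
    decode: (href: string) => new URL(href),
}])
```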