Closed nisanharamati closed 5 years ago
I have a few thoughts ...
Struct
, since Pony already has a struct
. I think this name could lead to confusion (even though I like the fact that it is an homage to the Python package).FormatBuilder
?) that uses methods rather than a string. You used ">Iqd5ss5p"
as an example format string; what I'm suggesting would look something like this: let fmt = FormatBuilder.b_e().l().q().d().s(5).s().p(5).build()
. This has the advantage being checkable at compile-time, and the disadvantage of being more verbose. But you still get the benefit of reusability, because you could then use fmt
like you did in your example.@aturley mentioned on the sync call today that this was inspired by some of the Pony+Python work that you've been doing.
This raises a question for me: Is it a design goal that this be directly compatible with Python, all the way down to the letters chosen in the format string?
If so, I'd suggest that this is too specific a use case to include in the standard library.
If not, I'd suggest changing the API to be a bit more strongly typed. Something like this perhaps?
let byte_seqs = StructPack
.i32(1024)
.f64(3.141592653589793)
.string("hello")
.string("!")
.bytes([1; 2; 3; 4; 5])
@jemc I agree about the specificity concern.
My first stab at this was to pretty much copy the Python module's interface. I think @aturley's idea for a FormatBuilder
is a good improvement, as it adds compile time checking while still allowing for separating the format spec from the actual encoder, and also leaving in place the ability to encode over an array of encodeable objects.
That separation and encode-over-array capability are actually my core motivation for this, because it lets me:
where (2) is basically a macro over the buffered
interfaces. I like how it makes things concise, and also focuses the lower-level conversion code to a single point that's easier to keep tested.
Any improvement over what @jemc is suggesting needs macro support if you're going in the direction of a format builder.
I'd argue that the encode-over-array workflow can still be done here, though in the interest of strong typing it should probably be an array of tuples rather than array of arrays. For example:
let objects: Array[(I32, F64, String, String, Array(U8)] = [
(1024, 3.141592653589793, "hello", ",", [1; 2; 3; 4; 5])
(0, 100.1, "world", "!", [1; 2; 3; 4; 5])
]
let byte_seqs: Array[ByteSeq] = []
for o in objects.values() do
byte_seqs.append(StructPack
.i32(o._1)
.f64(o._2)
.string(o._3)
.string(o._4)
.bytes(o._5)
)
end
Just to further explain why I'm pushing for a strongly typed approach, and why I think it makes an important difference for something we consider for the standard library:
Pony doesn't carry state or information in errors, so the best practice for partial functions is that they can only fail for one specific reason, so that the reason for a error can be statically known by the caller and handled appropriately at the source code of the call site. The currently proposed API doesn't follow this practice, because there are a wide variety of reasons that can cause an error. Some examples of problems that might happen:
If an error is raised from this kind of function, there's no way for the program to do anything but give up, fail loudly, and wait for a human to come along and troubleshoot the situation (by adding log statements, using lldb, etc). That is, it creates a possibility for potentially very costly interventions, so this is something to not adopt lightly, and such an API should usually be used as a last resort.
It's much more in the spirit of the principles of Pony to eliminate the possibility for unknown runtime errors by using a different pattern. This may include returning an ErrorInfo
type that could be interrogated for information about what failed, or finding a way to limit the number of ways a runtime failure could occur to a single specific reason, or even no possibility at all. This may include separating operations so that they fail independently, or as I presented here, making more known to the compiler so we can fail at compile time rather than failing at runtime as much as possible.
@jemc @malthe and @aturley thanks for your feedback! I will try to incorporate this into pony-struct and see how it feels/works.
P.s. if you have any suggestions for a better name to avoid confusion with the builtin struct
, I'd be happy to rename it.
Maybe Streamf
(echoing printf
, but for a stream)?
Form, FieldSet, Entity, Packer, Pack, BinPack: pick two and flamewar until i > N. ;)
It needs a last format field is an array unknown size modifier, i.e., Ruby (un|)pack(1|)
postfix *
-> so say *
prefix, for scientific and data science processing simple formats.
R
and r
for 128-bit types.
Will there be a possibility to un/pack a partial stream and update the Reader/Writer rather than consuming it?
@jemc If one knows the types of the input values already, why not just match
in some generic continuation factory .pack
function rather than repeat oneself with un/boxing manually? Granted, I like having the option of gradual declarative type semantics should the need arise rather than forced into one way or the other: one is slightly more painful to synchronize code in 2+ places, making almost as verbose as decoding it yourself, while the other turns compile-time errors into runtime errors. Meh... either way there's a piper to pay.
I haven't had much time to work on this, and the use case hasn't been particularly pressing. Combined with the caveats around type casting here, I think the best course is to close this RFC and see if this comes up again in the future. Any thoughts or objections?
I have no objections to closing it for now.
This RFC is for adding the ability to use format specification strings to convert between bytestreams and arrays of native Pony types and vice versa. Inspired by Python struct
Rendered
I have a working implementation along with documentation and unit tests which is currently available as the pony-struct package on github.