ponylang / rfcs

RFCs for changes to Pony
https://ponylang.io/
61 stars 48 forks source link

Add Python Struct-like primitive #137

Closed nisanharamati closed 5 years ago

nisanharamati commented 5 years ago

This RFC is for adding the ability to use format specification strings to convert between bytestreams and arrays of native Pony types and vice versa. Inspired by Python struct

Rendered

I have a working implementation along with documentation and unit tests which is currently available as the pony-struct package on github.

aturley commented 5 years ago

I have a few thoughts ...

  1. I like this general idea. However, I'd like to see this spend some time getting used "in the wild" before considering it for inclusion in the standard library.
  2. I'm not a fan of calling this Struct, since Pony already has a struct. I think this name could lead to confusion (even though I like the fact that it is an homage to the Python package).
  3. I'm wondering if a more Pony-esque approach would be to have a builder (FormatBuilder?) that uses methods rather than a string. You used ">Iqd5ss5p" as an example format string; what I'm suggesting would look something like this: let fmt = FormatBuilder.b_e().l().q().d().s(5).s().p(5).build(). This has the advantage being checkable at compile-time, and the disadvantage of being more verbose. But you still get the benefit of reusability, because you could then use fmt like you did in your example.
jemc commented 5 years ago

@aturley mentioned on the sync call today that this was inspired by some of the Pony+Python work that you've been doing.

This raises a question for me: Is it a design goal that this be directly compatible with Python, all the way down to the letters chosen in the format string?

If so, I'd suggest that this is too specific a use case to include in the standard library.

If not, I'd suggest changing the API to be a bit more strongly typed. Something like this perhaps?

let byte_seqs = StructPack
  .i32(1024)
  .f64(3.141592653589793)
  .string("hello")
  .string("!")
  .bytes([1; 2; 3; 4; 5])
nisanharamati commented 5 years ago

@jemc I agree about the specificity concern. My first stab at this was to pretty much copy the Python module's interface. I think @aturley's idea for a FormatBuilder is a good improvement, as it adds compile time checking while still allowing for separating the format spec from the actual encoder, and also leaving in place the ability to encode over an array of encodeable objects.

That separation and encode-over-array capability are actually my core motivation for this, because it lets me:

  1. Reuse a format spec
  2. Implement a one-call encoder using a format spec and a collection of encode-able objects.

where (2) is basically a macro over the buffered interfaces. I like how it makes things concise, and also focuses the lower-level conversion code to a single point that's easier to keep tested.

malthe commented 5 years ago

Any improvement over what @jemc is suggesting needs macro support if you're going in the direction of a format builder.

jemc commented 5 years ago

I'd argue that the encode-over-array workflow can still be done here, though in the interest of strong typing it should probably be an array of tuples rather than array of arrays. For example:

let objects: Array[(I32, F64, String, String, Array(U8)] = [
  (1024, 3.141592653589793, "hello", ",", [1; 2; 3; 4; 5])
  (0, 100.1, "world", "!", [1; 2; 3; 4; 5])
]

let byte_seqs: Array[ByteSeq] = []
for o in objects.values() do
  byte_seqs.append(StructPack
    .i32(o._1)
    .f64(o._2)
    .string(o._3)
    .string(o._4)
    .bytes(o._5)
  )
end
jemc commented 5 years ago

Just to further explain why I'm pushing for a strongly typed approach, and why I think it makes an important difference for something we consider for the standard library:

Pony doesn't carry state or information in errors, so the best practice for partial functions is that they can only fail for one specific reason, so that the reason for a error can be statically known by the caller and handled appropriately at the source code of the call site. The currently proposed API doesn't follow this practice, because there are a wide variety of reasons that can cause an error. Some examples of problems that might happen:

If an error is raised from this kind of function, there's no way for the program to do anything but give up, fail loudly, and wait for a human to come along and troubleshoot the situation (by adding log statements, using lldb, etc). That is, it creates a possibility for potentially very costly interventions, so this is something to not adopt lightly, and such an API should usually be used as a last resort.

It's much more in the spirit of the principles of Pony to eliminate the possibility for unknown runtime errors by using a different pattern. This may include returning an ErrorInfo type that could be interrogated for information about what failed, or finding a way to limit the number of ways a runtime failure could occur to a single specific reason, or even no possibility at all. This may include separating operations so that they fail independently, or as I presented here, making more known to the compiler so we can fail at compile time rather than failing at runtime as much as possible.

nisanharamati commented 5 years ago

@jemc @malthe and @aturley thanks for your feedback! I will try to incorporate this into pony-struct and see how it feels/works.

P.s. if you have any suggestions for a better name to avoid confusion with the builtin struct, I'd be happy to rename it. Maybe Streamf (echoing printf, but for a stream)?

skull-squadron commented 5 years ago
  1. Form, FieldSet, Entity, Packer, Pack, BinPack: pick two and flamewar until i > N. ;)

  2. It needs a last format field is an array unknown size modifier, i.e., Ruby (un|)pack(1|) postfix * -> so say * prefix, for scientific and data science processing simple formats.

  3. R and r for 128-bit types.

  4. Will there be a possibility to un/pack a partial stream and update the Reader/Writer rather than consuming it?

skull-squadron commented 5 years ago

@jemc If one knows the types of the input values already, why not just match in some generic continuation factory .pack function rather than repeat oneself with un/boxing manually? Granted, I like having the option of gradual declarative type semantics should the need arise rather than forced into one way or the other: one is slightly more painful to synchronize code in 2+ places, making almost as verbose as decoding it yourself, while the other turns compile-time errors into runtime errors. Meh... either way there's a piper to pay.

nisanharamati commented 5 years ago

I haven't had much time to work on this, and the use case hasn't been particularly pressing. Combined with the caveats around type casting here, I think the best course is to close this RFC and see if this comes up again in the future. Any thoughts or objections?

aturley commented 5 years ago

I have no objections to closing it for now.