Closed: brendanzab closed this issue 4 years ago.
Classifying types into binary / not-binary at the kind level does make it difficult to reuse struct and array as abstract host types, which might be unfortunate.
Agreed. We could have kind-specific typing rules for structs. Not sure about arrays though.
The other issue is you might want some way of getting a Type from a Format. Perhaps there is some in-built operator that allows you to do that, something like Foo::Host:
struct Pixel {
r : U8,
g : U8,
b : U8,
}
struct Header {
width : U32Be,
height : U32Be,
}
ImageData (header : Header::Host) : Format =
Array (header.width * header.height) Pixel;
struct Image {
header : Header,
data : ImageData header,
}
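As a rough sketch of what this distinction buys us (hypothetical Rust output; the names and signatures below are made up, not what any compiler currently emits), Header would describe the parsing, while Header::Host would name the plain data you get back:

// Hypothetical generated code for the `Header` format above.
// The struct is the host representation (what `Header::Host` would name);
// the `read` function is what the `Header` format itself describes.
pub struct Header {
    pub width: u32,
    pub height: u32,
}

impl Header {
    /// Reads a `Header` from big-endian bytes, per the `U32Be` fields.
    pub fn read(input: &[u8]) -> Option<Header> {
        if input.len() < 8 {
            return None;
        }
        let width = u32::from_be_bytes([input[0], input[1], input[2], input[3]]);
        let height = u32::from_be_bytes([input[4], input[5], input[6], input[7]]);
        Some(Header { width, height })
    }
}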
This may not actually be necessary; it's important to ensure we have a use case for making an explicit distinction for binary types in the language.
Could you be a bit more explicit about which 'this' might not be necessary - do you mean this entire issue, or a specific comment here? That would be super helpful! :)
In this specific example we are getting implicit conversion between Header and Header::Host, suggesting that we could just get implicit conversion of the fields of Header and not need to make any distinction between the host and binary types at all.
For the general issue it would be good to motivate it with uses cases driven by needing an explicit distinction between binary and non-binary types to ensure that any solution is meeting those needs.
I was thinking that we'd have typing rules in Format structs that would store header : Header::Host in the context as opposed to header : Header.
That's an implementation concern and not technically a use case that the user knows about :nerd_face:
What would rule out a struct like:
struct Foo {
first : U32Be,
second : Int,
}
I mean, it's something that would be impossible to construct, given that U32Be has no constructors. What do I do in the compiler back ends? Skip it? Report a warning? Throw an error?
Re. https://github.com/yeslogic/ddl2/issues/48#issuecomment-514895960, could you elaborate on what you mean by 'it is an implementation concern'? 🤔
One way to introduce some polymorphism for fixed-size arrays might be to add a Kind universe above Type/Format/Predicate:
Array : {k : Kind} -> Int -> k -> k
Trouble is that we may need some specific rules around handling alignment and size for format arrays.
One idea I was thinking of for fixed-length host lists is that you might have something like:
(Array 32 U32Be)::Host = {xs : List Int | length xs == 32}
That means that you might be able to preserve this down to the compilers. I don't know if this addresses your concerns though.
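If that static length is preserved down to the back ends, a compiler could (hypothetically; this is a sketch, not existing output, and Palette/DynPalette are made-up names) pick a fixed-size Rust array for the host type instead of a growable vector:

// Hypothetical: the format records the length 32 statically, so the host
// type of `Array 32 U32Be` can be a fixed-size array.
pub struct Palette {
    pub entries: [u32; 32],
}

// By contrast, an array whose length is only known from the data would have
// to fall back to something like a Vec:
pub struct DynPalette {
    pub entries: Vec<u32>,
}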
It's an implementation concern if it relates to the details of how we implement type checking, it's a user-facing use case if it addresses something that a user might want to do with the language.
A struct that mixes binary fields and non-binary fields presumably won't get compiled if it is impossible to actually use it anywhere, it will just be dead code. But it could still be compiled to a Rust struct assuming we can map its binary types to Rust types, it just wouldn't have a parse method.
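As a sketch of that (hypothetical generated Rust; Foo is the mixed struct from the earlier comment, and the field mappings are assumed), the struct can still exist as data, it just gets no parse method:

// `Foo` mixes a binary field (`U32Be`) with a host-only field (`Int`), so it
// can still be represented as a plain Rust struct...
pub struct Foo {
    pub first: u32,
    pub second: i64, // assuming the host `Int` maps to some Rust integer type
}
// ...but no `read`/parse method would be generated for it, since there is no
// way to parse `second` out of the input bytes.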
I don't think we need to worry about typing lists of fixed length if we already have arrays; just using Array 32 U32Be should be sufficient.
A struct that mixes binary fields and non-binary fields presumably won't get compiled if it is impossible to actually use it anywhere, it will just be dead code. But it could still be compiled to a Rust struct assuming we can map its binary types to Rust types, it just wouldn't have a parse method.
I guess I was thinking that this does not seem pleasant from a user's perspective, having something randomly not appear in your generated parser. This is just conjecture though.
If it's not a binary struct then it can't appear anywhere in the overall format, so you would expect to get a warning for unused code at most.
So for clarity, the main objection to doing type-level reasoning about format and host types is that:
It potentially complicates the language, for example by making it harder to be generic across kinds of types. The question is whether it adds anything useful to the language.
This is definitely an important point!
My mind is also probably polluted by how The Power of Pi represented this stuff.
It potentially complicates the language, for example by making it harder to be generic across kinds of types. The question is whether it adds anything useful to the language.
As a vague thought on this, I'm reminded of one of the original ideas behind Hungarian notation, namely to indicate the intention of a variable, beyond just its type. So you might distinguish between an unsanitized string and a sanitized string, for example, to make sure you don't introduce security vulnerabilities by using unsanitized input accidentally.
It seems like in this situation we're trying to distinguish between cases where we care more about the denotational characteristics of a value (does this value have a particular mathematical meaning) versus the operational characteristics of a value (does this value have a particular implementation). If you can distinguish between those two things explicitly, you can potentially swap out different implementations when doing operations where you only care about the meaning, or conversely have different mathematical interpretations of the same implementation.
Sorry, this is probably not very useful.
The denotational/operational divide is quite interesting in the case of the DDL, since we really care that if we compile to a validator, dump tool, or query style of parser, that we get the same 'meaning' extracted from the binary data. Hmmm.
Here is a quick sketch of a predicates-on-types approach:
pred binary(type :: in) is semidet.
binary(u8).
binary(u16be).
binary(u32be).
binary(struct(Fields)) :- all [X] ( member(X, Fields) => binary(X) ).
binary(array(_Len, Type)) :- binary(Type), sized(Type, _Size).
pred sized(type :: in, size :: out) is semidet.
sized(u8, 1).
sized(u16be, 2).
sized(u32be, 4).
sized(struct(Fields), Size) :- Size = sum(map(sized, Fields)).
sized(array(Len, Type), Size) :- sized(Type, ItemSize), Size = Len * ItemSize.
A similar approach could be used to describe alignment and other constraints, which can be thought of as predicates on types or type class constraints or traits as appropriate.
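To make the 'type class constraints or traits' reading concrete, here is roughly what the binary and sized predicates could look like as Rust traits (a sketch with made-up names, assuming per-format impls would be generated):

// `binary(T)` as a marker trait.
pub trait Binary {}

// `sized(T, Size)` as a trait with an associated constant.
pub trait FixedSize: Binary {
    const SIZE: usize;
}

// Primitive formats.
pub struct U8;
pub struct U16Be;
pub struct U32Be;

impl Binary for U8 {}
impl Binary for U16Be {}
impl Binary for U32Be {}

impl FixedSize for U8 { const SIZE: usize = 1; }
impl FixedSize for U16Be { const SIZE: usize = 2; }
impl FixedSize for U32Be { const SIZE: usize = 4; }

// `binary(struct(Fields))` holds when every field is binary; the generated
// impl for a concrete struct just sums the field sizes.
pub struct Pixel; // r, g, b : U8
impl Binary for Pixel {}
impl FixedSize for Pixel { const SIZE: usize = 3 * U8::SIZE; }

// `binary(array(_Len, Type)) :- binary(Type), sized(Type, _Size).`
pub struct Array<T, const LEN: usize>(std::marker::PhantomData<T>);
impl<T: FixedSize, const LEN: usize> Binary for Array<T, LEN> {}
impl<T: FixedSize, const LEN: usize> FixedSize for Array<T, LEN> {
    const SIZE: usize = LEN * T::SIZE;
}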
How do you handle functions?
What sort of functions? Function types are neither binary nor sized; you could apply the predicates to function arguments if that is necessary.
How do you know the sizeness or binaryness of the body of the function without inspecting its contents?
Or do you figure that out every time you apply a function - kind of like how templates work in C++?
If a function returns a type then you can inspect the type that it returns,
make_pair(A, B) = struct { fst: A, snd: B }.
binary(make_pair(u8, u8)).
~binary(make_pair(int, int)).
I don't think it's necessary to analyse functions you haven't called yet, although in theory you could; you would just end up with a potentially very complex expression that duplicated the function body:
binary(make_pair(X, Y)) :- binary(X), binary(Y).
(Consider if make_pair contained a complex condition for example).
Heh, I find that pretty distasteful to be honest! 😅
What is the use case, to have functions that make guarantees about the types they return? That could be done by applying predicates to the type signature, similar to type class constraints in Mercury or trait bounds in Rust:
func make_pair(X, Y) = Z <= binary(X), binary(Y) => binary(Z).
I'm not sure if this would actually be necessary, but if necessary it's just pre/post-conditions as predicates.
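For comparison, the same guarantee written with Rust trait bounds (a minimal, self-contained sketch; Pair stands in for make_pair, and all names are made up):

pub trait Binary {}

pub struct U8;
impl Binary for U8 {}

// `make_pair(A, B) = struct { fst: A, snd: B }`
pub struct Pair<A, B> {
    pub fst: A,
    pub snd: B,
}

// `binary(X), binary(Y) => binary(make_pair(X, Y))` as a bounded impl:
// the guarantee lives in the signature, not in an analysis of the body.
impl<A: Binary, B: Binary> Binary for Pair<A, B> {}

fn assert_binary<T: Binary>() {}

fn main() {
    assert_binary::<Pair<U8, U8>>();
    // assert_binary::<Pair<i64, i64>>(); // rejected: `i64` (a host int) is not `Binary`
}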
Here is a quick sketch in Coq:
Inductive TypeDesc : Set -> Type :=
| Host : forall (v: Set), v -> TypeDesc v
| Unit : TypeDesc unit
| U8 : TypeDesc nat
| Seq : forall v1 v2 (t1: TypeDesc v1), (v1 -> TypeDesc v2) -> TypeDesc (v1 * v2)
| Arr : forall v, nat -> TypeDesc v -> TypeDesc (list v)
| Comp : forall (v1 v2: Set), TypeDesc v1 -> (v1 -> v2) -> TypeDesc v2.
Inductive Binary : forall v, TypeDesc v -> Prop :=
| Binary_Unit : Binary _ Unit
| Binary_U8 : Binary _ U8
| Binary_Seq : forall v1 v2 t1 f, Binary v1 t1 -> forall v, Binary v2 (f v) -> Binary _ (Seq v1 v2 t1 f)
| Binary_Arr : forall v n t, Binary v t -> Binary _ (Arr v n t)
| Binary_Comp : forall v1 v2 t1 f, Binary v1 t1 -> Binary _ (Comp v1 v2 t1 f).
It doesn't include sizes but this could be added to the Binary predicate, just gets a little messy.
An interesting part is the way sequencing works: since it's a pair (A:Type, \a:A, B(a):Type) you have to prove Binary(A) and also Binary(\a:A, B(a)), which does make sense if you think about it.
Here is the BinarySize predicate, which extends Binary with an optional size for types that have a fixed size and can thus be used as array element types:
Inductive TypeDesc : Set -> Type :=
| Host : forall (v: Set), v -> TypeDesc v
| Unit : TypeDesc unit
| U8 : TypeDesc nat
| Seq : forall v1 v2 (t1: TypeDesc v1), (v1 -> TypeDesc v2) -> TypeDesc (v1 * v2)
| Arr : forall v, nat -> TypeDesc v -> TypeDesc (list v)
| Comp : forall (v1 v2: Set), TypeDesc v1 -> (v1 -> v2) -> TypeDesc v2.
Inductive Binary : forall v, TypeDesc v -> Prop :=
| Binary_Unit : Binary _ Unit
| Binary_U8 : Binary _ U8
| Binary_Seq : forall v1 v2 t1 f, Binary v1 t1 -> forall v, Binary v2 (f v) -> Binary _ (Seq v1 v2 t1 f)
| Binary_Arr : forall v n t, Binary v t -> Binary _ (Arr v n t)
| Binary_Comp : forall v1 v2 t1 f, Binary v1 t1 -> Binary _ (Comp v1 v2 t1 f).
Inductive BinarySize : forall v, TypeDesc v -> option nat -> Prop :=
| BinarySize_Unit : BinarySize _ Unit (Some 0)
| BinarySize_U8 : BinarySize _ U8 (Some 1)
| BinarySize_Seq_Fixed : forall v1 v2 t1 f s1 s2, BinarySize v1 t1 (Some s1) -> forall v, BinarySize v2 (f v) (Some s2) -> BinarySize _ (Seq v1 v2 t1 f) (Some (s1 + s2))
| BinarySize_Seq_Varying : forall v1 v2 t1 f os1, BinarySize v1 t1 os1 -> forall v, (exists os2, BinarySize v2 (f v) os2) -> BinarySize _ (Seq v1 v2 t1 f) None
| BinarySize_Arr : forall v n t s, BinarySize v t (Some s) -> BinarySize _ (Arr v n t) (Some (n * s))
| BinarySize_Comp : forall v1 v2 t1 f os, BinarySize v1 t1 os -> BinarySize _ (Comp v1 v2 t1 f) os.
Definition pixel := Seq _ _ U8 (fun _ => Seq _ _ U8 (fun _ => U8)).
Definition fixed_string := Arr _ 256 U8.
Definition pascal_string := Seq _ _ U8 (fun len => Arr _ len U8).
Lemma pixel_is_three_bytes : BinarySize _ pixel (Some 3).
replace 3 with (1 + (1 + 1)).
apply BinarySize_Seq_Fixed.
apply BinarySize_U8.
exact 0.
apply BinarySize_Seq_Fixed.
apply BinarySize_U8.
exact 0.
apply BinarySize_U8.
auto.
Qed.
Lemma fixed_string_is_256_bytes : BinarySize _ fixed_string (Some 256).
replace 256 with (256 * 1).
apply BinarySize_Arr.
constructor.
auto.
Qed.
Lemma pascal_string_is_not_fixed_size : forall s, ~BinarySize _ pascal_string (Some s).
...
That last lemma is getting a bit beyond my proof ability, and could probably be simplified by adjusting the definition of BinarySize for sequences, which needs to handle four different combinations (sized/unsized for two arguments).
Without becoming too wedded to a specific way of doing things, we might also want to do some analysis on the 'kind' of things: Type, Format, Predicate, kind -> kind. This would allow us to express kind-polymorphic types like the Array operator above.
We also want to be able to do certain forms of semantic analysis on our terms. We care about properties like the size and alignment of formats. This should compose nicely with the other ways of doing things in our type system.
Now that I write this, it seems like these properties are both specific to Formats. Perhaps we could track them in there? Ie. having something like Format size align, with some defaults. Just an idea!
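A very rough sketch of that 'Format size align' idea in Rust trait terms (hypothetical; the proposal is about the DDL's type system, this only illustrates the 'with some defaults' part):

// Every format description carries a static size and alignment, with
// alignment defaulting to 1 unless the format overrides it.
pub trait Format {
    const SIZE: usize;
    const ALIGN: usize = 1;
}

pub struct U8;
impl Format for U8 {
    const SIZE: usize = 1;
    // ALIGN is left at the default of 1.
}

pub struct U32Be;
impl Format for U32Be {
    const SIZE: usize = 4;
    const ALIGN: usize = 4; // overridden where the format needs it
}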