pikelet-lang / pikelet

A friendly little systems language with first-class types. Very WIP! 🚧 🚧 🚧
https://pikelet-lang.github.io/pikelet/
Apache License 2.0
610 stars 26 forks source link

Dependent records #2

Open brendanzab opened 6 years ago

brendanzab commented 6 years ago

One big motivation I've had for going with dependent types in the first place is the idea of being able to use the same language feature (dependent records) for both modules and data types. This is similar to what is proposed in 1ML, but I would prefer not to jump through the hoops of forcing the language into System Fω, and rather just go straight to dependent types, seeing as they are such a handy tool to have in the first place.

These would be super handy for representing modules. Eg.

Graph (a : Type) = Record {
    Node : Type,
    Edge : Type,

    has-edge : a -> Node -> Node -> Bool,
    edges : a -> Node -> List Edge,
}

MyGraph = DArray (Int, Int)

MyGraph-graph : Graph MyGraph
MyGraph-graph = record {
    Node = Int,
    Edge = (Int, Int),

    has-edge = _,
    edges = _,
}

This would also allow us to hide the length of length of arrays to get something like Rust's Vec type:

DArray (a : Type) = Record {
    len : U64,
    data : Array a len,
};

Relationship with Sigma types

A very simple way of representing dependent records would be to use dependent pairs (sigma types):

Unit : Type
{} : Unit
{e1, e2} : (x : t1) * t2
left : (x : t1) * t2 -> t1
right : (x : t1) * t2 -> t2

Where t1 depends on t2.

The trouble with this formulation is that labels are not significant when comparing for equality at the type and value level.

Structural vs Nominal

I would kind of prefer to go structural rather than nominal with this, but it seems that most dependently typed languages use nominal records. I'm guessing this is because it makes checking if two record types are compatible much easier. If we went structural we might need to do some complex subtyping and coercions to handle the fact that fields might be mentioned out of order.

In his thesis on Agda, Ulf Norel says:

In practise, however, it is a good idea to let each record declaration introduce a new type. This means that two record types declared to have the same fields will be different, but they will have the same elements. One advantage of this is that it significantly improves the efficiency of checking equality between record types—instead of comparing the types of all the fields, it is enough to compare the names. It is also good programming practise to keep intentionally different types separate in the type system.

Data layout

It would be nice if records were efficiently packed in memory by default, like Rust does it. The unboxed types issue #18 is related.

A potential issue with a non-uniform memory representation is that, as @ollef points out, if you have a record { x : A, y : B x, z : C }, then the memory offset of z depends on x. We might be able to do some size analysis though, and require non-statically known sizes to have a pointer indirection.

If we allow structural types, data layout might become trickier. We might need to have a consistent 'normalized field order' for records. Things would become even more trickier if row polymorphism was brought into the mix.

Other stuff to consider wrt. data layout is struct of array and array of structs. Perhaps our compile time reflection would be powerful enough to auto-derive this, but langs like Jai have this built in.

This paper talks about some interesting ways to parameterise over data layout: https://www.doc.ic.ac.uk/~scd/ShapesOnwards.pdf

Here is a book going through Data Oriented Design.

Row polymorphism

It would be nice to support row polymorphism, but I have no idea how this would look.

Haven't thought much about it. That paper uses subtyping. For row typing you might be able to do like { Γ; Θ={ y₀:B₀; ⋯ yₙ:Bₙ }; Δ[Γ, Θ] } where Γ and Δ are treated as two separate row variables, Δ depends on Γ and Θ, and Θ is closed with respect to Γ. Not sure though.

First class labels

Labels are also interesting - I've never really liked the 'type level string' nature of record labels, especially once row polymorphism comes into play. Would be nice to have these be namespaced, sort of like how clojure.spec does it. Not sure what the ergonomics would look like though.

Lenses

Would we want to auto-generate these? If we had first class labels and row polymorphism I could see us being generate polymorphic lenses in a library, like how Purescript does it.

References

markbrown commented 6 years ago

Regarding structural vs nominal, I agree with Ulf particularly on the last sentence in the quote. Code reuse is better achieved via a module system than by letting people duplicate types.

brendanzab commented 6 years ago

I guess I'm talking more about being able to easily extend, combine, and alter records, without having to make deeply nested records. And then being polymorphic over 'some record with some fields'. I'll have to try to think of some examples though.

ghost commented 6 years ago

Regarding my twitter comment on row polymorphism, it was really just an offhand remark so I wouldn't take it too seriously. I think row polymorphism in this setting could probably be achieved but I'm not sure how useful (or usable) it would be in practice due to the fact that the ordering of the fields matter in the dependent case.

As for Ulf's comment, I would have probably agreed with that at some point but not so much now, especially if the intention is to also use records in the same way that ML modules are used.

For starters, the issue of efficiency of subtyping should really be treated separately. There are various representational tricks that can make it efficient. Name equality is a kind of degenerate (and overly restrictive) form of that. Secondly, subtyping of this kind is done reasonably efficiently in ML implementations already. OCaml and SML/NJ have pretty fast type checkers compared to many languages, even for module heavy code.

The bigger issue though is that requiring separate declarations for record/module types just kills usability and makes many of the nice constructions you have for a more advanced module calculus (recent OCaml or 1ML) not really possible.

The equivalent thing in OCaml would be requiring that you define a module type Sig every time you want to write a signature for a module:

module M : sig
  ⋮
end = struct
  ⋮
end

would become this:

module type S = sig
  ⋮
end

module M : S = struct
  ⋮
end

Not only is the second version more verbose and requires you to waste a name for S, but it also means you can't use signature expressions inline in useful ways:

module X(T : sig end) = struct
  ⋮
end

(* works *)
module Y(T : sig end) : sig
  include module type of X(T) (* included signature, which isn't named, is reconstructed from X(T) *)
end = struct
  include X(T)
end

module type S = functor (T : sig end) -> sig
  ⋮
end

(* doesn't work *)
module Y(T : sig end) : sig
  include S(T)
end = struct
  include X(T)
end

So this kind of constraint on record types would be devastating as far as usability for modules would be concerned. It's worth keeping in mind that (AFAIK) Ulf didn't really have ML style modules in mind for Agda when he wrote that.

markbrown commented 6 years ago

@brendanzab I like to think of inheritance (i.e., extending, combining, etc) as being orthogonal to typing. For example, if you start with a type A and extend it with an extra field, is that equivalent to some other occurrence of A extended with the same field? That's really just the same question as whether writing out the definition of A twice results in equivalent types in the first place.

I think actually that what you want can be achieved by specifying type constraints such as subtyping or constrained polymorphism abstractly, rather than necessarily via the type structure. This can be suitable for information hiding purposes, where a module says that a type exists but doesn't expose the implementation.

@freebroccolo wrote:

As for Ulf's comment, I would have probably agreed with that at some point but not so much now, especially in the intention is to also use records in the same way that ML modules are used.

I guess I'm still at that first stage. I'll need to spend some time digesting your example a bit more. :-)

For starters, the issue of efficiency of subtyping should really be treated separately. There are various representational tricks that can make it efficient. Name equality is kind of degenerate (and overly restrictive) form of that.

I agree. It's just the information hiding aspect that I'm thinking about.

brendanzab commented 6 years ago

@markbrown

I think actually that what you want can be achieved by specifying type constraints such as subtyping or constrained polymorphism abstractly, rather than necessarily via the type structure.

The trouble with going down the route of subtyping is information loss. Are you familiar with row polymorphism? Information loss is what it tries to solve - you have a parameter that represents 'the rest of the record', and you can use that in your type signature.

markbrown commented 6 years ago

The trouble with going down the route of subtyping is information loss.

I said such as subtyping. What forms of constrained polymorphism you allow is up to you :-)

Are you familiar with row polymorphism?

Yes, along with subtyping it's another instance of constrained polymorphism. In Jones' formulation they are treated uniformly, and don't suffer from information loss.

(As for the problems with subtyping in the first place, you can probably blame the typed OO paradigm.)

ghost commented 6 years ago

I think actually that what you want can be achieved by specifying type constraints such as subtyping or constrained polymorphism abstractly, rather than necessarily via the type structure. This can be suitable for information hiding purposes, where a module says that a type exists but doesn't expose the implementation.

I'm not entirely sure what you mean by this but one potential issue I could see is that if you make this notion of abstraction or field constraints or whatever somehow separate from typing, it's hard to see how you also achieve a unification of records (as values) and modules into a single mechanism.

(How would this non-type structure information be propagated throughout the typing of expressions that might, for example, be the input or output to functions?)

I think the stratification this would lead to is similar to OCaml now in that modules are not values and "module types" (i.e., signatures) are not types. Many people have been trying to push things in the opposite direction though. Recently, OCaml has added a kind of 1st-class module mechanism (via existential packaging of modules into values and unpacking back into modules local to some expression) but it's cumbersome to work with and there are a few constructions that can't easily be expressed this way even though semantically they ought to make sense (and indeed, 1ML allows them).

In any case, here is a sketch of what ML-style modules with row polymorphism might look like:

(* S is treated like a row variable *)
module ExtendWithBar {S} (X : sig
  (* valid input is any module X which has a type called foo *)
  include S
  type foo
end) : sig
  include S (* include everything from module X in the defined module ExtendWithBar(X) *)
  type foo
  (* or just `include module type of X` instead *)
  type bar = list foo
end = struct
  include X
  type bar = list foo
end

module M0 : sig
  type foo
  type bar = list foo
end = ExtendWithBar(struct
  type foo
end)

module M1 : sig
  (* qux is also visible in the output, since it is covered by row variable S *)
  type qux
  type foo
  type bar = list foo
end = ExtendWithBar(struct
  type qux
  type foo
end)
brendanzab commented 6 years ago

Does that last qux field on ExtendWithBar require a value assignment?

brendanzab commented 6 years ago

Just converting the above syntax to records for my own benefit:

Extend-with-bar : 
    {S : Row} -> 
    Record { ..S, foo : Type } -> 
    Record { ..S, foo : Type, bar : Type }
Extend-with-bar = record {
    ..M,
    bar = List foo,
}

M0 : Record { foo : Type = I32, bar : Type } 
M0 = Extend-with-bar (record { foo = I32 })

M1 : Record { qux : Type, foo : Type, bar : Type } 
M1 = Extend-with-bar (record { qux = ???, foo = I32 })

Those records are open to inspection though - I'm guessing ML's sigs are abstract by default, which is why you have the = list foo assignments in the sigs? Would sealing (like in 1ML) be required here to make the internals abstract?

ghost commented 6 years ago

Does that last qux field on ExtendWithBar require a value assignment?

It doesn't require it. You can view it as abstract or empty. In practice you'd want constructors or functions that do something with it.

I think the first part of your example should be changed to bind X (also you'd want to be able to refer to S):

Extend-with-bar : 
    {S : Row} -> 
    (X : Record { ..S, foo : Type }) -> 
    Record { ..S, foo : Type, bar : Type }
Extend-with-bar {S} X = record {
    ..X,
    bar = List foo,
}

Those records are open to inspection though - I'm guessing ML's sigs are abstract by default, which is why you have the = list foo assignments in the sigs? Would sealing (like in 1ML) be required here to make the internals abstract?

That was just for example. I've removed it since it might be confusing. In OCaml, if you leave the signature off, everything will be visible (types will be concrete), unless necessarily constrained by the output signature of the functor to be abstract or hidden.

ghost commented 6 years ago

I think maybe I'm misunderstanding something with your example though. To accurately capture the signature of ExtendWithBar, I think you would need to use singletons:

Extend-with-bar : 
    {S : Row} -> 
    (M : Record { ..S, foo : Type }) -> 
    Record { ..S, foo : Type, bar : (= List foo) }
Extend-with-bar {S} M = record {
    ..M,
    bar = List foo,
}

Maybe this is what you mean with the foo : Type = I32 part for M0:

M0 : Record { foo : Type = I32, bar : Type } 
brendanzab commented 6 years ago

Ah, so you have mentioned singletons in the past - what are they meant to be doing?

ghost commented 6 years ago

Well the idea is that you just have some type (= E) where E is an expression. You could have an explicit constructor for introducing them like (= E) : (= E) or you could make it implicit E : (= E). They are called singletons here because there is only one inhabitant of the type (= E), which is E itself (up to judgemental equality of course).

This allows you to make the definitions of certain record fields observable in the types of the fields, which is something you'll definitely need if you want to use them like modules. In some of the literature on record types they talk about using singletons this way in terms of "manifest fields". 1ML also uses them like this.

brendanzab commented 6 years ago

I guess I'm less familiar with module-style programming vs. type classes or traits.

Would this have a similar effect to Item = i32 in:

fn collect_i32s<I: IntoIterator<Item = i32>>(vals: I) -> Vec<i32> { ... }
ghost commented 6 years ago

The theory would work out a little differently but yeah you should be able to achieve the same end result using singletons for a case like that.

markbrown commented 6 years ago

I'm not entirely sure what you mean by this but one potential issue I could see is that if you make this notion of abstraction or field constraints or whatever somehow separate from typing, it's hard to see how you also achieve a unification of records (as values) and modules into a single mechanism.

Right, it can't be separate. I mean that it's abstract in the sense of still working with names, without necessarily expanding all definitions out as is done with structural equality. The word "abstract" seems to mean too many different things, ironically ;-)

(How would this non-type structure information be propagated throughout the typing of expressions that might, for example, be the input or output to functions?)

That's part of the definition of each constraint. For example, a particular binary constraint may be defined to be transitive.

But the point I was making was just that structural typing isn't strictly necessary for interesting forms of inheritance or composition.

brendanzab commented 6 years ago

I made a gist showing some experimentation on what a category theory library might look like using records. Requires a whole bunch of extra features though - implicit args, instance args, infix operators, etc.

typesanitizer commented 6 years ago

I recently posted a related question on CompSci.SE that you may be interested in tracking: Row polymorphism extended to modules. Unfortunately it hasn't received any answers yet.

brendanzab commented 6 years ago

Ohh, thanks for posting. One thing that would be needed for this is a notion of dependency between the fields of the records. Still not sure how best to do this though. One might also want to do some sort of topological sorting of the records when checking for equality between records. Would be interesting to see how one could express those rules... also might prove a challenge to produce good error messages for them! 😰

It's also unclear as to how this would extend to other row-like things, for example effect rows and variants. For the latter I think it might end up looking like inductive-inductive types, but I might be wrong here.

Not sure if that thelps at all though... If you want to chat more in real time, we have a friendly Gitter chat room - would be nice to see you there! 😄

brendanzab commented 6 years ago

Dependent Intersections look like an interesting way to model records. Apparently Cedille can use them to model inductive types too:

brendanzab commented 5 years ago

This PHD proposal by @alhassy is interesting:

These ideas are definitely quite similar to things I've been thinking of in my head. Like the interplay between what parameters are or are not exposed. Still not sure how things like generativity and notations might be incorporated though - that stuff seems hard!

ratmice commented 4 years ago

On the structural vs nominal debate, I've always been a fan of structural, you can add a unique nominal element to a structural type to achieve the same, unique typing aspects given by nominal typing. But once you go nominal, its very hard to regain the benefits of structural typing.

Modula-3 has a fairly wordy description of the method they took what they called Brands, in How the types got their identity

A more recent account is Brand objects for Nominal typing

I believe the wyvern language takes at least a similar path in an ML-like language a theory of tagged objects

In a dependant typed setting this notion of brands has always seemed to me basically some kind of bracket type, or whatever it is we call those this month :crying_cat_face:

I think really while for nominal types, equality is easier, but I think that really the only benefit, But not very many languages have really gone down this sort of hybrid path.