Replacing the keyword for product types

tothambrus11 commented 5 months ago

Problem:

The current keyword for product type declarations in Hylo is type. This has caused confusion due to its generality, as it also applies to unions, tuples, etc. This generality makes discussing product types cumbersome.

Considerations:

sum and product keywords for enums and structs, respectively:
- Pros: Conceptually accurate.
- Cons: Unfamiliar to many programmers, potential naming conflicts, and overloaded meanings in many domains.
class keyword:
- Pros: Familiarity (JavaScript, Java, Kotlin, C#, etc.).
- Cons: Might be overly associated with inheritance-based polymorphism, which is a bit misleading: our protocol-based approach gives programmers a different set of powers, which they need to embrace using design patterns completely different from the ones they are used to.
struct keyword:
- Pros:
  - This is what people usually say in natural language when talking about the product type construct.
  - Familiarity for the main target audience (C++, Rust, Swift developers).
  - No ambiguity in meaning.

Proposal:

Change the keyword for product type declarations from type to struct.

Implications:

The refactoring would affect many files, including tests. I think it would be best to also rename all uses of the term in the code, e.g. ProductTypeDecl to StructDecl, so that contributors can easily find a construct by searching for its name. This would mean a lot of merge conflicts for contributors with work in progress, so we should schedule this refactoring for when all substantial work in progress has been merged.

kyouko-taiga commented 5 months ago

I am not at all convinced by your premise.

This generality makes discussing product types cumbersome.

The parser can always disambiguate what type is being used for and I haven't seen any evidence that the syntax is confusing for users. Can you expand on why you think it is cumbersome?

dabrahams commented 5 months ago

According to his original post this came from direct experience attempting to talk about it with people. It's not about what the parser can handle; it's about what humans digest easily. And it seems entirely plausible to me; this isn't just any old type. You can't use the keyword to declare any of the non-nominal types and if there's ever first class support for Enums you won't use it there either.

kyouko-taiga commented 5 months ago

I guess I won't object too strongly to a name that has no connotation of a data structure with a known layout, either in terms of memory placement or field order. type declares an abstraction with an interface that lets you interact with its state. Users of that abstraction are in general not given any additional information. I really care about this distinction.

After some cursory research, I feel that none of the keywords in https://en.wikipedia.org/wiki/Composite_data_type conveys that idea.

- struct strongly relates to data layout, as it's common to hear things like "this thing is just a struct in memory",
- class strongly relates to reference semantics, as mentioned, and
- the other candidates relate to structural data types.

I can offer object without a strong conviction. I like the idea that an object is indeed an encapsulation of some data but the keyword suggests that we're declaring just a single object, not the type of different instances. That's what the keyword means in Scala btw.

This very cool language for generic programming uses impl which is interesting if we admit that most types are the implementation of some concept(s) (i.e. traits). There's the risk of confusion with the keyword meaning something different in Rust, though.

Borrowing from OCaml, there's module, which is a little wild but does represent the idea of an interface with some form of representation hiding. It also embraces the idea that a type is some existential implementation of a concept.

But I still think type is the most appropriate option. We could prefix it with opaque if we can't live with the status quo:

opaque type Vector2 { ... }

dabrahams commented 5 months ago

I don't think these will be the only opaque (or nominal, for that matter) types if we have newtype, nor, probably, if we have enum.

I don't think impl works because it's easily interpreted as referring to the inside of the opacity boundary, where the opacity is irrelevant.

module is problematic because we don't have a better word than that for what we've been calling modules.

It also embraces the idea that a type is some existential implementation of a concept.

I don't see how. I also don't believe that's a view we want to encourage for all nominal types, even if it's a theoretically legitimate view. Someone came up to me at WWDC and said “I'm doing protocol-oriented programming” and they had defined a protocol for every one of their types.

Normally, introducers tell you what kind of thing you're describing. This one does not, except at the most general level. I think it's a problem that associated types are declared the same way, when they don't have to be the same kind of thing. C++ had a similar problem: template type parameters were introduced with class, and it confused people. Eventually the language allowed typename instead of class to solve the problem.
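For readers who have not run into the C++ history mentioned here, a minimal sketch of the two spellings follows. BoxA and BoxB are made-up names used only for illustration; the point is that `class` in a template parameter list never meant "class type", which is the source of the confusion.

```cpp
#include <type_traits>

// Original spelling: `class` introduces a type parameter.
template <class T>
struct BoxA { T value; };

// Later spelling: `typename` means exactly the same thing in this position.
template <typename T>
struct BoxB { T value; };

// `class T` does not require T to be a class type: int is accepted by both.
static_assert(std::is_same<decltype(BoxA<int>::value), int>::value,
              "BoxA accepts a non-class type");
static_assert(std::is_same<decltype(BoxB<int>::value), int>::value,
              "BoxB accepts a non-class type");
```

Both templates behave identically; only the introducer differs, and the second tells the reader more accurately what kind of thing T is.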

IMO it's also a problem that we don't have a specific word to talk about these things in English, that distinguishes them from all the other things. And again, the words “opaque” and “nominal” don't seem durable to language evolution.

A good choice would address both those issues.

tothambrus11 commented 5 months ago

I would need to read up on the memory model and the reasoning behind type opaqueness. I would appreciate it if you could attach some resources.

But for me right now, product type declarations in Hylo and Swift are the concrete types. They are something very simple, and I do have the mental model that they are just represented in memory as a fixed number of bytes. Essentially, they are very simple constructs but we can teach them to do many interesting things - that's where I build on abstractions, and make my type conform to certain traits. Then if I want information hiding, I can pass around my object hidden behind a protocol, which will get translated into the concrete type during monomorphisation.

To me, @kyouko-taiga's point seems to suggest that we would like to, by default, capture the idea of opaqueness right in our concrete type declarations, so we would never be able to refer to just a "dumb struct", but always the struct + its abstraction. A nice property that I see here is that when people refer to object.fieldA they don't refer to an actual field directly, the struct could at any time change its representation and make fieldA a computed property without breaking existing code. (I'm not sure how this would work between module and ABI boundaries (will we even have ABIs?)) Is there any other relevant idea that I'm missing?

I think what would happen with a lot of people is that they look at the language construct (having whatever keyword), and just realize "Oh, yes, that's just a struct" - and they would translate it mentally without thinking too much about it. Unless we somehow emphasize the difference and its importance, people might just assume they are the same thing, until a bug hits them in the face.

I can see how choosing a different keyword would make it slightly harder for this to happen, as people would be more curious to learn more about the new thing than something they think they are used to (e.g. they might only skim through the article about struct-s).

I would note that systems programmers would definitely appreciate having nice tools for precisely controlling the memory layout of their types, e.g. if they want to interface with certain hardware using memory-mapped IO or send network packets in some format. It might be useful if they could easily annotate their type with a special attribute that provides those rigid guarantees.
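To make the kind of guarantee meant here concrete, this is a minimal sketch of how it can be checked in C++ today. PacketHeader and its fields are made up for illustration, and the size and offset checks assume a mainstream platform where these fixed-width integers need no padding. Nothing here is Hylo syntax or a proposed Hylo attribute.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical wire-format header: fixed-width fields in a fixed order.
// Byte order is not addressed here; a real protocol would also have to
// specify endianness.
struct PacketHeader {
    std::uint16_t kind;
    std::uint16_t length;
    std::uint32_t sequence;
};

// The layout promises are checked at compile time, so a change that breaks
// them fails the build instead of corrupting packets at run time.
static_assert(std::is_standard_layout<PacketHeader>::value,
              "fields are laid out in declaration order");
static_assert(sizeof(PacketHeader) == 8, "no padding is expected");
static_assert(offsetof(PacketHeader, sequence) == 4,
              "sequence starts at byte 4");
```

The checks state the layout contract next to the type, which is roughly what a layout attribute would have to guarantee.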

Regarding the new keywords:

object is a bit strange to me, even though it has precedent in other languages. It seems to refer to an object instance, instead of a template of it.
module - I agree with Dave, it would be a confusing overload of the term.

Some new alternatives:

mune - a flipped-over enum. I don't know if it has any commonly associated meaning (in Estonian it means eggs). It might be a bit hard to pronounce and talk about, as a mune sounds much like immune.
prod - short for product type, though it might be still awkward to talk about
comp/composite/compound - Since @kyouko-taiga referred to the Composite data type, we indeed care about the composition of multiple sub-parts - not just that there is some structure.

dabrahams commented 5 months ago

“Opaqueness” simply means some details of the type can be private (and are by default, but that's not very important here). Such details include the type's list of stored properties. There is an additional wrinkle of ABI resilience: a module can be compiled such that the sizes of the types it vends (and other private details) are not statically encoded into clients, so all private details in the module can be updated without causing recompilation of clients. But ABI resilience is more of a whole-language feature that doesn't change the essential nature of a product type from the point of view of the programmer.

To me, @kyouko-taiga's point seems to suggest that we would like to, by default, capture the idea of opaqueness right in our concrete type declarations, so we would never be able to refer to just a "dumb struct", but always the struct + its abstraction.

And I have said that merely capturing opacity in the name is insufficient because it's very unlikely to remain the only opaque kind of type. @kyouko-taiga will have to say for sure but I'm pretty sure "dumb struct" just means struct-as-in-C: no private details, no constructor/destructor semantics, and thus no ability to enforce the maintenance of invariants.
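The contrast between a struct-as-in-C and a type that can maintain invariants can be sketched in C++ as follows. RawRange and CheckedRange are made-up names, and the invariant (low <= high) is only an example.

```cpp
#include <stdexcept>
#include <type_traits>

// A "dumb struct" in the C sense: every field is public and there is no
// constructor, so nothing stops a client from storing high < low.
struct RawRange {
    int low;
    int high;
};

// A type with private details: the constructor is the only way to create
// an instance, so low <= high holds for every CheckedRange that exists.
class CheckedRange {
public:
    CheckedRange(int low, int high) : low_(low), high_(high) {
        if (low > high) {
            throw std::invalid_argument("low must not exceed high");
        }
    }
    int low() const { return low_; }
    int high() const { return high_; }
    int length() const { return high_ - low_; }  // never negative

private:
    int low_;
    int high_;
};

// Clients cannot conjure a CheckedRange without going through the check.
static_assert(!std::is_default_constructible<CheckedRange>::value,
              "CheckedRange must be built from validated bounds");
```

Code that receives a CheckedRange can rely on length() being non-negative, whereas code that receives a RawRange has to re-validate it every time.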

A nice property that I see here is that when people refer to object.fieldA they don't refer to an actual field directly, the struct could at any time change its representation and make fieldA a computed property without breaking existing code. (I'm not sure how this would work between module and ABI boundaries (will we even have ABIs?))

That's how Swift works, and we're doing the same thing. To clients across an ABI resilience boundary, all properties are computed, but the computation may reflect some stored property.
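The source-level half of this idea can be sketched in C++, with the caveat that C++ has no computed properties, so accessor functions stand in for them. The Span types and the width function are made up for illustration, and the sketch says nothing about ABI resilience: it only shows that a client written against the interface keeps working when a stored value becomes a computed one.

```cpp
namespace v1 {
// First version: both bounds are stored.
class Span {
    int start_;
    int end_;
public:
    constexpr Span(int start, int end) : start_(start), end_(end) {}
    constexpr int start() const { return start_; }
    constexpr int end() const { return end_; }  // reads a stored value
};
}  // namespace v1

namespace v2 {
// Second version: the representation changed to (start, count).
class Span {
    int start_;
    int count_;
public:
    constexpr Span(int start, int end) : start_(start), count_(end - start) {}
    constexpr int start() const { return start_; }
    constexpr int end() const { return start_ + count_; }  // now computed
};
}  // namespace v2

// Client code only uses the interface, so it compiles unchanged against
// either version and cannot tell which values are stored.
template <class S>
constexpr int width(const S& span) {
    return span.end() - span.start();
}

static_assert(width(v1::Span(2, 7)) == 5, "stored end");
static_assert(width(v2::Span(2, 7)) == 5, "computed end");
```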

IMO “object” connotes object-orientation and thus reference semantic classes. FWIW in Swift class instances are called objects and struct instances are not.

Of all the words you've used above, composite or compound come closest to capturing the specific kind of thing this is. I think it would be less awkward to say “Foo is a composite” than to say “Foo is a compound” because a compound (unqualified) commonly refers to almost any substance in English. That is fixed by adding the qualification “type” but then a tuple is by any measure a “compound or composite type,” so that could be confusing. So I think composite works better.

Note however that this word doesn't, by itself, connote opacity. So it really depends on what you think it's important to emphasize. We only have one word to spend and I doubt we're going to be able to express “opaque composite” with that. Personally, I think composition is the essential aspect because you can always make everything public and non-resilient. Where's the opacity then?

RMJay commented 3 months ago

I must admit that I have always thought that using the keyword type has drawbacks. Mainly that I want to be able to refer to types in conversations with less experienced programmers and not have them think I mean only the things declared with the keyword type.

dabrahams commented 3 months ago

Indeed, see my earlier remark. Not presuming to speak for @kyouko-taiga but my understanding is that she is unimpressed by these arguments but willing to change the name if a clearly better name than type can be found, and that struct is unacceptable because to her it means what the C language means by struct, and not what C++ or Swift or Rust or Go or C# or Julia mean by that word. So fixing this issue is blocked on coming up with a better word than type (that is not struct).

I don't think more opinions will have any effect here unless they come with winning suggestions for a replacement keyword.

tothambrus11 commented 2 months ago

In case programmer terminology wasn't fun enough: recently I realized how well food and related items symbolize programming constructs.

torte: the layered/stacked nature of elements inside a product type. The cream between the real, valuable layers is the padding.
taco, wrap, pita: they wrap a bunch of fillings and try to abstract away the fact that they used to be separate things. The abstraction often breaks down and causes a mess.
plate: an optional type, as big as the food it can hold (plus a little margin at the border, where the person interested in the food can grab it).

Error messages would be much friendlier, though potentially less googleable:

example.hylo:30.1-30.9: AssertionFailure: Expected a wrap of type T but function returned an empty plate.
example.hylo:31.1-31.10: AssertionFailure: Expected an empty plate, got wrap of type Eggplant.

Jokes aside, wrap might actually be something I could imagine writing in code.
