keean / zenscript

A trait based language that compiles to JavaScript
MIT License
42 stars 7 forks source link

Fast track transpiling to TypeScript? #13

Open shelby3 opened 8 years ago

shelby3 commented 8 years ago

Building off the reasoning and justification in my self-rejected issue Fast track transpiling to PureScript?, I have a new idea of how we might be able to attain some of the main features proposed for ZenScript, without builiding a type checker by transpiling to TypeScript.

If we can for the time being while using this hack, presume that no modules will employ differing implementations of a specific data type for any specific typeclass, i.e. that all implementations of each data type are the same globally for each typeclass implemented (which we can check at run-time and throw an exception otherwise), then the module can at load/import insure that all implementations it employs are set on the prototype chain of all the respective classes' construction functions. In other words, my original point was that JavaScript has global interface injection (a form of monkey patching) via the prototype chain of the construction function, and @svieira pointed out the potential for global naming (implementation) conflicts.

So the rest of the hack I have in mind, is that in the emitted TypeScript we declare typeclasses as interfaces and in each module we declare the implemented data types as classes with all the implemented interfaces in the hierarchy. So these classes then have the proper type where ever they are stated nominally in the module. We compile the modules separately in TypeScript, thus each module can have differing declarations of the same class (because there is no type checking linker), so that every module will type check independently and the global prototype chain is assured to contain the interfaces that the TypeScript type system checks.

So each function argument that has a typeclass bound in our syntax, will have the corresponding interface type in the emitted TypeScript code. Ditto typeclass objects will simply be an interface type.

This appears to be a clever way of hacking through the type system to get the type checking we want along with the ability to have modules add implementations to existing data types with our typeclass syntax. And this hack requires no type checking in ZenScript. We need only a simple transformation for emitting from the AST.

As for the first-class inferred structural unions, TypeScript already has them, so no type checking we need to do.

It can't support my complex solution to the Expression Problem, but that is okay for starting point hack.

I think this is way we can have a working language in a matter of weeks, if we can agree?

That will give us valuable experimentation feedback, while we can work on our own type checker and fully functional compiler.

TypeScript's bivariance unsoundness should be avoided, since ZenScript semantics is to not allow implicit subsumption of typeclass bounds, but this won't be checked so it is possible that bivariance unsoundness could creep in if we allow typeclasses to extend other typeclasses. Seems those bivariance cases don't impact my hack negatively though:

When comparing the types of function parameters, assignment succeeds if either the source parameter is assignable to the target parameter, or vice versa.

Not a problem because of course an interface argument type can never to be assigned to any subclass.

When comparing functions for compatibility, optional and required parameters are interchangeable. Extra optional parameters of the source type are not an error, and optional parameters of the target type without corresponding parameters in the target type are not an error.

When a function has a rest parameter, it is treated as if it were an infinite series of optional parameters.

We should be able to design around this.

Also a compile flag can turn off TypeScript's unsound treatment of the any type.

TypeScript is structural but there is a hack to force it to emulate nominal typing. We could consider instead transpiling to N4JS which has nominal typing and soundness, but it is not as mature in other areas.

keean commented 8 years ago

So for zero argument functions, if we want to allow the shortform of calling them length instead of length(), then we also need some way of differentiating when we don't want to call them but want the function instance instead. So the solution is to maintain our consistent rule of partial application and specify that the non-existent first parameter is not passed length(_).

But this just creates a mess obscuring what is a value, what is a function reference. If you have a function reference you need to execute it before you can get the result. We execute a function by providing its arguments.

Its just so inconsistent, if I pass a one argument function as an argument to a higher-order function I do not put any brackets on it, just he plain variable can be passed, but for a zero argument function I have to put (_) after it just to pass it into a higher-order function.

shelby3 commented 8 years ago

@keean wrote:

But this just creates a mess obscuring what is a value, what is a function reference. If you have a function reference you need to execute it before you can get the result. We execute a function by providing its arguments.

I don't agree there is a necessary distinction between the two at the denotational semantics layer. You are down in the operational semantics layer. Apologies if I am misusing the academic terms. Hope you understand that I don't see how it makes any difference for our code whether something is called or not at some lower layer of semantics. If you can point out an example where it will make a difference, then please do.

Edit: note I see @keean has pointed out that values can't have side-effects.

Its just so inconsistent, if I pass a one argument function as an argument to a higher-order function I do not put any brackets on it

But is also sort of an inconsistency that “partial application” of no-arguments to a zero and one argument function would be same syntax otherwise. Perhaps pure zero argument functions should never be functions, and instead always be passed as immutable values. Or stated another way, pure zero argument functions should not be allowed.

but for a zero argument function I have to put (_) after it just to pass it into a higher-order function.

The inconsistency I see is that we've allowed partial application on an argument that does not exist, but in return we get this beauty of less proliferation of () all over the place. And we can unify immutables and pure zero argument functions.

But for non-pure zero argument functions, I think I agree with you.

@shelby3 wrote:

You are trying to find The One Pattern™ but it doesn't exist. I went down that rabbit hole with my various mockups for my stillborn Copute language project.

Sorry there isn't any one perfectly consistent universal pattern. Perfection is impossible. It is always about balancing tradeoffs.

shelby3 commented 8 years ago

I found a simpler implementation of a data type in TypeScript which typechecks better ([Playground](http://www.typescriptlang.org/play/index.html#src=class%20_Nil%20%7B%0D%0A%20%20%20%20private%20_nominal%3Aundefined%0D%0A%7D%0D%0Aconst%20Nil%20%3D%20new%20_Nil()%0D%0Aclass%20Cons%3CA%3E%20%7B%0D%0A%20%20%20%20private%20_nominal%3Aundefined%0D%0A%20%20%20%20constructor(readonly%20head%3A%20A%2C%20readonly%20tail%3A%20List%3CA%3E)%20%7B%20%7D%0D%0A%7D%0D%0Atype%20List%3CA%3E%20%3D%20_Nil%20%7C%20Cons%3CA%3E%0D%0A%0D%0Alet%20x%20%3D%20new%20Cons%3Cnumber%7Cboolean%3E(0%2C%20new%20Cons(true%2C%20Nil)))):

class _Nil {
    private _nominal:undefined
}
const Nil = new _Nil()
class Cons<A> {
    private _nominal:undefined
    constructor(readonly head: A, readonly tail: List<A>) { }
}
type List<A> = _Nil | Cons<A>

let x = new Cons<number|boolean>(0, new Cons(true, Nil))

The private member forces a simulated nominal typing.

shelby3 commented 8 years ago

Also this compiles in TypeScript:

const y = ((x: boolean): number | boolean => {
    if (x) return x
    else return 1
})(true)

Note the following generates an error "No best common type exists among return expressions", but TypeScript completes the compilation anyway and assigns the correct type number | boolean (actually number | true which seems to unify with number | boolean) to y:

const y = ((x: boolean) => {
    if (x) return x
    else return 1
})(true)
shelby3 commented 8 years ago

Refresh both prior comments. Where I am headed with this, is to create some regular expression transformations so I can simulate a transpiler for the basic Boost syntax I want to TypeScript (no block indenting much closer to JavaScript syntax), as an interim methodology so I can code some aspects in my preferred style immediately.

In other words, for the prior two comments I will be able to write instead respectively:

data List<A> <- Nil | Cons(A, List<A>)

var y := if (x) x else 1

Note if I instead write:

val y := if (x) x else 1

This would be converted to TypeScript:

const y = Object.freeze(((x) => {
    if (x) return x
    else return 1
})(true))

Or if I instead write:

var y = if (x) x else 1

This would be converted to TypeScript:

let y = ((x) => {
    if (x) return x
    else return 1
})(true)

Note it is apparently the very recent 2.0 compiler upgrade for TypeScript that made the above possible where I can do a syntactical transpilation with absolutely no AST.

@keean I told you before that I needed to find shortcuts. You are trying to go directly to producing a compiler with unification, which I estimated would take many months and I am correct. I am happy we discussed all the design issues, but now I need to return to my original priority set.

Edit: my alternative idea for syntax is:

data Nil | Cons(A, List<A>) -> List<A>

Which I think I like better. Makes it more clear.

Note afaics constructors can not pure functions because they create a new object every time they are called so we don't need to write:

data Nil | Cons(A, List<A>) :-> List<A>
shelby3 commented 8 years ago

@shelby3 wrote in #11:

@shelby3 wrote:

I would like to have the perfect language now, but unfortunately priorities beckon

It is notable how none of the existing languages can satisfy my interim needs well.

I took another look at haXe last night and immediately realized it is not even close to what I want, even with the macros facility. For example, it offers no support of unions and I can't even do some of things I want to do that I can do in ECMAScript such as async procedures pattern employing generators.

I contemplated Scala because at least I can do rudimentary typeclasses employing implicits, but again no unions, would need to use Akka for actor model to get promises (and presumably async procedures) but without compatibility with JavaScript ecosystem (and debugging in the browser being a dissonant). Also I would have to use the heavyweight IDE compilation ecosystem of Java, e.g. Eclipse. Ughh.

I thought for a moment about using C++ and Emscripten for transpilation to JavaScript, then slapped myself.

So then the only sane decision was to try to add some basic improvements to JavaScript with the least pain.

By transpiling to TypeScript, I get unions, guards (I can even perhaps syntactically transpile a match), and very good interoption with ECMAScript, but no typeclass although I can similate them to a limited extent with interfaces and some manual injection of the dictionary (either a function which wraps delegation or another trick of having the interface input an argument for self instead of using this).

shelby3 commented 8 years ago

P.S. please read my private message on LinkedIn, because I think you misunderstood what I meant when I created this issue thread. I didn't mean a half-baked partial language that we can experiment with its limited features such as what you have now in Compiler Implementation. I meant I need a complete language immediately, so I want shortcuts of transpiling to TypeScript interim so we can get some benefits immediately for production code.

keean commented 8 years ago

@shelby3 I would recommend looking at something like sweet.js to see if it can do what you need.

shelby3 commented 8 years ago

@shelby3 I would recommend looking at something like sweet.js to see if it can do what you need.

Thanks for the suggestion. I believe I had seen that before. I analyzed it again, and unfortunately it doesn't support TypeScript types, and the support for non-JavaScript symbols (e.g. ->) appears to not be ready yet.

I think it will also be impossible to achieve with regular expressions, because I need recursion of productions.

So think the only way forward would be to complete the grammar for our language, then write a parser. Use the parser to output the equivalent TypeScript, while avoiding using any features that would require unification and type checking. There doesn't seem to be any shorter cut than that.

shelby3 commented 8 years ago

@shelby3 wrote:

So think the only way forward would be to complete the grammar for our language, then write a parser. Use the parser to output the equivalent TypeScript, while avoiding using any features that would require unification and type checking. There doesn't seem to be any shorter cut than that.

Been sidetracked past week on other work. Am still going to attempt to come back to this and attempt the final grammar parser (lest I forget everything I needed to change in the grammar if I stay away from this too long).

I have not abandoned the goals we set. Just trying to figure out how to fit into my priorities.

I am not eager to code directly in TypeScript. I'd rather have a transpiler I can tweak even without the typing to start with.

shelby3 commented 7 years ago

@shelby3 wrote:

I found a simpler implementation of a data type in TypeScript which typechecks better ([Playground](http://www.typescriptlang.org/play/index.html#src=class%20_Nil%20%7B%0D%0A%20%20%20%20private%20_nominal%3Aundefined%0D%0A%7D%0D%0Aconst%20Nil%20%3D%20new%20_Nil()%0D%0Aclass%20Cons%3CA%3E%20%7B%0D%0A%20%20%20%20private%20_nominal%3Aundefined%0D%0A%20%20%20%20constructor(readonly%20head%3A%20A%2C%20readonly%20tail%3A%20List%3CA%3E)%20%7B%20%7D%0D%0A%7D%0D%0Atype%20List%3CA%3E%20%3D%20_Nil%20%7C%20Cons%3CA%3E%0D%0A%0D%0Alet%20x%20%3D%20new%20Cons%3Cnumber%7Cboolean%3E(0%2C%20new%20Cons(true%2C%20Nil)))):

class _Nil {
    private _nominal:undefined
}
const Nil = new _Nil()
class Cons<A> {
    private _nominal:undefined
    constructor(readonly head: A, readonly tail: List<A>) { }
}
type List<A> = _Nil | Cons<A>

let x = new Cons<number|boolean>(0, new Cons(true, Nil))

The private member forces a simulated nominal typing.

The primary distinctive attribute of a type alias is that it is not a first-class nominal type, i.e. it is just name that is replaced syntactically and thus never appears in the AST.

I think I prefer the simplification that sum (aka union) and product (aka intersection, but not the same as record) types should never be first-class nominal types, because otherwise (compositionality is limited, i.e.) these nominal unions and intersections can not share their constituent member types. Here is an example of why it creates ambiguity otherwise:

data Thing = A | B
data Thing2 = B | C
foo(arg: Thing | Thing2) …

Now if you pass an input of type B to function foo then what is the type of the arg? Is it a Thing or Thing2? If we argue that the type is just B (and thus that the type of arg is really just A | B | C), then there is no reason to have Thing and Thing2 be nominal types.

One ramification of this preference is that our syntax for data can become much simpler. We no longer need to duplicate the type name of the left and right-hand-side of an = because we not longer need = in the syntax, e.g.:

data MyData(a: A, b: B, c: C)

Bear in mind the point I made recently about constructor functions for data. Also my point to not ~“unify”~ conflate everything as you had noted was conceptually plausible.

(P.S. I posted this in this thread because you may have noted recently that I am really trying to align my design decisions to be compatible to transpilation to TypeScript, but only where it is what I would want anyway long-term for the language.)

keean commented 7 years ago

Or you can see it the other way, as the most common constructor function is to populate the data directly, then it is shorter to write

data Either[A, B] = Left(A) | Right(B)

Which nicely declares the parametric type, and both the constructor functions in one line.

I agree about 'type aliases' they are just a shorthand way of writing the full type and have no meaning other than that. They can get expanded out to the underlying types.

shelby3 commented 7 years ago

@keean that only works unambiguously because you never allow Left to be a constructor for a different data type. This means you can not reuse and have general compositionality of constructors without creating type-system ambiguity.

It means you can not normalize (flatten) unions and intersections. For example anonymous unions and intersections, as well Either can not be recursive without wrapping it in a Left or Right which becomes boilerplate hell.

I really do not like the wrapping of FCP (which we ended up determining was basically unnecessary except for global inference) nor the wrapping proposed here. But if you can show me some compelling use case, I am open-minded.

I am positing I prefer structural typing of unions and intersections and not nominal. However, I might not be aware of all the ramifications?

keean commented 7 years ago

This means you can not reuse and have general compositionality of constructors without creating type-system ambiguity.

Of course you can, datatypes are not structural, you are getting confused with the use of | in a datatype to mean disjoint-union and an anonymous union.

There is no FCP going on here, these data declarations are straightforward encodings of structs and enums, with their constructors. The main difference is that these enums can have properties, unlike the C/C++ ones that are simply integers.

This leads to very compact representations like:

data BinaryTree[A] = Branch(BinaryTree[A], BinaryTree[A]) | Leaf(A)

data NaryTree[A] = Branch(List[BinaryTree[A]]) | Leaf(A)

which effectively encode subtyping hierarchies (without the subtyping) like:

class BinaryTree[A];

class Branch[A] : BinaryTree[A] {
   Left : BinaryTree[A]
   Right : BinaryTree[A]
}

class Leaf[A] : BinaryTree[A] {
   Value : A
}

Note its not quite the same because Branch[A] and Leaf[A] are separate types in the class encoding and it needs subtyping, whereas in the data encoding, all the values have type BinaryTree[A] so maybe its a bit more like:

class BinaryTree[A] {
   enum {
      Branch,
      Leaf
   }
   union {
      Branch : struct {
         Left : BinaryTree[A]
         Right : BinaryTree[A]
      },
      Leaf : A
   }
}

However in this case there is nothing that really ties the enum options to the union alternatives. Really the object model doesn't quite have something that matches the semantics, where we get an enum to mark Branch or Leaf, and the 'union' data has to match the enum tag.

shelby3 commented 7 years ago

There is no FCP going on here

Where did I write that FCP was going on here? I mentioned FCP because of its wrapping boilerplate which it shares in common with Either. You even admitted that FCP requires wrapping and unwrapping boilerplate.

I will quote myself for you:

I really do not like the wrapping of FCP (which we ended up determining was basically unnecessary except for global inference) nor the wrapping proposed here.

shelby3 commented 7 years ago

@keean did you not notice that I linked to the Tree example, so why are you repeating an example I already linked to?

I will quote myself for you:

I think I prefer the simplification that sum (aka union)

shelby3 commented 7 years ago

@keean wrote:

Of course you can, datatypes are not structural, you are getting confused with the use of | in a datatype to mean disjoint-union and an anonymous union.

I am not confused. And you have not written anything which addresses nor refutes what I wrote.

You are stating that you prefer wrapping and nominal types for unions, but you have not responded to my criticisms.

We are trying to be respectful to each other right. The term confused is very borderline close to ad hominem.

keean commented 7 years ago

I think I prefer the simplification that sum (aka union) …

That page describes exactly what I have been talking about, so I don't know how you 'prefer the simplification'?

I think you misunderstand and assume 'Cons' is a type when it is in fact a data-constructor as I have explained. This is important:

A sum type is not a union.

A sum type is a 'dijoint union' or a 'tagged union' (all the same thing, different names). This is different from a plain 'union' type which lacks the tag.

We can look at the memory. Lets look at a List, with Cons and Nil. Lets encode Cons as 1, and Nil as 0. Lets have an integer list. The encoding of [1,2,3] could be something like (bytes in memory):

1, 1, 1, 2, 1, 3, 0

If Cons and Nil were types, then you would expect to see the following in memory:

1, 2, 3

Note you cannot see the Nil as it is a type, so the type system knows the length of the list but there are no runtime markers in memory.

Another example might be a list List[Either[Int, String]], where we encode Left as 0 and Right as 1, we would then encode [1, 'a', 2, 'b'] as

0, 1, 1, 'a', 0, 1, 1, 'b'

However a list of a union like List[Int \/ String] would look like this in memory:

1, 'a', 2, 'b'

So there is no tag to allow the runtime to distinguish between the types.

I think part of the problem is the 'boxing' confuses things, because in dynamically typed languages that have boxed types we can have a list of a union of boxed type like this:

[Int 1], [String 'a'], [Int 2], [String 'b']

So with boxed types you don't really need the tag, because each type has its own runtime tag in the box.

We might say that Sum types allow pervasive unboxing, and introducing unions types requires boxing... I think our misunderstanding comes from me thinking about unboxed types by default, and you thinking about boxed types by default.

Note: I am not saying I don't want boxed types, just that I would mark them explicitly in the type system, so that you can have both boxed and unboxed types in the same type system.

shelby3 commented 7 years ago

A sum type is not a union.

A sum type is a disjoint (tagged) union.

Given the types in an anonymous union will also be tagged, I see no distinction between them other than whether the type is first-class nominal or first-class structural.

And I explained the trade-offs of the nominal approach (wrapping, lack of ability to compose constructors in unions arbitrarily).

I think you misunderstand and assume Cons is a type when it is in fact a data-constructor as I have explained.

I do not know why you think that about me or what I wrote. I clearly used the word ‘constructor’ and referred to data.

shelby3 commented 7 years ago

refresh

I noticed after I wrote my response above, that you added the following to your prior post:

A sum type is a 'dijoint union' or a 'tagged union' (all the same thing, different names). This is different from a plain 'union' type which lacks the tag.

keean commented 7 years ago

I noticed after I wrote my response above, that you added the following to your prior post:

I added it before reading your post.

And I explained the trade-offs of the nominal approach (wrapping, lack of ability to compose constructors in unions arbitrarily).

But your approach requires all types to be boxed, which uses a lot more memory and is a lot slower.

My approach is to have | for disjoint union and \/ for union, where the latter requires boxed type arguments, and the former unboxed.

keean commented 7 years ago

refresh

shelby3 commented 7 years ago

But your approach requires all types to be boxed, which uses a lot more memory and is a lot slower.

Okay that is now a salient response and good point. I will think about it while I eat.

But not all types, only union types, and not boxed but tagged (which is an orthogonal concept).

Please try to avoid implications that I am dumb. Neither of us like it when the other does that, even accidentally. I understand we both get perplexed at times what is the significance of what each other means. So I do understand.

I will quote myself again:

However, I might not be aware of all the ramifications?

keean commented 7 years ago

Please try to avoid implications that I am dumb.

I don't think I have implied you are dumb. That has certainly never been my intention. Misunderstanding does not imply dumb. It implies that either preconceptions are preventing the person from listening/understanding, or my communication was not clear enough.

shelby3 commented 7 years ago

Meta: There is no need to state that a person is confused (if they insist and are clearly trolling then perhaps just put on Ignore if they persist too much). If we provide our reasoning, then each other can decide on their own accord if they were confused because a reasoning is correct. Presuming that the other person is something or another implies judgement of the other person. We should just explain our technical thoughts. Note I have committed personal judgement too many times, but I am trying to reform myself. As I wrote, it was only borderline and possibly accidental, sometimes is a quick way of expressing ourselves when we could find better words if we slow down. I think both of us are often rushing. And we both know not to force our communication when we are sleepy or famished. We should not rush. Apologies for calling you out on this as that is also a form of judging (I should have brought the Meta to private communication, my mistake). Ah Internet communications. Hopefully we will coalesce well now.

shelby3 commented 7 years ago

However a list of a union like List[Int \/ String] would look like this in memory:

1, 'a', 2, 'b'

So there is no tag to allow the runtime to distinguish between the types.

You know that a string is not typically placed unboxed in memory like that, because it has a variable length.

Also we had discussed in the past (it was your point afair) that boxing is orthogonal to tagging, i.e. the presence or not of boxing, does not dictate the presence of tagging or not nor vice versa. Perhaps unboxing typically coincides with untagged in many languages?

So I presume you are just making a conceptual point that a union can be untagged but a Sum (aka tagged union) type is never untagged?

We might say that Sum types allow pervasive unboxing, and introducing unions types requires boxing...

A sum type is always tagged, but I do not see how that implies it is always unboxed. Appears the two are orthogonal concepts.

Static (compile-time) union types do not absolutely require boxing because the types in the union are bounded statically (whereas typeclass objects do because they are dynamically polymorphic and unbounded). Whether they require tagging depends on whether the language wants them to be reified or not, which afaics impacts whether the instances of the union type can be type-case logic filtered. I really do not see much utility for untagged unions, do you?

Again I do not see any boxing nor tagging distinction between structural versus nominal unions. I just see the nominal approach has disadvantages that I first enumerated, which afaics I have not yet seen a refutation of. You know that List never appears in memory in the language you want (i.e. it is not reified nor a supertype/superclass), only the data constructors (e.g. Cons and Nil) are tagged. Thus the nominal aspect has really nothing to do with the tagging.

Edit: to be a bit more explicit, let us note for readers that Left and Right wrap the actual types we want in the union. If unwrapped, then we do not need all that boilerplate and can collapse nested Either into a flat optimal tagged union of the unwrapped types. That was my original thinking on this matter which causes me to favor structural and not nominal.

Dessix commented 7 years ago

What if instanceof could be used when translating to a JS backend to allow access to the "tag" already held by JS, assuming the types involved are able to be differentiated using some combination of these calls? Since ZenScript has typeclasses, would such artificial tagging be able to be bolstered by a typeclass for differentiating otherwise undifferentiable user types?

shelby3 commented 7 years ago

What if instanceof could be used when translating to a JS backend

That is what I am intending to employ, by transpiling to TypeScript which has instanceof guards (to achieve JS).

keean commented 7 years ago

A sum type is always tagged, but I do not see how that implies it is always unboxed. Appears the two are orthogonal concepts.

Sum types can also have multiple options with the same type, but different tags.

The difference between 'tags' and 'boxed types' is that a boxed type is a type, but it may not be visible at compile time.

So the advantage of Sum types is to make 'type' something that is always statically known at compile time, as opposed to boxed types, that are existentials (and they have to be existential because the static type system cannot know what type is in the box, it is only knowable at runtime).

I prefer to keep things simple and consistent, and say a type is something that is statically known at compile time, and a tag is something only known at runtime.

So if a tag is not a type, we need some way so specify what the runtime tags are, and that is what the constructor names in the data declarations are.

What an anonymous union is, can be represented as a sum type, where we infer the constructor names from the types of the constructor arguments.

Edit: to be a bit more explicit, let us note for readers that Left and Right wrap the actual types we want in the union. If unwrapped, then we do not need all that boilerplate and can collapse nested Either into a flat optimal tagged union of the unwrapped types. That was my original thinking on this matter which causes me to favor structural and not nominal.

Without 'Left' and 'Right' there is no way to tell what type is actually in the union at runtime. Your suggestion ammounts to auto-generating the 'Left' and 'Right' tags based on the types of the objects stored, so the runtime perfromance will be exactly the same.

What you lose by using the type names as the tags is that it makes it harder to refactor (changing the type stored would change the tag), and you cannot have more than one option that is of the same type.

shelby3 commented 7 years ago

Sum types can also have multiple options with the same type, but different tags.

Ditto an anonymous union is also a static type with a static number of multiple options. The only difference is whether the type is nominal. Which is the point I have been making.

The difference between 'tags' and 'boxed types' is that a boxed type is a type

Tags are the runtime reification of types. Tags are required for boxed types which need reification, e.g. type-case dispatch of code branches. Virtual methods are an alternative mechanism for dynamic dispatch.

I know you know all this. I am just presenting a different way to think about tags as not a special case of subordinate to type …

So the advantage of Sum types is to make 'type' something that is always statically known at compile time, as opposed to boxed types, that are existentials (and they have to be existential because the static type system cannot know what type is in the box, it is only knowable at runtime).

What you apparently mean is that the choice of multiple options is statically known for a Sum type (e.g. Left or Right for Either, Cons or Nil for List), but a typeclass object (or superclass in subclassing languages) the range of options for types (or subtypes) is unbounded.

Note that the multiple options for anonymous structural unions are also known (bounded) at compile-time.

I prefer to keep things simple and consistent, and say a type is something that is statically known at compile time, and a tag is something only known at runtime.

Your intent must be that the bound of types (which are tagged) is known at compile-time (aka statically known) but you know that the actual choice is not known at compile-time. A tag is known at compile-time, as it corresponds to a type known at compile-time. It is the bound of types in the union which is either known at compile-time or not.

So if a tag is not a type, we need some way so specify what the runtime tags are, and that is what the constructor names in the data declarations are.

The constructor names are types. In your nominal model, you want those constructors (e.g. Left and Right) to be subordinate to the nominal type (e.g. Either) and thus you think of them as being only tags. But rather I think the more general way to think about it, is that all types can be tagged (reified) and that constructors are also types. Which is precisely my point that with my proposal then all constructors are first-class types and can participate any where types do, including in any unions or intersections (even being reused in more than one of them which can't be generally done in your model).

Perhaps you could make an argument that wrapping enables types to normally not be tagged. But I think we want all copy-by-reference objects to be tagged by default. Really no reason not to. For integers and primitive types which are normally copy-by-value, we may want the option for those to be untagged normally.

Without Left and Right there is no way to tell what type is actually in the union at runtime.

The Left and Right are unnecessary boilerplate wrappers that obstruct the flattening of a union removing duplicates, when all types can be tagged (either they are always tagged, or they can be lifted to a tagged type when needed, e.g. lifting true to a Boolean in JS).

so the runtime perfromance will be exactly the same

There is an extra unnecessary unwrapping and tagging with Left and Right, if we still need to type-case on the wrapped type at runtime.

What you lose by using the type names as the tags is that it makes it harder to refactor (changing the type stored would change the tag)

I do not like implicit changes hidden from use sites of version control remember. Readability (not obscurity) is one of the highest priorities. Why is someone changing the name of a type that is already widely used in the source code. That is not stable code.

Type-case logic on those types whose names are changed would have to changed any way. So the refactoring argument may not be so valid.

, and you cannot have more than one option that is of the same type.

That is the point. Unions and intersections should never have more than one of the same type.

If you want to wrap a type, you create a constructor for that. In my proposal, no need to force wrapping until those rare cases you need it.

shelby3 commented 7 years ago

@shelby3 wrote:

But I think we want all copy-by-reference objects to be tagged by default. Really no reason not to.

On further thought, we do not really have any need for them to be tagged by default unless there will be an Any type.

Making them untagged by default enables optimizations such as storing the tag (as an index) in the unused upper bits of the 64-bit pointer to the boxed type (as Swift does on mobile) when the possible options are statically known.

But unless there is an Any, then tagged by default or lifted when assigned to a union type, is an implementation detail that does not need to be exposed to the programmer.

keean commented 7 years ago

Ditto an anonymous union is also a static type with a static number of multiple options. The only difference is whether the type is nominal. Which is the point I have been making.

An anonymous union cannot have multiple distinguished values with different types for example:

data Result[A] = Error(String) | Value(A)

Cannot be expressed as an anonymous union, in the case that the result is a String.

Tags are the runtime reification of types. Tags are required for boxed types which need reification, e.g. type-case dispatch of code branches. Virtual methods are an alternative mechanism for dynamic dispatch.

Not in the above, the tags are values. Both Error("MyError") and Value(32) would both have the type Result[Int] if returned from the same function. Hence a function which could return them has the type f : () -> Result[Int]. Error() and Value() are not types, they are data constructors (hence the round braces not square). There is no subtyping here, and that is important.

To make things work the way you want, both Value and Error would need to be types, and you would not have a type called Result[] instead you would declare two types like this:

data Error = Error(String)
data Value[A] = Value(A)
type Result[A] = Error | Value[A]

Note Result is now just a type alias not a type, and we have the slightly weird construction where the parameter A is only used on one side of the anonymous union. The function would now have the type f : () -> Error | Value[Int].

The constructor names are types. In your nominal model, you want those constructors (e.g. Left and Right) to be subordinate to the nominal type (e.g. Either) and thus you think of them as being only tags. But rather I think the more general way to think about it, is that all types can be tagged (reified) and that constructors are also types. Which is precisely my point that with my proposal then all constructors are first-class types and can participate any where types do, including in any unions or intersections (even being reused in more than one of them which can't be generally done in your model).

As explained above, this is wrong for Sum types, the constructor names are not types, they are values. They have a type. For example from above:

Error : String -> Result[A]
Value : A -> Result[A]

They are no different to a normal function like 'f', and according to our agreed nomenclature they probably should not have a capital first letter.

The Left and Right are unnecessary boilerplate wrappers that obstruct the flattening of a union removing duplicates, when all types can be tagged (either they are always tagged, or they can be lifted to a tagged type when needed, e.g. lifting true to a Boolean in JS).

No because they hold semantic value, for example you do not want to flatten a binary tree:

data BinaryTree[A] = branch(BinaryTree[A], BinaryTree[A]) | leaf(A)

Flattening this would result in an infinite recursion in the type checker. You could resolve that by introducing recursive types ("mu" types), but generally recursive types are considered bad.

There is an extra unnecessary unwrapping and tagging with Left and Right, if we still need to type-case on the wrapped type at runtime.

No because the types need to be boxed (which is wrapping and tagging) if you don't do this.

I do not like implicit changes hidden from use sites of version control remember. Readability (not obscurity) is one of the highest priorities. Why is someone changing the name of a type that is already widely used in the source code. That is not stable code.

If you want to wrap a type, you create a constructor for that. In my proposal, no need to force wrapping until those rare cases you need it.

The problem is the type names have no semantic meaning, for example:

data Person = name(String) | age(Int)

Using name and age in the case statements of the program makes it much more readable and understandable than using String and Int. I think this is an important difference. Just using type names is easy for writing (less keystrokes) but make reading much harder. I want readable code.

Regarding the Any type, which is also known as Omega in type system, I am not a fan. I would only include an Any type as a last resort if needed for soundness, but I normally take it as a sign the type system has a bad design if it needs an omega type.

shelby3 commented 7 years ago

@keean wrote:

An anonymous union cannot have multiple distinguished values with different types for example:

data Result[A] = Error(String) | Value(A)

Cannot be expressed as an anonymous union, in the case that the result is a String.

I will disagree with your assumption by showing how to express it in pseudo-code:

data Error(str: String)
data Value[A](value: A)
type Result[A] = Error | Value[A]

Let me repeat, that you apparently have a perspective that prefers forcing nominal (i.e. by name) wrapping always and making the name a first-class type in the AST and making the tags a special case and subordinate to type. I prefer a unified system where all values have a type and all types can be tagged. For example in the following String is not wrapped but it is still tagged as a String:

type Result[A] = String | Value[A]

Afaics, it appears as though mine has more degrees-of-freedom than your preference, and all I lose is the unnecessary nominal type names in the AST (which I replace with aliases), which I posit is actually not a good feature to have for the reasons I already stated.

Note @keean and I discussed in private that tagging always takes place within a union if we require being able to destruct the options with type-case logic and I agreed. But my point is that we do not have double tagging due to forced wrapping (in @keean’s preference) in the case of JavaScript with my idea (JavaScript requires all types are tagged due to the presence of Any thus lack of 100% static, compile-time typing). Wrapping is an orthogonal concept to tagging. Wrapping means what we write in the source code, and afaics so far, my proposal appears to have more degrees-of-freedom because wrapping is not forced, thus our compiler can choose different strategies for different output targets (e.g. JavaScript versus for example C++).

In other words, if we write type Result[A] = String | Value[A] instead of type Result[A] = Error | Value[A], then our compiler can avoid forced wrapping and thus double tagging in JavaScript, while in some other language target our compiler might for efficiency reasons choose to not tag String except when it is placed in a tagged union.

As explained above, this is wrong for Sum types, the constructor names are not types, they are values.

Please do not accuse me of being wrong. Please state your technological opinion without any presumptions about your omniscience and my lack thereof. The same admonition applies to myself. We must respect each other as both worthy of valid insights, else it becomes very difficult to work together. I respect your knowledge. Once again, we both do not want this Meta stuff in public. We both need to reform ourselves. We both seem to have a problem with phrasing our discussions in a way that allows for our own myopia. Neither of us are perfect and omniscient. That is why we are discussing, to benefit from each others’ viewpoints.

Your perspective of what a Sum type presumably comes from your understanding from academia and programming language design research and discussion (e.g. Lambda-the-Ultimate). I am presenting my perspective on what a disjoint tagged union is. That does not guarantee I am wrong. We should present our technological ideas without accusations.

I have not redefined what I Sum type is. It is always a tagged union (100s of Google results say so). I think your authority on what a Sum type is, assumes an untagged union is distinct from a tagged union and thus the separation. But I am saying that I do not see any utility for an untagged union (for example, even if we add my solution for the Expression Problem where the compiler implicitly supplies type-case dispatch from a union to the typeclass bounds at the call site, still we need tagging because even a static union is bounded dynamic polymorphism). And I do not see the advantage of forcing wrapping as it presumes that all types will be not tagged by default, which is not the case for JavaScript. It is forces unnecessary wrapping and inability to remove duplicate underlying types from hierarchy of wrappings. It is a lot of boilerplate conflation of concepts. I do not know why your authority designed it that way.

There is no reason I can see that the options of a Sum type can not be first-class types. Rather there is a design choice between a nominal type wrapping system or a structural flattened design. I am contemplating that I prefer the later, because I enumerated the significant advantages I posit.

They have a type. For example from above:

Error : String -> Result[A]
Value : A -> Result[A]

You choose to make Result[A] a first-class nominal type. I choose not to. It is a design choice. I am opening my mind to all axis of the design choices that are available, not just to the choices made in the past by some others.

My choice also has analogous function types, but afaics I do not early bind the wrapping types Error and Value which afaics is analogous to the anti-pattern of subclassing which early binds interfaces to data:

anonymous function : String -> (Error(String) | Value[A])
anonymous function : A -> (Error(String) | Value[A])

P.S. I like alias as a more apt keyword than type.

keean commented 7 years ago

You are of course free to choose to use structural and union types, but you cannot redefine words. Your definition of a 'sum' type is wrong.

How about I choose to call a type a value from my perspective, and a union an intersection, etc. How can we have any hope of communicating.

shelby3 commented 7 years ago

@keean wrote:

Flattening this would result in an infinite recursion in the type checker.

I disgree:

data Leaf[A](A)
data Branch[A](BinaryTree[A])
type BinaryTree[A] = Branch[A] | Leaf[A]

The type checker does not need to recurse Branch[A] | Leaf[A], because Branch is a concrete type.

You could resolve that by introducing recursive types ("mu" types), but generally recursive types are considered bad.

Well if the above caused some problem, you could wrap the BinaryTree in a type:

data Leaf[A](A)
data Branch[A](BinaryTree[A])
data BinaryTree[A](Branch[A] | Leaf(A))

So we can get nominal names when we want them by wrapping.

shelby3 commented 7 years ago

but you cannot redefine words

Do you realize that some people at Bitcointalk cite you as an expert and use comments like that from you to claim I am not worthy.

Your definition of a 'sum' type is wrong.

After all my pleadings for us to civil, you go off the deep end. This is why we end up with an acrimonious relationship. Sad. I was really trying.

A sum type is a tagged union. Please do not overspecialize the definition to one design axis of a how a tagged union can be implemented.

A google search will return 100s of definitions of Sum type which say it is a disjoint tagged union.

Afaics, I have said nothing which makes Sum type anything other than a disjoint tagged union.

I have a more separation-of-concerns conceptualization of not conflating wrapping with tagging with boxing. They are all disjoint concepts.

If we can not be civil, I will move on. I appreciate all of your help. You have a lot of expertise. Thanks for sharing.

I know you catch a lot of mistakes I make, so it is with grief that I would move on. Believe me, I really appreciate your expertise. I have made a decision to not participate in any venues where I am not able to have amicable discourse because the Meta flame wars are too wasteful.

I am not even claiming I am correct on this issue. I just wanted to discuss it. I could careless who is correct, as long as I get the design correct. But your presumptions of my incorrectness is ad hominem and also it is noisy and interfering with us from reaching a technical understanding. We must always try to understand the technical points of each other.

How about I choose to call a type a value from my perspective, and a union an intersection, etc. How can we have any hope of communicating.

I do not have the gridlock (nor expertise) in my brain of being bound to adopted presumptions from white papers and university. If a sum type is defined to be a disjoint tagged union, then it is a disjoint tagged union. If I have a proposal which is disjoint and tagged, then I am congruent with the definition.

If your authority presumes that only values are tagged and never types, then a Sum type can not apply to disjoint tagged types, then whose lack of generality is that.

I began the discussion by stating that I was looking to create more generality of compositionality.

It was not done to trick you, or to fool you. It was so you could tell me what the mainstream conceptualizations are and I could explain how my idea was different.

Not to accuse me of being wrong because the mainstream decided that disjoint tagged union means tags are never for types and forced wrapping should be conflated with it.

It is all about having a very open mind that can look at the design space with a very fresh perspective that is not burdened by groupthink. It does not mean my idea is superior. That is why I am discussing it to try to brainstorm on the ramifications.

shelby3 commented 7 years ago

Btw if we want to cite authority, sums (co-products) and products afaik originate from the following 1989 reference which I cited for my Copute language research in 2010:

@shelby3 wrote:

Declarative Continuations and Categorical Duality, Filinski, sections 2.2.1 Products and coproducts, 2.2.2 Terminal and initial objects, 2.5.2 CBV with lazy products, and 2.5.3 CBN with eager coproducts

Apparently sum type is a more generalized co-product concept than what you are claiming is canonical.

Programming language design is still a wide-open field and I do not think we can even say that values are not types, as it depends on the chosen static model (perspective).

shelby3 commented 7 years ago

@keean wrote:

There is an extra unnecessary unwrapping and tagging with Left and Right, if we still need to type-case on the wrapped type at runtime.

No because the types need to be boxed (which is wrapping and tagging) if you don't do this.

I had already explained to not conflate boxing with wrapping nor tagging. Afaics, they are all disjoint concepts.

The tagging of a type does not force it to be boxed, rather it is the only the unbounded multiple options (of for example typeclass objects) that will force boxing (although a compiler might choose to employ boxing for static number of options of a disjoint tagged union but it is not forced to, because the memory footprint of the union is known at compile-time). I am repeating what I already wrote.

That is where boxing is defined to be a reference to the value instead of placing the value directly on (in the RAM for) the stack or data structure. If you are employing a different notion of boxing, please let me know. I presume we agree that tagging is distinct from boxing. Tagging is marking (reifying) the data so its type/name is known at runtime.

And the lack of forced wrapping does not force tagging every where the type is used outside of the context of a union or intersection. I had already explained (alluded to) all of this.

shelby3 commented 7 years ago

@keean wrote:

The problem is the type names have no semantic meaning, for example:

data Person = name(String) | age(Int)

Using name and age in the case statements of the program makes it much more readable and understandable than using String and Int.

  1. In my proposal you can still first-class wrap if you want (and the compiler can be smart enough not to automatically not tag twice if the JavaScript is not the output target). And in my mine you can even reuse those wrapping types in any union and intersection and any where a type can be used (i.e. they are first-class which simplifies IDEs because only one unified concept, etc).

  2. We are favoring typeclasses instead of type-case logic (the later is not extensible without refactoring), so your point is for the inferior coding case.

keean commented 7 years ago

I don't want to waste time disagreeing with you about what a sum type is. If we cannot agree consistent terminology we will never get anywhere.

Can we just agree to use the terminology as used by the rest of the world, and as it is used in text books and taught on computer science courses, so that we can get on with the discussion, and so that other readers can understand what we are talking about.

keean commented 7 years ago

The sum type is really the dual of the record type. In a record we have different properties identified by a label (which is a value) .

data Test = MyRecord {
    name : String
    age : Int
}

Here properties are accessed by label, not by type. Now you could argue that wrapping the properties with labels is unnecessary and you could use types instead. This would then become a type-indexed product type.

Likewise a sum type provides a way to have alternative contents accessed by label so given:

data Test = name(String) | age(Int)

Name and age are labels, serving the same purpose as the labels in the record.

shelby3 commented 7 years ago

@keean wrote:

I don't want to waste time disagreeing with you about what a sum type is.

Then why did you?

You obviously do not agree with me that a sum type is a disjoint tagged union, even though every definition I can find says it is a disjoint tagged union.

I suppose 100s of sources online are all wrong and @keean is correct.

keean commented 7 years ago

No I agree with you that a sum type is a disjoint tagged union. It is your definition of what a disjoint tagged union is that does not match the accepted textbook definition.

shelby3 commented 7 years ago

@keean wrote:

It is your definition of what a disjoint tagged union is that does not match the accepted textbook definition.

Canonical sources seem to disagree with you.

Wikipedia says:

In computer science, a tagged union, also called a variant, variant record, discriminated union, disjoint union, or sum type, is a data structure used to hold a value that could take on several different, but fixed, types. … In type theory, a tagged union is called a sum type. Sum types are the dual of product types.

Note above the word “types” is not values.

Dual means co-product as I wrote in a prior post citing Filinski (1989).

keean commented 7 years ago

This is correct if does "hold a value that could take on several different, but fixed, types". For example:

data Test = name(String) | age(Int)

Note that the type Test can hold a value of type String or Int. Which is exactly what the quotation you posted says. However the tags name and age are values as in:

case x of
    name(x) -> print x
    age(x) -> print x + 1

The case operates on the tag values.

shelby3 commented 7 years ago

@keean wrote:

Note that the type Test can hold a value of type String or Int.

Yes. So the wrappers are entirely unnecessary to meet the requirements of definition.

You can try to explain how my proposal did not match the definition and are not co-products (dual of product types) per the definition and per the 1989 original Filinski source.

However the tags name and age are values as in:

case x of
    name(x) -> print x
    age(x) -> print x + 1

The case operates on the tag values.

I have already addressed that. And it has nothing to do with the definition of tagged unions, nor more saliently, the concept of a co-product which is the underlying fundamental concept.

Just because you are accustomed to wrapped sum types, does not mean it has anything to do with what a co-product is. Rather the wrapping is an implementation detail and choice. I explained that with typeclasses, we disfavor type-case deconstruction.

keean commented 7 years ago

Because they are 'tagged unions' they have a tag. A tag is a value like 'name' or 'age', it's in the name.

If you want to use a type to index them, you are talking about a type-indexed-coproduct (TIC).

Indexing things with types is not common, because it requires an extension from simple types in the lambda cube (to types that depend on other types). Most type systems do not provide the type level functions and maps necessary to have types that depend on types.

shelby3 commented 7 years ago

Using the type as the tag is not an untagged union.

You are just arguing about nonsense.

You sound like a my high school Trigonometry teacher who failed my perfect scores because I did not draw a box around the answers.

keean commented 7 years ago

Using a type as a tag is a type-indexed-coproduct. A tag specifically has to be a value to keep everything in the 'simple types' corner of the lambda cube.

I don't understand how you can have any discussions with other computer scientists if you don't keep to agreed definitions.

keean commented 7 years ago

One more try:

In type theory, a tagged union is called a sum type. Sum types are the dual of product types. Notations vary, but usually the sum type A+B comes with two introduction forms inj1 : A -> A+B and inj2 : B -> A+B. The elimination form is case analysis, known as pattern matching in ML-style programming languages: if e has type A+B and e1 and e2 have type T under the assumptions x:A and y:B respectively, then the term case e of x => e1 | y => e2 has type T . The sum type corresponds to intuitionistic logical disjunction under the Curry–Howard correspondence.

So note in the formal definition e, x and y are values, and there is no possibility they can be types.