What is in a type hint?

lukescott / es-type-hinting

Typing hinting syntax for ECMAScript

What is in a type hint? #4

Open Naddiseo opened 9 years ago

Naddiseo commented 9 years ago

Some questions that should be thought through before defining the TypeHint syntax element: What can a type hint be made of? A single identifier? What about generics? What about aliases? Aliases would be useful for specifying the full type of a callback.

Something else that should be mentioned in the spec is that it's only for hinting, not for enforcing, similar to how Python annotations are specified.

lukescott commented 9 years ago

TypeHint would likely be similar to extends on class. A TypeHint, at its base, would be a constructor / class, such as String, Number, etc. Note that "foo" instanceof String returns false, but for type hinting we would want them to match (autoboxing).

If the engine enforces the types, I'm thinking along the lines of:

function add(a Number, b Number) Number {
    return a + b;
}
// desugars to
function add(a, b) {
    a instanceof Number || console.warn("arg 0 not a Number");
    b instanceof Number || console.warn("arg 1 not a Number");
    var _return = a + b;
    _return instanceof Number || console.warn("return not a Number");
    return _return;
}

It probably shouldn't halt execution; it should log a warning instead. If the engine doesn't enforce it, then a linter could, but it may not catch everything.
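As a rough check of the desugaring above, the sketch below runs the same instanceof tests against primitive arguments. The warnings array and the check helper are hypothetical stand-ins for console.warn, used only so the result can be inspected. With plain instanceof, every check reports a mismatch for primitive numbers, which is the autoboxing gap mentioned earlier.

```javascript
// Hypothetical stand-in for console.warn so the result can be inspected.
const warnings = [];
function check(ok, message) {
  if (!ok) warnings.push(message);
}

// Desugared form of: function add(a Number, b Number) Number { ... }
function add(a, b) {
  check(a instanceof Number, "arg 0 not a Number");
  check(b instanceof Number, "arg 1 not a Number");
  const result = a + b;
  check(result instanceof Number, "return not a Number");
  return result;
}

const sum = add(1, 2);
console.log(sum);      // 3
console.log(warnings); // all three messages, because primitives are not Number instances
```

So an engine-level check would presumably need typeof for the primitive-backed types rather than instanceof alone.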

We should probably track these issues (other than what is a TypeHint) separately.

Naddiseo commented 9 years ago

The simplest route would be for TypeHint to be a single identifier, and then have the static semantics limit it to one of the global primitive objects, but let's explore all the alternatives before that.

Naddiseo commented 9 years ago

You've suggested in other places that the TypeHint be similar to, if not the same as, LeftHandSideExpression, which is what is used for the class extends syntax. This would allow bizarre hints like:

function f1() {foo: Function}.foo {}

function f2() new Function {} // what would this even mean?

function f3() 1 {} // syntactically valid, but semantics would probably disallow

function f4() function () function () {} {} {} // ad infinitum

var fn = arg function () class {} {} => 1;

But, on the other hand, it does allow some useful expressions:


// Inline classes: can allow the linter/checker to know the interface it should be checking
function f1() class TypeInterface { /* things */ } {}

// Use of a special "function" to notate optional, union types, and casting
import {Optional, Union, CastTo, List} from "@types";
function f2() Optional(Number) {}
function f3() Union(Number, String) {}
var a Number = CastTo(Number, "2");
function f4(...args Number) {}
function f5(arrayType List[Number]) {}

// Again, specifying an interface of the return type.
function f6() { interface: attributes } {} // or is this a syntax error? extends seems to allow it

Allowing that function call with special "functions" would resolve most of the issues I've raised, assuming that the linter/checker knows how to interpret them, especially generics (#6) and nullable/optional (#13). Using LeftHandSideExpression can also potentially resolve rest/spread (#5), and casting (#15).

It may also be possible to resolve aliases (#7), and any/void (#14):

import { Callback, Any } from "@types";
// function Callback(RetType Function, ...args List[Any]) Function {}
var ToNumberCallbackType = Callback(Number, Any);

function getToNumber(type String) ToNumberCallbackType {
    return {
        "string": Number,
        "A": function (arg A) Number { return arg.toNumber(); }
    }[type];
}

This library-based approach is also used by mypy for Python's type hinting, and is geared more towards static / compile-time checking/linting than runtime checking. (Also tracked in #9, and mentioned in #15).

Regarding the library itself: if it were solely a compile time thing, the library import and annotations can be stripped out of the resulting file after transpilation.

lukescott commented 9 years ago

How does this work exactly?

function f2() Optional(Number) {}
function f3() Union(Number, String) {}

Given that checking the type would likely be done with the instanceof operator, what does Optional return? What does Union return? How does that get fed into instanceof?

Allowing any syntax does allow for some invalid code. But then again, so does extends.

The question is: Should "anything" be allowed, or should it be restricted to simple variable references?

There is one use case where "any expression" could be handy:

import types from "./types";
function doSomething() types.foo {

}
// or
function doSomething() types["foo"] {

}
// or
function doSomething() types.get("foo") {

}

Essentially a case where the type isn't in scope. But then again, it could be restricted to foo and types.foo (i.e. allow dot syntax).

And with any syntax, where does it end? You could do:

function doSomething() class Foo {

} {
 // ... function code
}

(which is completely useless)

Given that import creates references in the module scope, perhaps it is reasonable that types.foo isn't allowed. After all you could do something like:

import { foo } from "./types"; // brings "foo" into module scope
function doSomething() foo {

}

Lots of pros and cons. Perhaps starting out very restrictive, and going from there, is a good approach.

Naddiseo commented 9 years ago

> How does this work exactly?
>
> function f2() Optional(Number) {}
> function f3() Union(Number, String) {}

They would work at compile time, and likely not exist in the resulting file. I would imagine a Babel transformer that walks the tree:

// Pseudocode: the t.* helpers are illustrative names, not necessarily the actual Babel API.
var transformer = {
    walkTypeHint(ast) {
        // Transform this proposal into the ESTree types.
        if (t.isFunctionCall(ast) && ast.name.name == "Optional") { // union[undefined, Type]
            return t.UnionTypeAnnotation([t.TypeAnnotation(undefined), t.TypeAnnotation(ast.params[0])]);
        }
        else if (t.isFunctionCall(ast) && ast.name.name == "Union") {
            return t.UnionTypeAnnotation.apply(null, ast.params);
        }
    }
};

> Given that checking the type would likely be done with the instanceof operator, what does Optional return? What does Union return? How does that get fed into instanceof?

Union doesn't have to return anything if it's removed at compile time.

I don't think you can use instanceof for anything but simplistic and naive type checking. And using instanceof runs into issues when trying to type-check a map/dict/object:

function takesADict(arg Object) { assert(arg instanceof Object); }

takesADict({a: 1}); // passes, and it should
takesADict(new Map()); // passes, should it?
takesADict(new Function()); // passes, should it?
takesADict(function() {}); // passes, but it should not.

// there's also the weird checks with String vs new String vs "".

Alternatively, Union could be implemented something like:

function Union(arg1, arg2) {
    return function checker(input) {
        assert((input instanceof arg1) || (input instanceof arg2));
    };
}
// --- other files ---
function Foo(arg1 Union(A, B)) { } // Gets transpiled:

function Foo(arg1) {
    Union(A,B)(arg1); // does the check here
}

However, this approach will incur runtime overhead.
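A minimal runnable variant of the runtime idea above, under a few assumptions: the helper accepts any number of types, and it returns a boolean instead of asserting so the caller decides whether to warn or throw. The classes A, B and C exist only for the demonstration.

```javascript
// Hypothetical runtime helper: builds a predicate that accepts any of the given types.
function Union(...types) {
  return function checker(input) {
    return types.some((type) => input instanceof type);
  };
}

class A {}
class B {}
class C {}

// Building the checker once avoids allocating a new closure on every call.
const isAOrB = Union(A, B);

console.log(isAOrB(new A())); // true
console.log(isAOrB(new B())); // true
console.log(isAOrB(new C())); // false
```

Calling Union(A, B)(arg1) inline, as in the transpiled output above, would create a fresh closure on every invocation, which is one concrete source of the overhead.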

> Allowing any syntax does allow for some invalid code. But then again, so does extends.

Semantically invalid, yes. For extends, the semantic check is for "function or null". I think we can probably narrow down what is semantically valid, but we're still at the syntax phase, so we can leave the semantic checking until later.

> The question is: Should "anything" be allowed, or should it be restricted to simple variable references?

Yes, another way to phrase the question is: solve this at a syntactic level, or at a semantic level.

If it's solved at a syntactic level, the resulting hinting system will probably be less expressive and less future-friendly. Solving it at a semantic level means that, if there are changes down the line, we don't have to change the syntax to accommodate them.

> There is one use case where "any expression" could be handy:
>
> import types from "./types";
> function doSomething() types.foo {
> }
> // or
> function doSomething() types["foo"] {
> }
> // or
> function doSomething() types.get("foo") {
> }
>
> Essentially a case where the type isn't in scope. But then again, it could be restricted to foo and types.foo (i.e. allow dot syntax).

This is another situation where I think it's better to solve on the semantics side than the syntactic side.

> And with any syntax, where does it end? You could do:
>
> function doSomething() class Foo {
> } {
>  // ... function code
> }
>
> (which is completely useless)

I think that depends on how you interpret the hint. It could be defined as:

let interfaceFoo = class Foo { method() {} };

// The arg/return needs to implement the `method` method
function doSomething(arg interfaceFoo) interfaceFoo { return arg; }

doSomething({}); // fails
doSomething(function() {}); // fails
doSomething({ method() {} }); // passes, has a `method` method
doSomething(class { method() {} }); // passes.
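One way to read the interface interpretation above is as a structural ("duck typing") check. The following is only a sketch under that assumption: methodNames and implementsInterface are hypothetical helpers, not part of the proposal, and inspecting the prototype when the argument is itself a function is a choice made here so that the four calls above come out as annotated.

```javascript
// Collect the method names declared on a class body (excluding the constructor).
function methodNames(klass) {
  return Object.getOwnPropertyNames(klass.prototype).filter(
    (name) => name !== "constructor" && typeof klass.prototype[name] === "function"
  );
}

// Hypothetical structural check: does the value provide every method the interface declares?
// For functions/classes the prototype is inspected instead of the function object itself.
function implementsInterface(value, iface) {
  const target = typeof value === "function" ? value.prototype : value;
  if (target === null || target === undefined) return false;
  return methodNames(iface).every((name) => typeof target[name] === "function");
}

const interfaceFoo = class Foo { method() {} };

console.log(implementsInterface({}, interfaceFoo));                    // false
console.log(implementsInterface(function () {}, interfaceFoo));        // false
console.log(implementsInterface({ method() {} }, interfaceFoo));       // true
console.log(implementsInterface(class { method() {} }, interfaceFoo)); // true
```

Note that this is a semantic decision layered on top of the syntax: the same hint could just as well be read nominally (instanceof), which is why the interpretation matters.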

> Given that import creates references in the module scope, perhaps it is reasonable that types.foo isn't allowed. After all you could do something like:
>
> import { foo } from "./types"; // brings "foo" into module scope
> function doSomething() foo {
> }

Okay, that case is where it gets a little tricky. You're forced to do runtime checking if you don't want to deal with the import at compile time. Still, that's semantics rather than syntax.

> Lots of pros and cons. Perhaps starting out very restrictive is a good way to start out and then go from there.

Okay, if we assume a single identifier, what are the pros/cons of that?

lukescott commented 9 years ago

> > How does this work exactly?
> >
> > function f2() Optional(Number) {}
> > function f3() Union(Number, String) {}
>
> They would work at compile time, and likely not exist in the resulting file. I would imagine a Babel transformer that walks the tree:
>
> var transformer = {
>     walkTypeHint(ast) {
>         // Transform this proposal into the ESTree types.
>         if (t.isFunctionCall(ast) && ast.name.name == "Optional") { // union[undefined, Type]
>             return t.UnionTypeAnnotation([t.TypeAnnotation(undefined), t.TypeAnnotation(ast.params[0])]);
>         }
>         else if (t.isFunctionCall(ast) && ast.name.name == "Union") {
>             return t.UnionTypeAnnotation.apply(null, ast.params);
>         }
>     }
> };
>
> > Given that checking the type would likely be done with the instanceof operator, what does Optional return? What does Union return? How does that get fed into instanceof?
>
> Union doesn't have to return anything if it's removed at compile time.

That all seems very messy.

> I don't think you can use instanceof for anything but simplistic and naive type checking. And using instanceof runs into issues when trying to type-check a map/dict/object:
>
> function takesADict(arg Object) { assert(arg instanceof Object); }
>
> takesADict({a: 1}); // passes, and it should
> takesADict(new Map()); // passes, should it?
> takesADict(new Function()); // passes, should it?
> takesADict(function() {}); // passes, but it should not.
>
> // there's also the weird checks with String vs new String vs "".

Why not? Object is a generic type. It's a quirk of the language, sure, but type systems typically respect inheritance.

The biggest issue is String vs "". Technically "" is not a String, but it behaves like one when autoboxed. That is, when you call "foo".substr(1), what is actually happening behind the scenes is (new String("foo")).substr(1).toString().

The runtime checking would likely look something like:

function isType(value, type) {
  switch (type) {
    case String:
      return typeof value === "string";
    case Number:
      return typeof value === "number";
    case Boolean:
      return typeof value === "boolean";
  }
  return value instanceof type;
}

Although this could probably be optimized by the JIT: if you know the type is String (not a shadowed String), you can use typeof instead of instanceof. V8 could probably do an optimization like that.
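To make the behaviour of the isType helper above concrete, here it is again with a few sample calls. The sample values are chosen for illustration only; note that, as written, an explicitly boxed new String("foo") is rejected for String, which may or may not be the desired outcome.

```javascript
// Same helper as above: typeof for the primitive-backed types, instanceof otherwise.
function isType(value, type) {
  switch (type) {
    case String:
      return typeof value === "string";
    case Number:
      return typeof value === "number";
    case Boolean:
      return typeof value === "boolean";
  }
  return value instanceof type;
}

console.log(isType("foo", String));             // true
console.log(isType(5, Number));                 // true
console.log(isType(new String("foo"), String)); // false: a boxed string has typeof "object"
console.log(isType([], Array));                 // true
console.log(isType(new Map(), Object));         // true: inheritance is respected
```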

> Alternatively, Union could be implemented something like:
>
> function Union(arg1, arg2) {
>     return function checker(input) {
>         assert((input instanceof arg1) || (input instanceof arg2));
>     };
> }
> // --- other files ---
> function Foo(arg1 Union(A, B)) { } // Gets transpiled:
>
> function Foo(arg1) {
>     Union(A,B)(arg1); // does the check here
> }
>
> However, this approach will incur runtime overhead.

And how do you know the returned value isn't the type? String is a function too. Is it supposed to run that as well?

> > Lots of pros and cons. Perhaps starting out very restrictive is a good way to start out and then go from there.
>
> Okay, if we assume a single identifier, what are the pros/cons of that?

Single Identifier

Pro: Easy to lint. Don't have to run any code.

Pro: Easy to optimize at runtime. If you know you're checking for String while parsing, you can opt for a typeof vs an instanceof rather than a function that checks every time.

Pro: Very clear and easy to understand.

Con: No dynamic types?

Any expression

Con: Not easy to lint. Have to run code at runtime.

Con: Not easy to optimize at runtime. Has to be run.

Con: Has the possibility of being confusing if abused.

Pro: Dynamic types?

It seems to me a single identifier is the way to go. The only difference between the two that I can think of is dynamic types. Which brings up another question: Is this legal?

var DynamicType = RandomType();
function doSomething(crazy DynamicType) {

}

Even with a single identifier you can do something "dynamic". It's just a round-about way of doing it. Is there any way to only allow class and function declarations? Would it be worthwhile to do so?

Naddiseo commented 9 years ago

A use case we've yet to address is forward referencing:

class A {
   getB() B { // B doesn't exist here due to TDZ
       return new B();
   }
}

class B {
  getA() A { return new A(); }
}

Taking hints from #17, and going with a restrictive first draft, TypeHint should just be:

TypeHint:
    StringLiteral

Restricting the type hint to just a string literal has many pros:

Doesn't dictate the syntax of the hinting (I think this is a big thing).
Could be used as inline JSDoc, which the TC39 discussion in #17 seems to want to standardize.
Expressive.
Simple syntax change.
Allows forward referencing.
Does not require that the parser or engine do anything with the hint.
Does not have the comment attachment problem that JSDoc comments have when run through Babel.

Cons:

Library/runtime checking would need to parse the string. (Or use eval?)
How would you reference types from other files?

Minor Cons:

No IDE syntax highlighting, or completion. I think this could come later.
Need to type quotation marks.
Possible footgun in object literals if a comma or colon is missed.

I also think that using a single string literal for the hint has the biggest chance of getting the spec past Stage 0, since it's the least controversial.
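To illustrate the "would need to parse the string" con, here is a minimal sketch of what a library might do with a string hint. Everything in it is an assumption for illustration: the "|" separator for alternatives, the registry of named predicates, and the parseHint / matchesHint helpers are hypothetical and not defined by this proposal.

```javascript
// Hypothetical registry mapping hint names to runtime predicates.
const registry = new Map([
  ["Number", (v) => typeof v === "number"],
  ["String", (v) => typeof v === "string"],
  ["Boolean", (v) => typeof v === "boolean"],
  ["Array", (v) => Array.isArray(v)],
]);

// Split a hint such as "Number|Array" into type names (the "|" separator is assumed).
function parseHint(hint) {
  return hint.split("|").map((name) => name.trim());
}

// A value matches if any alternative in the hint accepts it.
function matchesHint(value, hint) {
  return parseHint(hint).some((name) => {
    const predicate = registry.get(name);
    if (!predicate) throw new TypeError(`Unknown type name in hint: ${name}`);
    return predicate(value);
  });
}

console.log(parseHint("Number | Array"));         // [ 'Number', 'Array' ]
console.log(matchesHint(3, "Number|Array"));      // true
console.log(matchesHint([1, 2], "Number|Array")); // true
console.log(matchesHint("3", "Number|Array"));    // false
```

The registry also hints at the second con: names in the string have to be resolved somehow, and types defined in other files would need to be registered or imported by the checker.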

lukescott commented 9 years ago

So, just so I understand what you're saying, you mean:

class A {
   getB() "B" {
       return new B();
   }
}
class B {
  getA() "A" { return new A(); }
}

Is that correct? I'm not sure many people would agree with that. It looks a bit unusual.

I'm not sure forward referencing is an issue in this case though. At least not if you think of it as being de-sugared to:

class A {
   getB() {
      let _return = new B();
      _return instanceof B || console.warn("return value is not B");
      return _return;
   }
}
class B {
   getA() {
      let _return = new A();
      _return instanceof A || console.warn("return value is not A");
      return _return;
   }
}

True, A and B are undefined at some point, but that doesn't matter until the methods are actually called.
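A small sketch of that timing, with the hints already stripped: the name B inside getB is only resolved when the method runs. The tdzError variable is just a hypothetical way to capture what happens if the method is called before class B has been evaluated.

```javascript
class A {
  getB() {
    return new B(); // B is looked up when getB runs, not when A is defined
  }
}

// Calling getB before the declaration of B is evaluated hits the temporal dead zone.
let tdzError = null;
try {
  new A().getB();
} catch (err) {
  tdzError = err;
}

class B {
  getA() {
    return new A();
  }
}

// Once both classes are defined, the mutual references resolve normally.
const b = new A().getB();
console.log(tdzError instanceof ReferenceError); // true
console.log(b instanceof B);                     // true
console.log(b.getA() instanceof A);              // true
```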

I've also been tinkering with this idea:

Object.isA = function(value) {
    return value instanceof this;
};
String.isA = function(value) {
    return typeof value === "string";
};

Although without being on the prototype, that doesn't really work on new classes. The general idea is that the type has an isA method to do the comparison.

lukescott commented 9 years ago

Another way is doing it the other way around:

Object.prototype.isA = function(constructor) {
    return this instanceof constructor;
};
String.prototype.isA = function(constructor) {
    return constructor === String;
};
Number.prototype.isA = function(constructor) {
    return constructor === Number;
};
Boolean.prototype.isA = function(constructor) {
    return constructor === Boolean;
};

Then you can do something like this as the de-sugar:

class A {
   getB() {
      let _return = new B();
      _return.isA(B) || console.warn("return value is not B");
      return _return;
   }
}
class B {
   getA() {
      let _return = new A();
      _return.isA(A) || console.warn("return value is not A");
      return _return;
   }
}

Naddiseo commented 9 years ago

Consider what is currently in use (taken from JSDoc website):

/**
 * Returns the sum of a and b
 * @param {Number} a
 * @param {Number} b
 * @param {Boolean} retArr If set to true, the function will return an array
 * @returns {Number|Array} Sum of a and b or an array that contains a, b and the sum of a and b.
 */
function sum(a, b, retArr) {
    if (retArr) {
        return [a, b, a + b];
    }
    return a + b;
}

// vs
/** Returns the sum of a and b */
function sum(a "Number", b "Number", retArr "Boolean") "Number|Array" {
    if (retArr) {
         return [a, b, a + b];
    }
    return a + b;
}

Using an inline TypeHint is subjectively better, objectively easier to type, and more "attached" to the thing it hints. What I'm suggesting is that JSDoc have a place inline. However, saying that what is inside the hint must be JSDoc is too opinionated.

> I've been thinking about something like this:
>
> Object.prototype.isA = function(constructor) {
>     return this instanceof constructor;
> };
> String.prototype.isA = function(constructor) {
>     return constructor === String;
> };
> Number.prototype.isA = function(constructor) {
>     return constructor === Number;
> };
> Boolean.prototype.isA = function(constructor) {
>     return constructor === Boolean;
> };

The main issue I see with something like that is that it forces the actual checking to be done with instanceof or typeof. Given that the three current major implementations of type checking in ES (Closure Compiler, TypeScript, Flow) all use something more sophisticated, I think it would be a mistake to push that kind of limitation into the spec. Also, that's type checking, not type hinting.

My thoughts:

Many developers agree there is a need for type hinting. (Evidence: we have JSDoc)
Many developers agree there is a need for type checking. (Evidence: we have Closure Compiler, TypeScript, Flow, ActionScript)
Many developers agree there is a need for strong type checking. (Evidence: TypeScript, Flow, Closure all support strong type checking with generics/templates, type aliases)
From #17: "Not proposing a type system. Implementors are interested in standardizing JSDoc..."
From #17: "TypeScript is the strongest proposal at this point. Guards have serious problems for structural types. A sound type system (for JS) is still an academic research topic without a solution in sight."
Adding prototype methods, or using instanceof/typeof checks, belongs in libraries, such as typecastjs, because they are weak/shallow type checking.

My takeaway:

Type /hinting/ is something desired by the ECMA committee, and developers.
Type /checking/ is something that's still a research subject, and would take years to make something useful and accepted, thus should be left to libraries for now.

lukescott commented 9 years ago

@Naddiseo you've given me a lot to think about. You're right, we should focus on type hinting and leave type checking to the implementer, whether that be a library or otherwise.

As far as forward references go, I think there are a number of ways to handle that, but that really falls under the responsibility of the type checker.

I've been thinking about Flow, and I think we should follow a subset of what they are already doing. These are my thoughts:

Use number, string, boolean, and void as base types.
Allow classes as types. Functions can technically be classes, but I'm not sure about that one.
Be stricter about the typing than Flow is. No mixed type or union types. A variable should only ever be one type if it has a type annotation. In a string | number situation, where you could take "5" or 5, a conversion should be done at the call-site.
Perhaps no maybe/optional types, because undefined/null has odd interactions, and function args can now set defaults:
- "foo" + undefined = "fooundefined"
- 5 + undefined = NaN
- "foo" + null = "foonull"
- 5 + null = 5
No Array. An Array can contain any type, and that's hard to enforce. If typed arrays are added at some point, they are likely to be something like Uint32Array. So it's better to leave that alone for now.
No declarations. We're not doing type checking, and I think we can assume ES6 modules.
No type aliases, or at least we don't need to define them. We probably only care about the annotations themselves, not where they come from.
Don't think we'll need typeof.
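The "odd interactions" listed for undefined/null above can be checked directly, together with how default parameters treat the two values. The scale function is a hypothetical example, not something from the proposal.

```javascript
// Implicit coercion of undefined/null in + expressions.
const concatUndefined = "foo" + undefined; // "fooundefined"
const addUndefined = 5 + undefined;        // NaN
const concatNull = "foo" + null;           // "foonull"
const addNull = 5 + null;                  // 5 (null coerces to 0)

// Default parameters apply only when the argument is undefined, not when it is null.
function scale(value, factor = 2) {
  return value * factor;
}

console.log(concatUndefined, addUndefined, concatNull, addNull);
console.log(scale(5));            // 10: factor defaults to 2
console.log(scale(5, undefined)); // 10: the default applies again
console.log(scale(5, null));      // 0: null is passed through and coerces to 0
```

Defaults therefore cover the "argument omitted" case, but a null argument still reaches the function body unchanged.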

I think we're pretty close to that already. Just have to define TypeHint - I'm thinking a single reference, like a variable name.

Naddiseo commented 9 years ago

Okay, let's start out with something restrictive like that as a first draft, and if the feedback we get strongly suggests developers want something more expressive, we can revisit the issue.

I propose that TypeHint be defined as follows:

TypeHint[Yield]:
    [~Yield] IdentifierReference[~Yield]
    number
    string
    boolean
    void

I'll need someone with a better understanding of the grammar notation to verify that the Yield parameter does what I think it does.

We just need to resolve #12 then we'll have the syntax completed, at which point I think we can start getting more feedback.