microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org

TypeScript Bytecode Interpreter / Runtime Types #47658

Closed marcj closed 2 years ago

marcj commented 2 years ago

TypeScript currently solves a lot of issues with working with JavaScript today: type checking itself, transpiling, and AST and language services for compilers, linters, and editors. Its gradual and structural typing is unique and seems a perfect fit for JavaScript. However, two major aspects are currently unsolved: a runtime type (aka reflection) system and performance issues. Having types available at runtime enables use cases that are currently simply not possible and for which many hundreds of expensive, complex, and unsexy workarounds have been built.

In this post I’d like to talk about the use cases and current state of runtime type solutions, a proposal for a full TypeScript runtime type system (aka type reflection), and, as a by-product, a bytecode interpreter with an instruction set specification that could potentially be used as an alternative to the current type checker with performance improvements, and that allows TypeScript to be adopted in completely new non-JavaScript environments.

Runtime Types

What does it mean to have runtime types? There are several levels of reflection systems that could be built: from very basic reflection as with emitDecoratorMetadata, to a reasonable one that emits all types of functions and classes, to a more verbose one that emits interfaces, type aliases, and type functions, to a very verbose one that essentially annotates each variable plus control flow analysis. What is necessary is defined by the use case at hand. In a perfect world we would have access to all types of all symbols (including variables), but that’s not necessarily possible or desired due to restrictions like bundle size.

I categorise possible reflection systems into four levels:

  1. Basic reflection of primitives: string, number, boolean, bigint, Date, classes.
  2. Reflection of JavaScript symbols: class properties/methods, function parameters.
  3. Advanced reflection of all TypeScript types: interfaces, type aliases, type functions (conditional types, mapped types, index signatures, etc.), generics, etc.
  4. Perfect reflection of all variables and expressions, including control flow analysis.

Level 1 is already integrated in JavaScript itself via typeof.
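
For example (plain JavaScript; note that class instances need instanceof rather than typeof):

typeof 'abc';  //'string'
typeof 123;    //'number'
typeof true;   //'boolean'
typeof 123n;   //'bigint'
new Date() instanceof Date; //true (typeof only reports 'object' for class instances)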

Level 2 can be partially achieved using class decorators and emitDecoratorMetadata, or fully via schema declaration libraries. However, this means a lot of additional boilerplate code and complexity.

Level 3 is currently not available. Although there are a few PoC transformers that try to extract the most basic type information, they are far from being able to reflect all TypeScript types and also impose substantial overhead.

Level 4 is somewhat unreasonable and very excessive, and probably has no use case where level 3 is not already sufficient.

I consider level 3 the most reasonable and doable, as it makes the whole TypeScript type system available at runtime in a way that supports all type expressions while not emitting excessive reflection data for each and every variable or expression. This level is what this proposal is trying to achieve.

Demand

There are by now a lot of libraries trying to solve this issue and make types available at runtime, in one way or another. Some provide basic type support, others additionally support validation and database information, and a few support relatively complex type expressions (like unions and intersections).

Here is a list of known libraries and their download count:

| Name | Type | Use case | Downloads/Month |
| --- | --- | --- | --- |
| Total | | | 76,619,220 |
| joi | Schema declaration | Validation | 16,764,980 |
| yup | Schema declaration | Validation | 13,644,252 |
| zod | Schema declaration | Serializer/Validation | 1,656,548 |
| io-ts | Schema declaration | Serializer/Validation | 1,897,544 |
| ts-json-validator | Schema declaration | Validation | 35,992 |
| runtypes | Schema declaration | Validation | 348,432 |
| superstruct | Schema declaration | Validation | 1,698,868 |
| class-transformer | Class with decorators | Serializer | 4,020,084 |
| class-validator | Class with decorators | Validation | 3,752,412 |
| typeorm | Class with decorators | ORM | 2,813,516 |
| mikro-orm | Class with decorators | ORM | 20,876 |
| prisma | Custom DSL | ORM | 1,072,260 |
| type-graphql | Class with decorators | RPC/GraphQL | 451,104 |
| nest | Class with decorators | Dependency Injection | 4,610,144 |
| typescript-ioc | Class with decorators | Dependency Injection | 46,216 |
| InversifyJS | Class with decorators | Dependency Injection | 2,248,848 |
| angular | Class with decorators | Dependency Injection | 11,660,628 |
| trpc | Schema declaration | RPC | 62,140 |
| ow | Schema declaration | Validation | 2,649,368 |
| typebox | Schema declaration | Reflection | 120,196 |
| monocle-ts | Schema declaration | Reflection | 685,848 |
| typescript-is | Transformer | Validation | 35,636 |
| typescript-json-schema | Transformer | Validation/Reflection | 6,323,328 |

State as of 28 January 2022.

There are many hundreds more that all re-implement a way to make types available at runtime. The space is quite fragmented and the APIs are not streamlined. If they could all use TypeScript types at runtime, they probably would.

Although it can be considered a good thing that types are erasable by default, it’s clear that types have value. Removing them destroys that value and makes it necessary to make them available again in an expensive way. How much value depends highly on the project at hand. For many projects and people that value is so big that they create new libraries making the types available at runtime in one way or another, and in total invest many millions of dollars doing so. The value proposition is self-explanatory once you have a use case at hand that deals with types. TypeScript should thus provide an official way of making them accessible at runtime.

Considering the total of 76 million downloads/month for these libraries compared to TypeScript's 123 million downloads/month, this paints a clear picture of the huge demand for runtime types. Including ajv, which essentially uses JSON Schema, paints an even stronger picture, because in the JavaScript world JSON Schema could be replaced with TypeScript using this proposal. That said, not the whole user base of ajv can be included in the calculation, since a lot just consume an external JSON Schema or one that is shared with other non-JavaScript tech stacks.

Workarounds

There are several ways to get runtime types. Some are much more complex than others. They differ in how many TypeScript types are actually expressible and available at runtime.

Classes with decorators

class User {
    @Property()
    id: number;

    @Property()
    username: string;

    @Property({type: 'array', elementType: String})
    tags: string[];
}

For more complex types like an array of strings, the type needs to be annotated manually.

The idea is that a decorator is used on a class property and emitDecoratorMetadata is enabled so that basic type information is automatically available at runtime. The overhead is substantial, as a lot of code is generated for relatively little functionality. The functionality is not only limited in type expressiveness (anything more complex needs to be annotated manually); other key functionality like circular imports is simply not supported and needs a workaround, usually called a “forward reference”, where an arrow function is used to defer symbol resolution, e.g. forwardRef(() => MyType). There are further disadvantages: property and parameter names do not survive minification, and functions/interfaces are not supported.
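
To illustrate the forward-reference workaround, here is a minimal sketch; Property and forwardRef are stand-ins for what decorator libraries typically provide, not real APIs:

const forwardRef = (fn: () => any) => fn; //defers resolving the referenced class
function Property(options?: { type?: any }) {
    return (target: any, propertyKey: string) => { /* store metadata for later lookup */ };
}

class Article {
    @Property({ type: forwardRef(() => Author) }) //Author is not defined yet at decoration time
    author?: Author;
}

class Author {
    @Property({ type: forwardRef(() => Article) })
    articles: Article[] = [];
}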

Custom DSL (domain specific language) with code generator

model User {
   id Number
   username String
   tags String[]
}

A code generator then generates TypeScript types. This has the obvious disadvantage of having to learn yet another type language and run a code generator for every change. This increases the project complexity and the mental model enormously. Also, such DSLs are usually much less powerful than TypeScript, so the expressiveness of the types and of the program suffers.

Schema declaration

Declaration of a schema usually happens via a method-chaining API.

const user = t.schema({
    id: t.number,
    username: t.string,
    tags: t.array(t.string),
});
type User = ExtractType<typeof user>; // {id: number, username: string, ...}

There are usually helper types included in such libraries to extract the declared type for the type system.

Transformers

Transformers parse the AST of the TypeScript source or the types from the type checker (checker.ts), and emit some form of runtime type information. That information could look like this:

class User {}
User.type = {
   id: {type: 'number'},
   username: {type: 'string'},
   tags: {type: 'array', element: {type: 'string'}}
}

This kind of data structure can get big pretty fast and is not suitable for a big and complex code base. Its structure is very inefficient in terms of size and runtime overhead. Also, generics with type functions (conditional types, mapped types, etc.) are not supported with such an arrangement. On the other hand, this is very user friendly, as one can simply write TypeScript without learning anything fundamentally new. It just works - which is a fundamental advantage.

Relevant Github Issues

There are many GitHub issues created in the TypeScript repository regarding runtime type data in one way or another. This also serves as proof of further demand, beside the list above of projects that have actually invested tons of resources into making it happen.

  1. https://github.com/microsoft/TypeScript/issues/3628 Discussion: (Reflective) Type Model
  2. https://github.com/microsoft/TypeScript/issues/3060 TypeScript interfaces for Dependency Injection
  3. https://github.com/MichalLytek/type-graphql/issues/296 Enhanced types reflection system
  4. https://github.com/microsoft/TypeScript/issues/2902 Suggestion: Interfaces as string literals
  5. https://github.com/microsoft/TypeScript/issues/9182 Feature suggestion: Compile-time reflection of string literal unions and enums
  6. https://github.com/microsoft/TypeScript/issues/10576 Get type of generic type
  7. https://github.com/microsoft/TypeScript/issues/11860 Design-type metadata for generic types
  8. https://github.com/microsoft/TypeScript/issues/11861 Decorate all properties with type metadata in TypeScript
  9. https://github.com/microsoft/TypeScript/issues/12605 [Proposal] Implement a way to emit json type information
  10. https://github.com/microsoft/TypeScript/issues/12714 Make metadata available at runtime
  11. https://github.com/microsoft/TypeScript/issues/11362 Proposal for complete reflection mechanism
  12. https://github.com/microsoft/TypeScript/issues/12577 Provide more detailed metadata for decorators
  13. https://github.com/microsoft/TypeScript/issues/19154 Emit metadata with union constituents
  14. https://github.com/microsoft/TypeScript/issues/20887 TS Proposal : abstract class expression, generic metadata, constructor typing, abstract class decorators, abstract members decorators
  15. https://github.com/microsoft/TypeScript/issues/21479 compiler option to emit interface symbols?
  16. https://github.com/microsoft/TypeScript/issues/39966 Improve generation of reflection

All of these issues can be solved with the proposed solution. The list is not complete, but should indicate how broad the demand is.

Design Goals of TypeScript

TypeScript has several design goals that need to be considered before implementing a solution for type reflection. Strictly speaking there are three points that oppose a runtime type system, namely (numbered as in TypeScript's design goals document):

  3. Impose no runtime overhead on emitted programs.

  4. Emit clean, idiomatic, recognizable JavaScript code.

  9. Use a consistent, fully erasable, structural type system.

TypeScript’s design goals explicitly state in point 9 that type information should be erasable. This proposal does not defeat that goal: it is and should still be possible to erase type information completely if the user has no need for it.

While goals 3 and 4 are among the reasons for TypeScript’s success and fundamentally a good thing, it’s inevitable to break or bend them in order to make types available at runtime. Runtime types create an overhead; done correctly it is a very small one, but it’s still an overhead. It has a cost. The cost can be justifiable, but it is still a cost. This proposal tries to produce as little runtime overhead as possible. TypeScript types are not JavaScript and probably never will be, hence any structure or bytecode emitted in JavaScript that represents TypeScript types will not be clean, idiomatic, and recognizable JavaScript code. It will be valid JavaScript though. However, since the trend is to minify and further optimise the JavaScript generated by the TypeScript compiler up to the point of not being readable anymore, those points matter less and less. A valid point remains that emitted JavaScript should not contain unnecessary code that will never be used, so making the runtime types configurable and tree-shakeable is crucial.

Use Cases

There are a lot of use cases where runtime types could be a game changer: new functionality that is currently just not possible, user experience improvements, and better code quality in general.

Serialization/deserialization

Even basic types like a class with date properties (class {created: Date}) need a proper deserializer to get the correct Date type back when deserializing JSON. Serializing to JSON and deserializing back to JavaScript objects is core functionality that almost every application needs. A typical use case is receiving an HTTP request with a JSON body in a backend application and deserializing it correctly.

serialize<Date>(new Date(1234567890)); //'1970-01-15T06:56:07.890Z'

deserialize<Date>('1970-01-15T06:56:07.890Z'); //new Date(1234567890)

Type casts

Type casts (x as string) in TypeScript are not real casts; they do not change anything at runtime. However, if the types are available at runtime, there could be a function that does actual type casting.

cast<number>('123'); //123
cast<Date>(1234567890); //Date(1970-01-15T06:56:07.890Z)

cast<User>({id: 0, username: 'Peter'}); //instanceof User

Validation

With types that add meta-data, one can annotate models with validation information and use them in a validator to validate arbitrary data against very complex types. Both type and content validation then become easily possible.

interface User {
   id: number & Positive;
   username: string & MinLength<3>;
}

validate<User>({id: 3, username: 'Peter'}); //true
validate<User>({id: '3', username: 'Peter'}); //false
validate<User>({username: 'Peter'}); //false
validate<User>({id: -3, username: 'Peter'}); //false

validate<number & Positive>(3); //true
validate<number & Positive>(-3); //false

Validation can go as far as automatically validating each function argument and throwing an error if it is invalid, as other languages such as PHP do. It would further be possible to automatically resolve overloaded functions to their correct implementation. Currently, this needs to be done manually.

Automatic type guards

Type guards need to be written manually, but with a runtime type system they can be generated on-demand automatically.

is<string>('123'); //true

interface User {
    id: number;
}

const data = JSON.parse(request.body);

if (is<User>(data)) {
    data.id; //number
}

Dynamic type computation

By supporting dynamic type computation, arbitrary type functions can be used at runtime to generate types based on runtime data. For example, it is possible to construct an object literal type based on a schema stored in a database, which can then be used for all the other use cases outlined here.
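
A minimal sketch of what that could look like, using the simplified {kind: ...} type objects introduced in the bytecode sections below (the exact shape of the real type objects may differ):

//columns as they might be loaded from a database
const columns = [{name: 'id', type: 'number'}, {name: 'title', type: 'string'}];

//construct a type object at runtime from that schema
const dynamicType = {
    kind: 'object',
    types: columns.map(c => ({kind: 'property', name: c.name, type: {kind: c.type}})),
};

//dynamicType could now be fed into the same validate/serialize machinery as static types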

Even generic types can be instantiated with arbitrary runtime types as arguments.

class Bag<T> {
    items: T[];
    add(item: T){ this.items.push(item) }
}

const stringBag = reflect<Bag<unknown>>(typeOf<string>());

API documentation

With type information available at runtime, an API documentation tool (like typedoc) could generate documentation on the fly, without a build process.

Dependency Injection

Runtime types allow, for the first time, writing code/services against abstractions/interfaces instead of against implementations, and auto-wiring them. Most DI container libraries use workarounds like string/symbol identifiers to abstract that away, which is not ideal. Also, many use decorators and emitDecoratorMetadata, which requires additional boilerplate code.

An example of a dependency injection container with interface support could look like this:

interface Logger {
    log(...messages: any[]);
    warning(...messages: any[]);
    error(...messages: any[]);
}

class MyLogger implements Logger {
    //...
}

class Controller {
    constructor(private logger: Logger) {} //interface as dependency 
}

const serviceContainer = new ServiceContainer([
     //since MyLogger satisfies the interface of Logger, it will be used
     //for dependencies that requested the Logger interface.
     MyLogger,
     Controller,
]);

Configuration system

By adding resolution information to computed types in the runtime type system, it’s possible to resolve configuration values automatically in the easiest-to-use way possible.

class MyAppConfiguration {
    databaseUrl: string = 'mongodb://localhost';
    host: string = 'localhost';
}

class Service {
    constructor(
        private logger: Logger, 
        private host: MyAppConfiguration['host'],
    ) {}
}

When getting the runtime type information of Service.host, its type information includes the fact that its type string was created via an index access on MyAppConfiguration with 'host', so there is a link back to the configuration class itself. A service container can then auto-wire the configuration values.
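
Hypothetically, the runtime type of Service.host could carry that origin information roughly like this (the property names here are assumptions for illustration, not the actual output of the implementation):

const hostType = {
    kind: 'string',
    origin: { //assumed name: where this type was computed from
        kind: 'indexAccess',
        container: MyAppConfiguration, //link back to the configuration class
        index: {kind: 'literal', literal: 'host'},
    },
};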

HTTP router parameter matching

When dealing with HTTP routes, placeholders and query parameters are often involved. Those are used to match a route from the list of all registered routes, and can be automatically validated and deserialized.

class MyController {
   @GET('/user/:id')
   getUser(id: number) {
       //id is always automatically a number, 
       //and route not found when :id is invalid
   }
}

Type safe RPC

There are many RPC implementations and possible transport encodings available to implement one: GraphQL, Protocol Buffers/gRPC, etc. With all function types available at runtime, a library can use this information to automatically serialize parameters and deserialize/validate them in the backend.

//backend
class BackendController {
   registerUser(user: User): boolean {
   }
}

//frontend
import type {BackendController} from './backend';
const client = new Client<BackendController>('localhost');
const status = await client.registerUser(new User('Peter'));

ORM entities

ORM entities map relational tables or document collections to objects. For this, a lot of meta-data is necessary to describe the table correctly: primary key, auto-increment, index, unique, collation, constraints, foreign keys, and more. With types that add meta-data it is possible to have this information available at runtime so that query builders, data mappers, and SQL migrations can work correctly.

class User {
    id: number & PrimaryKey & AutoIncrement = 0;
    email?: Email;
    image?: Image & Reference;
    constructor(
        public username: string & Unique & MinLength<3> & MaxLength<32>,
    ) {}
}

Type reflection

With a general type reflection API that allows types to be easily discovered and navigated, library authors can implement new use cases easily.

typeOf<string>(); //{kind: 'string'}

typeOf<Date>(); //{kind: 'class', classType: Date}

typeOf<User>(); //{kind: 'class', classType: User, types: [{kind: 'property', ...}]}

const user = ReflectionClass.from(User);
user.getMethods();
user.getProperties();
user.getProperty('id').type.kind; //ReflectionKind.number
user.getProperty('id').isOptional(); //boolean

Bytecode Interpreter

I thought a lot about how to actually emit and represent the type information in JavaScript without imposing too much initial runtime overhead (in terms of memory, parsing, and executing) and too much code size (bundle size is important). I asked myself: what could be the smallest possible encoding for types? The obvious answer is a number, a simple byte. A string? A byte. A number? A byte. An array of numbers? Probably two bytes. A union of string|number? Multiple bytes arranged cleverly. An object with properties? Probably a lot of bytes. Having such an arrangement of raw bytes implies having some kind of interpreter that reads them and constructs handy type objects, e.g. {kind: 'string'} instead of just a plain number like 0, especially when the type is more complex.

Primitives

So, bytes encode a type. For primitive types like string, number, boolean, bigint, etc. this is rather trivial. If each of them is given a unique number, an interpreter can easily construct type objects. For example:

enum TypeOp {
    string = 0,
    number = 1,
    boolean = 2,
    bigint = 3,
}

In JavaScript itself, it can then be encoded just with an array of numbers:

const typeA = [0]; //string
const typeB = [1]; //number

An interpreter can easily read them and construct a type. The interpreter, aka processor, is a stack-based virtual machine. A very simple implementation could look like this:

type Type = {kind: 'string'} | {kind: 'number'} | {kind: 'boolean'} | ...;

function processor(bytes: TypeOp[]): Type {
    const stack: Type[] = [];

    for(const op of bytes) {
        switch(op){
            case TypeOp.string: 
                 stack.push({kind: 'string'});
                 break;
            case TypeOp.number: 
                 stack.push({kind: 'number'});
                 break;
        }
    }

    return stack[stack.length - 1]; //return last stack entry
}

processor(typeA); //{kind: 'string'}
processor(typeB); //{kind: 'number'}

This seems rather straightforward.

Who creates those number arrays? The compiler. A TypeScript transformer reads the AST and knows which AST node can be encoded as which TypeOp. This transformer is what I call the type compiler. It creates little byte programs that can be interpreted by a runtime stack machine. The compiler and processor are tightly coupled by a contract that precisely describes which byte ops are available and how they behave in the processor.

What about more complex types? Let's think about an array:

//introduce a new op for arrays
enum TypeOp {
    array = 4,
}
const typeC = [0, 4]; //string[]
//            ^ [string, array]

We introduce a new op array = 4, whose implementation pops the last stack entry and uses it as the element type. Let's implement the op in our processor:

case TypeOp.array: {
    const elementType = stack.pop(); //pop last type from stack
    stack.push({kind: 'array', elementType});
    break;
}

So, with the bytes [0, 4] (which is [string, array]) we get an object back of the structure:

{kind: 'array', elementType: {kind: 'string'}}

because the first op 0 pushes {kind: 'string'} onto the stack, and the second op 4 pops that type and pushes a new one using it as its elementType.

As one can see, the array op needs at least one type on the stack or it will fail. This implies that the given bytes need to be valid. The compiler is responsible for creating a valid byte array so that the processor never crashes.
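
A minimal sketch of the type compiler idea, assuming only the TypeOp values defined so far (illustrative only; the actual compiler is a full transformer):

import ts from 'typescript';

//map a type node from the AST to the byte ops defined above
function compileTypeNode(node: ts.TypeNode): number[] {
    switch (node.kind) {
        case ts.SyntaxKind.StringKeyword:
            return [TypeOp.string];
        case ts.SyntaxKind.NumberKeyword:
            return [TypeOp.number];
        case ts.SyntaxKind.ArrayType:
            //compile the element type first, then emit the array op,
            //so the processor finds the element type on the stack
            return [...compileTypeNode((node as ts.ArrayTypeNode).elementType), TypeOp.array];
        default:
            throw new Error('type not supported in this sketch');
    }
}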

Stack references and op parameters

When dealing with type literals, e.g. 'abc', a new concept in the byte array arrangement is required. Let's first define a new TypeOp for literal:

enum TypeOp {
    literal = 5,
}

We suddenly need the string 'abc' somehow in our processor. A solution could be to just include 'abc' in the array: ['abc', 5]. But how do we then know where to start processing ops, and how do we know to pick the first entry of the array? We need to introduce op parameters and an initial (input) stack frame for references.

We change the byte array structure to the following:

const typeD = ['abc', [5, 0]]; //'abc' literal
//                    ^ [literal, ref0]

Two things changed here: first, the actual byte array of ops moved to the last entry of the array, and second, the op 5 has a 0 as parameter. The op 5 always expects an additional value behind it, which it reads and then jumps over. The read value (in this case 0) is then used as an array index to access the actual literal value 'abc'.

We could now call this typeD array a program. It has a memory with values and an actual program of ops. As a convention, the last entry of this array is always the actual ops byte array and everything in front of it is a value or a reference to something: a string, number, boolean, bigint, or a reference to a class, function, or another program.

An implementation could look like that:

function processor(program: [...any[], TypeOp[]]) {
    const references: any[] = program.slice(0, program.length - 1);
    const ops = program[program.length - 1];
    const stack: Type[] = [];

    //we switched from `for of` to a classic for loop
    //to be able to jump over op parameters via ++i.
    for (let i = 0; i < ops.length; i++) {
        const op = ops[i];
        switch (op) {
            case TypeOp.string:
                stack.push({kind: 'string'});
                break;
            case TypeOp.number:
                stack.push({kind: 'number'});
                break;
            case TypeOp.array:
                stack.push({kind: 'array', elementType: stack.pop()});
                break;
            case TypeOp.literal: {
                //read the parameter behind the op and increment i manually,
                //so that the parameter is not treated as an op in the next iteration
                const parameter1 = ops[++i];
                const literal = references[parameter1];
                stack.push({kind: 'literal', literal: literal});
                break;
            }
        }
    }

    return stack[stack.length - 1]; //return last stack entry
}

processor(typeD); //{kind: 'literal', literal: 'abc'}

Stack frames: Classes and unions

For types like object literals, classes, or functions, the op byte arrangement is more verbose and a bit more complex, but follows the same pattern: there will be a property op and a class op that interact with each other, so that at the end the final class op reads the stack, which contains only property types, and constructs a type object like {kind: 'object', types: properties}.

The array op only ever pops a single item from the stack. A class or union op, however, can have an arbitrary number of members, i.e. an arbitrary number of types it needs to pop from the stack. There are two ways to encode this: either give the class/union op an additional parameter that indicates how many stack entries it should pop, or introduce stack frames so that those two ops always pop a whole stack frame. I chose stack frames so that previous operators can push an arbitrary number of types onto the stack without hard-coding the amount in the program itself. This makes it somewhat more dynamic.

First introduce new bytecode ops.

enum TypeOp {
    property = 6,
    class = 7,
    union = 8,
    frame = 9,
}

A program for representing a class and union could look like that:

const typeE = ['id', [1, 6, 0, 7]]; //class { id: number; }
//                   ^ [number, property, ref0, class]

const typeF = [[0, 1, 8]]; //string|number
//             ^ [string, number, union]

Then extend the processor to support stack frames and implement the new ops.

function processor(program: [...any[], TypeOp[]]) {
    const references: any[] = program.slice(0, program.length - 1);
    const ops = program[program.length - 1];

    //we don't have a single stack anymore, but a stack of frames
    const stacks: Type[][] = [[]];
    let stack: Type[] = stacks[stacks.length - 1]; //current stack frame

    for (let i = 0; i < ops.length; i++) {
        const op = ops[i];
        switch (op) {
            //...
            case TypeOp.property: {
                //this op has a parameter for its name
                const parameter1 = ops[++i];
                const name = references[parameter1];
                stack.push({kind: 'property', name, type: stack.pop()});
                break;
            }
            case TypeOp.class: {
                //pop the whole stack frame.
                //everything in the current frame is assumed to be a {kind: 'property'}
                const types = stack;
                stacks.pop();
                if (stacks.length === 0) stacks.push([]); //keep a root frame for the result
                stack = stacks[stacks.length - 1];
                stack.push({kind: 'class', types});
                break;
            }
            case TypeOp.union: {
                //pop the whole stack frame.
                const types = stack;
                stacks.pop();
                if (stacks.length === 0) stacks.push([]); //keep a root frame for the result
                stack = stacks[stacks.length - 1];
                stack.push({kind: 'union', types});
                break;
            }
            case TypeOp.frame: {
                //create a new stack frame.
                //necessary for e.g. unions in the middle of a program
                stack = [];
                stacks.push(stack);
                break;
            }
        }
    }

    return stack[stack.length - 1]; //return last stack entry
}

processor(typeE); //{kind: 'class', types: [{kind: 'property', name: 'id', type: {kind: 'number'}}]}

Type functions

It's all easy for primitive and simple types like those outlined above. But as soon as type aliases, generics, and type functions like conditional and mapped types are involved, things get much more complex.

When you think about those more complex types and type functions, it becomes obvious that TypeScript's type level has evolved into an actual language of its own, with variables, functions, and arguments, that is Turing complete. It happens that the semantics of this language can easily be mapped onto a stack machine with a few registers. All TypeScript types and type functions can be represented in bytecode that runs in a processor like the one shown above. Since TypeScript basically also supports variables and closures, it's necessary to use stack frames; luckily we already introduced them for class/union. Also, mapped types and a few other types contain sub-function calls, so it's necessary to introduce a (function) calling convention into the stack structure.
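
To illustrate that the type level behaves like a small functional language (this is ordinary TypeScript, nothing new):

//a type alias with a type parameter is essentially a function at the type level
type IsString<T> = T extends string ? true : false; //parameter T, conditional "branch"

type A = IsString<'abc'>; //true
type B = IsString<42>;    //false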

Inline other types

In order to implement inlining other types (functions) we have to introduce one new op:

enum TypeOp {
    inline = 10,
}

The inline op expects a single parameter pointing to another program.

const a = [[0]]; //type a = string;
//         ^ [string] 

const b = [a, [10, 0]]; //type b = a;
//            ^ [inline, ref0]

The implementation of inline could look like that.

case TypeOp.inline: {
     //this op has a parameter pointing to the program to inline
     const refToProgram = ops[++i];
     const programToInline = references[refToProgram];
     const result = processor(programToInline);
     stack.push(result);
     break;
}

Note that this example uses a recursive implementation. This is not ideal and will lead to maximum-call-stack-size-exceeded errors with very complex types. It also does not support circular types or circular imports, but it should make clear how it fundamentally works.

Once generic arguments are involved, it gets more complex pretty quickly. First, the type with the generic type parameter has this information encoded in its program using variables/function parameters, which are reserved stack entries of the current stack frame. The caller makes sure that these stack entries are correctly filled when calling the other type.

Variables

Type aliases with type parameters, mapped types, and infer introduce variables. Variables are slots on the stack with a known address. To implement variables, we need to introduce a new operator that loads an address and pushes its value onto the stack. Also, to not interfere with the way pop()’ing all types from a stack frame works, we have to make the stack aware of how many variables it contains. We set as convention that variables are always placed at the beginning of the stack frame and, when pop()’ing a stack frame, those variables are excluded.

enum TypeOp {
    variable,
    loads, //2 parameters: frameOffset and stack index
}

A generic type alias would look like this:

type MyType<T> = T;

const program1 = [[variable, loads, 0, 0]];

type MyType2<T> = T | string;

const program2 = [[variable, frame, loads, 1, 0, string, union]];
//`union` pops the whole frame, so we create one for it via `frame`;
//`loads` then reads the type variable from the parent frame (frameOffset 1).

The implementation of processor changes:

interface Frame {
    stack: Type[];
    variables: number;
}

function processor(program: [...any[], TypeOp[]]) {
    const references: any[] = program.slice(0, program.length - 1);
    const ops = program[program.length - 1];

    //we don't have a single stack anymore, but a stack of frames
    const stacks: Frame[] = [{stack: [], variables: 0}];
    let frame: Frame = stacks[stacks.length - 1]; //current stack frame

    for (let i = 0; i < ops.length; i++) {
        const op = ops[i];
        switch (op) {
            case TypeOp.string:
                frame.stack.push({kind: 'string'});
                break;
            case TypeOp.number:
                frame.stack.push({kind: 'number'});
                break;
            case TypeOp.array:
                frame.stack.push({kind: 'array', elementType: frame.stack.pop()});
                break;
            case TypeOp.literal: {
                //read the parameter behind the op and skip it in the next iteration
                const parameter1 = ops[++i];
                const literal = references[parameter1];
                frame.stack.push({kind: 'literal', literal: literal});
                break;
            }
            case TypeOp.property: {
                //this op has a parameter for its name
                const parameter1 = ops[++i];
                const name = references[parameter1];
                frame.stack.push({kind: 'property', name, type: frame.stack.pop()});
                break;
            }
            case TypeOp.class: {
                //pop the whole stack frame, excluding variables.
                //everything left in the current frame is assumed to be a {kind: 'property'}
                const types = frame.stack.slice(frame.variables);
                stacks.pop();
                if (stacks.length === 0) stacks.push({stack: [], variables: 0}); //keep a root frame
                frame = stacks[stacks.length - 1];
                frame.stack.push({kind: 'class', types});
                break;
            }
            case TypeOp.union: {
                //pop the whole stack frame, excluding variables.
                const types = frame.stack.slice(frame.variables);
                stacks.pop();
                if (stacks.length === 0) stacks.push({stack: [], variables: 0}); //keep a root frame
                frame = stacks[stacks.length - 1];
                frame.stack.push({kind: 'union', types});
                break;
            }
            case TypeOp.frame: {
                //create a new stack frame.
                //necessary for e.g. unions in the middle of a program
                frame = {stack: [], variables: 0};
                stacks.push(frame);
                break;
            }
            case TypeOp.variable: {
                //initialize with never (see arguments below)
                frame.stack.push({kind: 'never'});
                frame.variables++;
                break;
            }
            case TypeOp.loads: {
                //this op has two parameters:
                //the offset to the stack frame from which to load (0 = current frame)
                const frameOffset = ops[++i];
                //and the index of the stack entry in that frame
                const stackIndex = ops[++i];
                const frameToReadFrom = stacks[stacks.length - 1 - frameOffset];
                const value = frameToReadFrom.stack[stackIndex];
                frame.stack.push(value);
                break;
            }
        }
    }

    return frame.stack[frame.stack.length - 1]; //return last stack entry
}

To support passing arguments to a program with variables, the variable implementation could check an args array and read from it instead of setting the slot to {kind: 'never'}:

function processor(program: [...any[], TypeOp[]], args: Type[] = []) {

//...

case TypeOp.variable: {
    frame.stack.push(args[frame.variables] || {kind: 'never'});
    frame.variables++;
    break;
}

//then call it via
processor(program, [{kind: 'string'}]);

The type compiler knows where each variable identifier lives on the stack and can correctly calculate the parameters for the loads calls.

Calling convention

A few type functions require calling sub-functions of a program: mapped types, distributive conditional types, and conditional types. In JavaScript one could say they are blocks, but in this bytecode interpreter those blocks are compiled as sub-functions.

type A<T> = {[P in keyof T]: T[P]}
//                           ^ T[P] is a function, called multiple times,
//                           with 2 variables available in closure.

type B<T> = T extends any ? true : false;
//          ^ naked T means distributive conditional type
//            which turns it into a function called multiple times
//            for each T member (when T is an union)

type C<T> = string extends T ? true : false;
//                                    ^ false block is a function
//                             ^ true block is a function
//                             Those blocks are only executed depending on the 
//                             result of the `extends` operator.

To make this work, we need 3 new ops.

enum TypeOp {
    call,
    return,
    jump
}

Since the type functions above have sub-functions with closures (a function in a function), those sub-functions are embedded at the very beginning of the current program. jump with one parameter makes sure the embedded sub-functions are skipped initially, return marks the end of each embedded function, and call with one parameter (the position of the sub-function) triggers the calling convention (usually done internally and not emitted as an op code).

type Map<T> = {[P in keyof T]: T[P]};

Before we can compile this type function into bytecode, we need a few new ops:

enum TypeOp {
    keyof, //assumes/pops one entry from the stack
    indexAccess, //assumes/pops two entries from the stack

    //has one parameter: the ops address of the sub program
    //assumes/pops one entry from the stack
    mappedType,
}
const program = [[
    //jump directly to `variable`, where the actual program begins
    jump, 10,

    //the sub function `T[P]`. 
    //calling convention creates for each call a new stack frame,
    //hence frameOffset parameter for loads is 1.
    loads, 1, 0, loads, 1, 1, indexAccess, return, 

    //main program start
    variable, variable, loads, 0, 1, keyof, mappedType, 2
]];

The mappedType op pops the entry created by the keyof op from the stack and loops over it. For each entry of the keyof result, the loop calls the sub-function at position 2. The calling convention says that a “call” (to 2 in this case) creates a new stack frame and puts the return address in its first stack entry. Once return is executed, the return address is read and the processor jumps back to that address. The mappedType op uses an internal register to track its current position, in order to know whether another call to 2 is necessary or whether the program can continue. For each iteration it modifies the second variable so that the sub-function has the correct value when doing loads 1 1.

The implementation of all of these ops is a bit more complex and probably out of scope for this post. To implement the whole TypeScript type system, a lot more ops are required; currently the instruction set contains over 81 ops. If the TypeScript team decides this approach is worth considering, I’d love to document each and every op with example code like I did above.

Attach type information

Now, after defining the runtime representation of a type, the next problem to solve is how to attach that information to actual JavaScript symbols like classes and functions.

The idea is that for classes and functions I get the full type information when the symbol is passed to a function.

function a(param: string): void {}

const type = reflect(a);
//{kind: 'function', parameters: [{kind: 'parameter', name: 'param', type: {kind: 'string'}}], return: {kind: 'void'}}

To make this possible it's necessary to attach the bytecode program to each function and class. I decided to write it to a property called __type, so the emitted JavaScript could look like this:

function a() {}
a.__type = [[string, void, function]]; //correctly encoded though, e.g. [0, ...]

class User {}
User.__type = [[...]];

Interfaces and type aliases could be emitted with their name prefixed with something unique.

type MyType<T> = T | string;
const __ΩMyType = [[variable, frame, loads, 1, 0, string, union]];

This approach has the advantage that no global WeakMap is required and the program is garbage-collected automatically once the function or class is no longer referenced. However, it is limited to functions and classes; it's not possible with this approach to attach type information to object literal expressions or variables. Exporting type aliases and interfaces as const variables makes them tree-shakeable.

Why

Why is a bytecode interpreter proposed and not something more trivial like serializing a type as JavaScript objects? One could easily emit JavaScript objects that describe a type directly instead of introducing bytecode ops and an interpreter. For example:

const typeA = {kind: 'string'};

The reason I chose a bytecode representation is that the emitted JavaScript code is much smaller and allows dynamic type computation. The latter is not possible when the final computed type is serialized. Resolving generic type arguments also requires some sort of runtime type computation and type checks in order to narrow a generic union down to the actually passed argument. Also, serializing a complex class with many properties and all their attributes (visibility, readonly, initializer, optional) makes the emitted serialized objects huge and the bundle size too large. They also impose a greater runtime overhead in terms of execution and memory footprint. My goal was instead to design a format that has the lowest possible overhead and generates load only when the type is actually requested.
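
To make the size argument concrete, here is a rough, illustrative comparison (not measured output of the actual implementation):

//fully serialized type object for `string[]`, emitted as-is into the bundle:
const serialized = {kind: 'array', elementType: {kind: 'string'}};

//bytecode program for the same type: ops [string, array] = [0, 4],
//which the ASCII encoding described in the next section shrinks to two characters:
const bytecode = ['!%']; //'!' = op 0 (string), '%' = op 4 (array)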

Encoding

The program bytes are encoded in printable ASCII characters, from the code range 33 upwards.

That means the program array structure from above

const typeA = ['a', [5, 0]]; //'a'
//                  ^ [literal, ref0]

Is actually encoded in JavaScript like that:

const typeA = ['a', '&!'];
//                   |^ String.fromCharCode(0 + 33) = '!'
//                   ^ String.fromCharCode(5 + 33) = '&'

This makes a complex type much smaller compared to the plain array-of-numbers representation.
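
A minimal sketch of this encoding/decoding step, following the convention above (op byte + 33 mapped to a printable character); the real implementation may differ in details:

function encodeOps(ops: number[]): string {
    return ops.map(op => String.fromCharCode(op + 33)).join('');
}

function decodeOps(encoded: string): number[] {
    return [...encoded].map(char => char.charCodeAt(0) - 33);
}

encodeOps([5, 0]); //'&!'
decodeOps('&!');   //[5, 0]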

Type checking / Performance Improvements

The bytecode interpreter as currently designed and implemented only allows static types to be compiled; no inferred types or control flow analysis are implemented. Since it supports the extends type operator, it's able to check whether two types are compatible. This is currently limited to extends only and does not implement more detailed type checks like all the type variances.

However, with such an architecture it would probably be possible to compile a whole TypeScript file into bytecode and execute it in a very fast VM. That VM could be written in faster languages like C++/Rust and operate entirely on those bytecodes. By implementing error collection it would further be possible to collect validation errors in this VM, which basically turns it into a very fast type checker.

This approach of compiling a TypeScript file into bytecode, putting it as a binary file on disk, and running it on demand in an ultra-fast VM probably enables much faster type checking and simplifies caching.

I say “probably” because I've not tested it yet. If the TypeScript team says it's worth investigating further, I'd love to build a PoC for that. I'd implement the VM in C++ and convert the type compiler from a transformer into a standalone TS program based on the TS compiler AST.

Beyond the Stars

A bytecode interpreter, or rather a bytecode compiler and an official spec of an instruction set for a virtual machine, could allow new ways of using TypeScript in non-JavaScript environments. For example, it could be possible to define types in TypeScript and use them in cross-language environments for e.g. serialization and validation, similar to what JSON Schema is currently used for. Concretely, this allows defining types in TypeScript (a subset of TypeScript, essentially only the type system, not the JavaScript semantics) and having those types as a single source of truth sitting in a .ts file. Other languages like Go, C++, or Rust can read the bytecode and compute the types in their runtime, allowing them to work with those types to e.g. create serialization or validation functions.

It would enable writing ORM entities, JSON Schema, Protocol Buffers, and many other DSLs in TypeScript without having the full-fledged TypeScript library or JavaScript as a dependency.

One could go as far as to argue that this instruction set and bytecode interpreter could be standardised and implemented directly in JavaScript engines like V8. Further ECMAScript features could then be built on top of it (like overloaded functions, function parameter validation, and more). I think TypeScript is here to stay, and making JavaScript itself more powerful based on successful patterns TypeScript introduced is something the whole industry would appreciate.

Implementation Proof of Concept

I have already implemented most of the outlined use cases in a library called Deepkit. It contains @deepkit/type for serialization/validation/type guards, @deepkit/orm for an ORM that supports classes and interfaces as entities, @deepkit/rpc for a fully type-safe RPC implementation with automatic serialization/validation, and @deepkit/injector, a dependency injection library that supports auto-wiring, interfaces as dependencies, and more.

The currently released version of these libraries uses a custom type system based on classes with decorators; however, I've rewritten almost all libraries on top of the new runtime type system in the branch feature/autotype, which works as described above. The core functionality is in this folder: https://github.com/deepkit/deepkit-framework/tree/feature/autotype/packages/type/src/reflection

It contains the type compiler (that generates the bytecode from the AST in a TypeScript transformer), the processor (the stack machine that processes the bytecode and generates types), as well as some handy reflection classes and type utility functions.

Some important characteristics of this implementation:

  1. The bytecode is based on the AST, not on inferred types. That means the generated bytecode essentially mirrors the actual type functions. For example, a mapped type is compiled as an actual mapped type: as a function with inputs and a loop. It is not the resolved type result that is encoded in bytecode, but the type function itself. This makes it necessary to annotate types explicitly. It could be changed to support inferred types as well in the future. However, it's not possible to reverse the bytecode back to the exact same TypeScript source.
  2. The processor is a stack-based virtual machine that supports circular types and uses an iterative interpreter instead of a recursive one, which makes it possible to have arbitrarily complex types without running into a “Maximum call stack size exceeded” error.
  3. Circular types are supported perfectly well. The resulting type object is then circular as well.
  4. The implementation does not yet support everything: for example, renaming properties in a mapped type via as is not supported yet. Also, there are a few differences and not all types are exactly what TypeScript would emit, see for example: https://github.com/microsoft/TypeScript/issues/47048.

API

To get the runtime type information of a type, a function call is necessary. For arbitrary types, typeOf<T>() can be used. For classes and functions there are two ways to get type information: raw type objects and handy Reflection classes:

const type = typeOf<string>();
const type = typeOf<number>();
const type = typeOf<string|boolean>();

type a = {a: string, b: number};
const type = typeOf<a>();

class User {
   id: number = 0;
}

const type = reflect(User);
//{kind: 'class', classType: User, types: [...]}

const reflection = ReflectionClass.from(User);
reflection.getProperties();
reflection.getMethods();

The various Reflection* classes are inspired by other reflection APIs like the one of PHP. The code of those classes can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/reflection.ts

Compiler

The compiler can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/compiler.ts

It’s a rather complex transformer that needs access to various internal types from the TypeScript compiler API in order to get all necessary information.

All bytecode ops can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/type.ts#L1976-L2141

Processor

The processor can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/processor.ts#L248

Caching

Types in this implementation are cached. The cache is attached directly to the bytecode array and is only set when the program is not generic.
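
A minimal sketch of that caching idea, with assumed property and function names (not the actual Deepkit code):

function resolveType(program: [...any[], TypeOp[]] & {__cachedType?: Type}, args: Type[] = []): Type {
    //generic programs depend on their arguments, so only cache argument-free runs
    if (args.length === 0 && program.__cachedType) return program.__cachedType;
    const type = processor(program, args);
    if (args.length === 0) program.__cachedType = type;
    return type;
}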

Type types

Type objects are simple objects that carry all type information in a raw format. There is a big enum containing all ReflectionKinds, and for each type an interface that has a kind, additional type-specific properties, and a parent plus other runtime information like type annotations.

export const enum ReflectionKind {
    never,
    any,
    unknown,
    void,
    object,
    string,
    number,
    boolean,
    symbol,
    bigint,
    ...
}

export interface TypeNever {
    kind: ReflectionKind.never,
    parent?: Type;
}

See https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/type.ts#L21-L452

Reflection

Reflection is the heart of the implementation. Its API and tests can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/integration.spec.ts

Type decorators

In order to implement certain functionality like ORM and validation, it's necessary to annotate types with additional meta-data: for validation, constraints like minimum, maximum, minLength, etc.; for the ORM, meta-data like primaryKey, autoIncrement, reference, index, etc. TypeScript already supports branded types, which cannot directly be used here. Instead a pattern almost identical to branded types is used, but with the brand made optional.

type PrimaryKey = {__meta?: ['primaryKey']};

class User {
    id: number & PrimaryKey;
}

In the reflection system, intersections with objects that contain a __meta property are handled in a special way: the annotation expressions are stored on the type directly.

type a = number & PrimaryKey;

const type = typeOf<a>();
{
  kind: 'number',
  decorators: [
    {
      kind: 30,
      typeName: 'PrimaryKey',
      types: [{
        kind: 32,
        type: { kind: 25, types: [Array], parent: [Circular *1] },
        name: '__meta',
        optional: true,
      }],
    }
  ]
}

With this information on the type directly, it's possible to change the behaviour of certain types in an extensible way.
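
For illustration, a library could inspect that meta information roughly like this (the decorators/typeName shape follows the example output above; the helper name is made up):

//shaped like the example output above
const exampleType = {
    kind: 'number',
    decorators: [{kind: 30, typeName: 'PrimaryKey', types: []}],
};

function isPrimaryKey(type: {decorators?: {typeName?: string}[]}): boolean {
    return !!type.decorators?.some(d => d.typeName === 'PrimaryKey');
}

isPrimaryKey(exampleType); //true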

Serialization

Serialization examples can be seen in the tests: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/serializer.spec.ts

The Serializer API itself is highly customizable and the default serializer can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/serializer.ts#L1643

expect(serialize<Date>(new Date('2021-10-19T00:22:58.257Z'))).toEqual('2021-10-19T00:22:58.257Z');

Casts

A cast is nothing more than running the deserializer of the default serializer. It supports converting types that are convertible and throws if a value is not convertible.

https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/serializer.spec.ts#L210

expect(cast<string>('123')).toBe('123');
expect(cast<string>(123)).toBe('123');

expect(cast<number>(123)).toBe(123);
expect(cast<number>('123')).toBe(123);

expect(cast<Date>('2021-10-19T00:22:58.257Z')).toEqual(new Date('2021-10-19T00:22:58.257Z'));

Type guards

Type guards are an inherent part of serializers since union types need a type check in order to know which serializer function should be used. Its API and tests can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/typeguard.spec.ts

expect(is<number>('a')).toEqual(false);
expect(is<number>(123)).toEqual(true);
expect(is<number>(true)).toEqual(false);
expect(is<number>({})).toEqual(false);

Validation

Type guards are one part of validation; content validation is the other. The API and tests can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/validation.spec.ts

type Username = string & MinLength<3>;

expect(is<Username>('abc')).toBe(true);
expect(is<Username>('ab')).toBe(false);

Summary

I think this post is already too big, so I'll stop here, although a lot more could be said about the bytecode interpreter, runtime types, and their possibilities.

If the TypeScript team considers this worth investigating further, I’d love to help integrate this into TypeScript itself and make it a reality for all those users downloading solutions for this millions of times each month.

I understand if this feature is not in TypeScript's interest (as outlined in its goals), however I think that, considering the shown demand in the industry and the opportunities described above, it might be worth either rethinking the goals or at least supporting this undertaking with external partners/projects. Even if it will not officially be part of TypeScript, it would help enormously if the TypeScript team accepted this use case as valid, provided certain features (like making certain compiler APIs public or allowing transformers in tsconfig.json), and supported it in a way that it can be implemented in the best way possible.

MartinJohns commented 2 years ago

That is an impressive write-up, but I have to question the purpose of opening yet another issue about a feature that has been explicitly rejected again and again by the TypeScript team.

marcj commented 2 years ago

but I have to question the purpose of opening yet another issue about a feature that has been explicitly rejected again and again by the TypeScript team.

As you can see, the main discussion issue https://github.com/microsoft/TypeScript/issues/3628 has never been rejected and is still open, just like many other feature requests that, although neither confirmed nor rejected, are still open. Also, this issue is not a vague feature request, but a concrete implementation and analysis of the market demand, plus a proposal for a bytecode interpreter which can be used for runtime types but also for other scenarios like performance improvements. So, quite different from anything seen yet in terms of elaborateness, analysis, design proposal, and implementation demonstration. To even remotely compare this to any other issue regarding runtime types is mystifying to me.

jcalz commented 2 years ago

This is awesome. However, it is worth noting that there is an explicit language design non-goal to:

5. Add or rely on run-time type information in programs, or emit different code based on the results of the type system. Instead, encourage programming patterns that do not require run-time metadata.

I'm not on the TS team and am not the one making decisions about whether to leave issues open or close them, but if I had to wager on it I'd bet that this will be declined quickly, no matter how amazing it is. We'll see, I guess.

marcj commented 2 years ago

@jcalz yes, indeed. I've written about that in the original post, and at the end. Getting a stance on that defines if and how this will be made available to users. It doesn't necessarily have to be in TypeScript's core; it's good enough if the TypeScript team decides to support the undertaking in other ways.

I think it has already been shown that this non-goal is not realistic. Maybe at the beginning, 10 years ago, it was plausible for JavaScript, but not today. In the end, the users and the market decide how a language will be used, and they have clearly shown (see the Demand section in the OP) that type information is used at runtime and that a lot of people and projects use and rely on it in all sorts of programs.

jcalz commented 2 years ago

Sorry, I don't see where you acknowledged the explicit non-goal that essentially prohibits this endeavor in TS proper. Looks like you mentioned goals 3, 4, and (checks notes) 9, but not non-goal #5. I might have missed it though.

marcj commented 2 years ago

That's fine. I'm more interested in what the TypeScript team has to say about that, and especially about the last paragraph of the original post. It's not a must-have to have this in TypeScript itself. However, there's not much value in sticking to years-old goals that might meanwhile be obsolete due to market movements. So let's concentrate on making the use cases of packages with 76 million downloads/month possible in better ways, instead of finding ways against them and reasons why they shouldn't even be considered in the first place due to abstract nice-to-have goals.

RyanCavanaugh commented 2 years ago

First off, this was a super fun and interesting read, so thank you for posting it. It's great to see people exploring what's possible, and this looks like a very promising project.

To cut straight to the point: Is this in scope for us, now? Still no. The absolute plethora of options in this space, both for native-to-TS options like io-ts and 3rd-party-to-TS like JSON schema validators, shows that there's a wide range of what people want in the space, and that many projects with different approaches are capable of delivering tools with value.

Regarding design goals - we're extremely committed to the erasability of the type system. Keeping the type system erasable is a very strong invariant that allows us to do other important things. One of the reasons we're able to make the type system more powerful and expressive from version to version is that we have the flexibility to e.g. change type inference at a particular position from { x: string } & { y: number } to { x: string; y: number } without worrying about breaking the runtime behavior of anyone's programs.

This isn't a theoretical concern: people already @ us about when we change declaration emit in ways that produce representationally-distinguishable but semantically-identical types. Taking this to the next level wherein one person's bug about a conditional type not immediately resolving becomes a breaking change in someone else's program, at runtime, in ways that are going to be much more subtle than type error vs not-a-type-error -- it's not a Pandora's Box we're keen to open.
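
To make that concern concrete, here is a small illustrative sketch; the reflect<T>() mentioned in the comments is purely hypothetical and not an existing API. Both shapes accept exactly the same values, so the compiler can switch its inference between them today without breaking anyone, while a runtime reflection of each would be observably different.

type Inferred1 = { x: string } & { y: number };
type Inferred2 = { x: string; y: number };

// Semantically identical: values flow freely in both directions.
const v1: Inferred1 = { x: 'a', y: 1 };
const v2: Inferred2 = v1;
const v3: Inferred1 = v2;

// A hypothetical reflect<T>() would expose the representational difference:
// reflect<Inferred1>() might report an intersection of two object types,
// while reflect<Inferred2>() reports a single object type with two members,
// so changing the inferred shape would become a runtime breaking change.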

jcalz commented 2 years ago

@RyanCavanaugh note the last paragraph of the original post:

Even if it will not be part of TypeScript officially, it would help enormously if the TypeScript team accepts this use case as valid, provides certain features (like making certain compiler APIs public or allowing transformers in tsconfig.json), and supports it in a way that it can be implemented in the best way possible.

Would you recommend that @marcj flesh out these features into concrete suggestions and raise them in new issues (assuming such issues do not already exist)? Or are those also out of scope?

Rush commented 2 years ago

like making certain compiler APIs public or allowing transformers in tsconfig.json

@RyanCavanaugh I'd also like to highlight this ask. While "runtime types" indeed seems like a risky and "out of scope" feature to add to the default TypeScript build, why not let experiments happen outside the TypeScript tree? The community needs just a little bit of help. The upside is that a lot of useful innovation will happen that is otherwise not possible today, and sometimes even discouraged.

RyanCavanaugh commented 2 years ago

We do take API requests through issues and have added a fair number of functions to the public API surface upon being asked. Emit plugin support is also available, and we'd take issues there if something is needed.

We don't support transform plugins in tsconfig.json because we think this operation should not be a risky one:

git clone somerepo
npm install --ignore-scripts
tsc

Rush commented 2 years ago

Why not support tsc --someRiskyOption so that people can customize their npm run build / yarn build? Also, many of us use ts-loader and other build systems to invoke TypeScript. Transform plugins could be exposed via the programmatic API only, leaving tsc "safe by default".
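
For context, a rough sketch of what the programmatic route already looks like today: the compiler API accepts custom transformers on emit (the no-op transformer and the 'index.ts' entry point below are placeholders), so the question is mainly whether tsc/tsconfig.json should be able to reference such plugins directly.

import * as ts from 'typescript';

// A no-op transformer, used purely as a placeholder.
const noopTransformer: ts.TransformerFactory<ts.SourceFile> = context =>
    sourceFile => ts.visitEachChild(sourceFile, node => node, context);

const program = ts.createProgram(['index.ts'], { strict: true });

// Custom transformers can already be passed programmatically to emit(),
// just not declared in tsconfig.json.
program.emit(undefined, undefined, undefined, false, {
    before: [noopTransformer],
});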

Hookyns commented 2 years ago

I have to mention my project (not listed in the table above): tst-reflect.

I'm missing reflection in TypeScript(/JavaScript) because of Dependency Injection. TypeScript will never implement reflection, sorry to all of us, I'm just being realistic. So I've created my own, quite huge, reflection system, and then I've built Dependency Injection on top of that, see.

tst-reflect is based on a custom transformer and generates metadata about types, accessible via the getType<SomeType>(): Type function. An instance of the Type class is returned, which contains quite a lot of methods for working with types.

Generic types are supported! It works with quite complex types: unions, intersections, enums, details about interfaces and classes (constructors, properties, methods, parameters, generic types, decorators, ...) and much more.

Usage is inspired by C# reflection.

Here is a REPL with an example: a runtime Type, even a runtime generic.

A simplified version of that REPL:


import { getType, AccessModifier, Accessor } from "tst-reflect";

function printClassInfo<TType>()
{
    const type = getType<TType>(); // <<== Here the generic type is used!

    if (!type.isClass())
    {
        return;
    }

    console.log("class " + type.name);
    console.log("full type identifier: " + type.fullName);

    const properties = type.getProperties();
    const methods = type.getMethods();

    console.log("Properties");
    console.log(
        properties.map(prop =>
            `${AccessModifier[prop.accessModifier]} ${Accessor[prop.accessor]} ${prop.name}: ${prop.type.name}`
        ).join("\n")
    );

    console.log("Methods");
    console.log(
        methods.map(method => AccessModifier[method.accessModifier] + " " + method.name
            + "("
            + method.getParameters().map(param => param.name + ":" + param.type.name).join(", ")
            + "): " + method.returnType.name
            + (method.optional ? " [optional]" : "")
        ).join("\n")
    );
}

class Bar { foo: string = ""; bar: any; toSomething(): void {} }

printClassInfo<Bar>();

Kinrany commented 2 years ago

Deserialization is the one thing that absolutely requires some runtime representation of compile-time types.

There are many different approaches to deserialization, but they all have the same general shape: type Parser<T, U, E> = (x: T) => ['ok', U] | ['error', E].

Perhaps having a simple standard type like this would help.

An advanced proposal could also include an API for composing these types: e.g. a standard operation for constructing a Parser<T, [U1, U2], E> out of Parser<T, U1, E> and Parser<T, U2, E>.
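
As a purely illustrative sketch of what such a composition operation could look like (parseString, parseNumber, and both are names made up for this example, not a proposed API):

type Parser<T, U, E> = (x: T) => ['ok', U] | ['error', E];

const parseString: Parser<unknown, string, string> = x =>
    typeof x === 'string' ? ['ok', x] : ['error', 'not a string'];

const parseNumber: Parser<unknown, number, string> = x =>
    typeof x === 'number' ? ['ok', x] : ['error', 'not a number'];

// Combine two parsers over the same input into one that yields a tuple.
function both<T, U1, U2, E>(a: Parser<T, U1, E>, b: Parser<T, U2, E>): Parser<T, [U1, U2], E> {
    return x => {
        const resultA = a(x);
        if (resultA[0] === 'error') return resultA;
        const resultB = b(x);
        if (resultB[0] === 'error') return resultB;
        return ['ok', [resultA[1], resultB[1]]];
    };
}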

marcj commented 2 years ago

Ok, cool, it was worth a try. Since runtime type information is out of scope and the other points weren't addressed, I assume there is no interest in any of this, so I'm going to close this one. Thanks for the fast feedback.

marcj commented 2 years ago

FYI, this TypeScript bytecode interpreter has now been released together with a full-featured framework that tries to utilise its potential to the fullest: https://deepkit.io/blog/introducing-deepkit-framework

alonesuperman commented 2 years ago

Mark

marcj commented 1 year ago

I say here “probably” because I’ve not tested it yet.

For anyone reading this: this was tested last year, and a proof-of-concept high-performance type checker has been built using the bytecode approach outlined in this issue, which confirms that the theory actually works. It can be found here: https://github.com/marcj/TypeRunner. It shows how type checking can be sped up by many hundreds to thousands of times.

I cannot build it alone as it's too time-consuming. But if we could make it happen, it would shape not only the future of TypeScript itself but the whole industry: bringing many architectures to the next level, allowing interactive debugging of types, increasing the interoperability of many languages (replacing JSON Schema with TS), and rendering many transpilers (esbuild, SWC) obsolete.

There are meanwhile dozens of JS packages that make type information available at runtime in all sorts of bizarre ways. Together they have a staggering 180 million installs per month, much more than TypeScript itself. I think it couldn't be clearer that people really want this.

sinclairzx81 commented 1 year ago

@marcj Hey, still subbed to this thread. TypeRunner looks amazing. Do you plan on publishing a specification for the bytecode?

marcj commented 7 months ago

@sinclairzx81 It's not planned, although I have a subset that is more or less standardised, since it has been used in backward-compatible ways in Deepkit's runtime types for two years now. What are you looking to solve or get from a specification of a bytecode that is capable of doing full type checking like TypeRunner does? Please note that TypeRunner also has super-instructions for optimisation purposes, which are likely not needed in an official spec. It was also designed for a stack machine; bytecode for a register machine would look very different, and it's not yet clear to me what type of machine would be best suited for something like TypeScript, or what level of abstraction the bytecode should have. Maybe a more low-level bytecode that can then be JIT-optimised depending on its use is better.
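
To give a flavour of the stack-machine idea mentioned above (a toy illustration only, not Deepkit's or TypeRunner's actual instruction set), here is a tiny interpreter whose program builds the type string | number:

enum Op { String, Number, Union }

type RuntimeType =
    | { kind: 'string' }
    | { kind: 'number' }
    | { kind: 'union'; types: RuntimeType[] };

// A minimal stack machine: each instruction pushes or combines type objects.
function run(program: Op[]): RuntimeType {
    const stack: RuntimeType[] = [];
    for (const op of program) {
        switch (op) {
            case Op.String: stack.push({ kind: 'string' }); break;
            case Op.Number: stack.push({ kind: 'number' }); break;
            case Op.Union: {
                const right = stack.pop()!;
                const left = stack.pop()!;
                stack.push({ kind: 'union', types: [left, right] });
                break;
            }
        }
    }
    return stack[0];
}

console.log(JSON.stringify(run([Op.String, Op.Number, Op.Union])));
// {"kind":"union","types":[{"kind":"string"},{"kind":"number"}]}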

M-jerez commented 6 months ago

Hi all, I'm just banging my head against this issue.

It is really a shame that we don't have reflection when using TypeScript. So much effort is put into strongly and correctly typing everything, and then all that info is lost at runtime 🥲.

Anyway... from what I can see, the biggest issue from the TS team's point of view is this one:

Regarding design goals - we're extremely committed to the erasability of the type system. Keeping the type system erasable is a very strong invariant that allows us to do other important things.

Now, here's my idea for a simple API that could tackle this issue. It's straightforward and, I think, not too tricky to implement.

import { reflect } from 'tsc';

interface User { 
  name: string; 
  age: number; 
  address?: string;
}

const user: User = { name: 'John', age: 33 };

// 'reflect' is a special function provided by 'tsc' for runtime reflection; the JS output would be an object literal instead of a function call.
const userMetadata = reflect<typeof user>();

console.log(userMetadata.kind); // Outputs: 'interface' (or a similar kind identifier)

This way, we're not messing with TypeScript's design goal of keeping type information erasable, as the user const is still part of the program.

For simplicity, the output could be a serializable version of the AST with all nodes that are not related to types removed (or something similar), and all type dependencies embedded (not ideal from a memory point of view, but at least we would have reflection). This way there would be no need to emit anything beyond the assignment of the user const with an object literal. From there, we developers can process the emitted AST and turn it into something more useful.

You could argue that reflect is not really a function, but that could be solved by giving it a type that does not allow anything other than calling it.

TypeScript compiled output:

const user = { name: 'John', age: 33 };

const userMetadata = {
   kind: X,
   flags: [Z, Y],
   parent: null, // or another node if there are type dependencies
   children: {
     name: { kind: X, flags: XWZ, parent: userMetadata /* back-reference */ },
     age: /* ... */,
     address: /* ... */,
   },
};

console.log(userMetadata.kind); // ....
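
For what it's worth, one possible (purely illustrative) shape such metadata nodes could conform to, with the placeholder kinds and flags above replaced by strings:

type ReflectedNode = {
    kind: string;                             // e.g. 'interface', 'property'
    flags: string[];                          // e.g. ['optional']
    parent: ReflectedNode | null;             // back-reference for type dependencies
    children?: Record<string, ReflectedNode>; // e.g. the members of an interface
};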

Anyway, just my two cents for an elegant solution to the problem. Unfortunately, I'm not a tsc compiler guru, so I can't whip up a pull request right now.