Typescript codegen - Githubissues

jaredponn commented 10 months ago

I'm so sorry for the large PR. If it's easier, we can book a meeting to chat about things. :P

What was done:

Nix ops (similar to Haskell) were implemented
Haskell code generator for typescript code
Extremely trivial sample projects to demonstrate "things work". Prelude sample project, Plutus sample project
Changed TS projects (runtime systems) to use flake-lang.nix.
Moved TS prelude and TS plutus ledger api to separate repos

Limitations:

Only Haskell2010 style type classes work (probably). So

instance C (f a) where ..
instance C (T a a a) where ..
instance C (T A B a) where ..
instance C (T (a b) a) where ..
instance C a where ..

will all most likely cause strange bugs / errors in the generated Typescript code.

Typeclasses (probably) need to have distinct names (even if they are in different modules).

Code icks that work for now, but maybe another day I'll resolve them:

Generated Haskell code disables all typescript checks because I was having trouble convincing TS that the generated code is exhaustive. I think if I had the complete type information of every node in ValueE, I'd then be able to completely annotate all terms in the generated TS code, and it'd probably convince TS that the generated code is okay.

High level idea for type classes

Given a typeclass C, and a type Ty, we assume that the following exists in the *-runtime library:

// the dictionary for the class C's methods
export type C<A> = { myMethod: (arg : A) => A, .. }
 // A "global" dictionary which contains all of C's instances
export const C : CInstances = { }
// The type of the global dictionary (we start with nothing in it)
interface CInstances {
}

Then, when generating code for Ty, we must generate code like

export type Ty<A> = ... // some reasonable type defn.
export const Ty : unique symbol = Symbol('Ty') // This creates a globally unique symbol bound to the variable `Ty`

See Symbol for details on the globally unique name.

And finally, when generating the instance C (Ty a), we generate code like

// Code to "extend" the type `CInstances` to include Ty as an index mapping to value`C<Ty<A>>` 
// via declaration merging.

// the actual implementation
C[Ty] = function <A>(dictA : C<A>) {
  return {
    myMethod: (a) => { ... } 
   // implement the instance for C
   }
}

See Module Augmentation in Declaration Merging for details on declaration merging.

Finally, for end users, to use the typeclass instance, they would type

C[Ty](Int).myMethod( ... )

i.e., C[Ty](Int) calls the instance for the class C with Ty instantiated with the type Int (which we assume Int is a unique symbol similar how Ty has a unique symbol)

Uncurried functions

I think in TS, it's a bit weird to have functions return functions e.g.

f(a)(b)

is a bit strange, and it's more natural (in my opinion!) to write

f(a,b)

The generated code prefers the latter i.e., the code generator uncurries functions.

jaredponn commented 8 months ago

Alright, some comments for why I didn't just refer to the upstream versions of the runtime types.

tldr: not referring to the upstream types maintains an invariant in the code generator.

I mention this a bit in High level idea for type classes at the top of this PR

To start, we maintain the invariant

Let M be a module. Then, type T is defined in M iff a value T (a unique symbol) is defined in M

Why do we want this invariant?

It lets us consistently choose which instance of a type class we want. For example, suppose we have a typeclass ModuleWithC.C and we further assume there will be a "global dictionary" (defined in the same module) named ModuleWithC.C (note the same name C). When we want the instance ModuleWithC.C ModuleWithT.T, we then may type (provided the instance is already defined)

ModuleWithC.C[ModuleWithT.T]

This reasoning also applies to the code generator -- this invariant provides a nice and consistent way to find the instances corresponding to a type class and a type.

So, if the LambdaBuffers code generator assumes this invariant of all types / instances it generates, all that remains is to ensure that this invariant is also satisified by the runtime system. Which is exactly why there are so many modules re-exporting the types upstream -- to satisfy the invariant expected by the code generator.

If I did refer to the upstream type, then the invariant wouldn't be satisified, and when the codegenerator tries to refer to the value corresponding to a type T, then the value won't exist and the code generator will generate invalid code (recall that the prelude-typescript and plutus-ledger-api-typescript libraries don't have the value corresponding to types as they have a much more simplistic approach to typeclasses that disregards LambdaBuffers' module system)

Hopefully this makes a bit of sense for why it is the way it is? I'm happy to hear any suggestions though!

jaredponn commented 8 months ago

Alright, thanks for all the help @bladyjoker . Merging this, and will write tests soon!

mlabs-haskell / lambda-buffers

Typescript codegen #161

High level idea for type classes

Uncurried functions