racket / rhombus

Rhombus programming language

Other

352 stars 62 forks source link

My Wish-list #136

Closed sirinath closed 3 months ago

sirinath commented 4 years ago

Wishlist for Meta Programming and Language-Oriented Programming

Language features
- The core of the language should comprise of a transformation system and proof system which can provably covert, reduce or transform a given representation of binery or textual strings or graphs into other representations in the most efficient and effective way with minimal phases and rescans. This core system will be used for lexing, parsing, interpreting, JIT and AOT compiling. This will serve as the minimal core.
- Each phase will use the same DSL which would be a set of transformation rules and proofs to make the system simple and easy to learn.
- The DSL will be minimal but should be extensible by creating more complicated reusable libraries.
- Lexing and parsing would apply a set of rules to a input string of binery or text data which will be converted into a graph which represent the AST. Optimisation would be applying another set of rules to rewrite the AST into a optimised representation. This will further rewritten to generate machine code. The same set of rules can be used to reverse engineer a program. In reality the rule set should be reducible in which the lexer, parser and interpreter, compiler and/or VM runtime will be generated which can convert a given input to the final form as well as the means to reverse engineer. If the information lost in reduction like variable names an comments are written into metadata, perfect recreation of the source should be possible from the output.
- If an interpreter or VM runtime is generated it will apply the relevant subset of rules, reductions or transformations for the language or intermediate representation with JITing. The optimisation and JITing will be by applying the relevant rules and reverse optimisation would be reversing applied rules using stored metadata on information lost on optimisation or generating machine code. In generating the interpreter or VM runtime rules will be fused and also reduced to result in the most efficient interpreter or VM runtime.
- The transformation rules and proofs can be defined at the language level and partly shared between related languages. Also, certain libraries in the language can also define their transformation rules and proofs for library specific optimisation which also can be reused and repurposed where applicable. There can be multiple intermediate formats defined as part of the standard library so that reduction rules can be shared. The standard intermediate formats would include one format to represent AST and another to represent a byte code format, though the user has to full option to define their or even extend these formats using the language definition DSL.
- Since there are many string encodings and standards like Unicode is evolving the language should be able to support any character encoding including future version. Hence there should not be any native character encoding format but be open to any existing or feature character representation schemes.
- The language should have a modular extensible type system where the assumptions type system regime in one language is guaranteed when crossing language boundaries. This will facilitate cross language development without or with minimal runtime overhead.
- Have phase fusion for optimality. When passing and generating AST optimisation and code generation can begin in such a way that the absolute minimal number of compiler phases or rescanning the generated data structures are needed to generate output also while retaining the minimum about of data in memory. Tree and graph data structures may sometimes be inefficient so these may be encoded as other data structures whilst preserving abstraction for performance.
- Will ensure the internal data structures are the most optimal and even transform/rewrite and eliminate intermediate data structures so that the resulting the data structures are the most optimal representations with the same effects for the given set of equivalate, transformation and optimisation rules for data structures. Similarly the data structure traversals will be fully optimised.
- API and Constructs using syntax of a given language can have mapped directly to intermediate code or machine code or implementation in another language without FFIs.
- The core language infrastructure should have applications including but no limited to:
- Programming languages
- Hardware definition languages
- Graphical rendering and multi media definition languages
- Print formatting and type setting languages
- Design and modelling languages
- Mark-up languages
- Codecs
- Protocols
- File format definitions
- etc.
- Ideally the proof checker, symbolic execution engine and reduction engine should be the same.
- Error tracking and reasoning:
- Numerical:
  - Rounding error.
  - Overflow/underflow errors.
- Range errors:
  - Array out of bounds.
- Eager and lazy evaluation languages can be defined.
Lexer/decoder
- The language defining and processing infrastructure should be universal, hence the code file may not necessarily be encoded in a predefined text format though ASCII and Unicode are considered universal standards.
- The lexer/decoder should be definable thought a DSL which would make defining the language very easy.
- The lexer/decoder should be able to able to handle both binery and text data so that it can parse any encoding.
- The lexer/decoder should be able to act as a decoder for binery formats and protocols to analyse distributed code and data transfers.
- The lexer/decoder should be able to handle different machine code, bytecode, bit-code, intermediate-code formats to reverse engineer and also deoptimize and reoptimize.
Parser
- Generate both incremental and conventional parsers from DSL.
- Support both text and binery strings as input.
- This will be fused with other phases.
Optimisation
- Optimisations define through transformations at the language level.
- Optimisations can be saved and imported from libraries.
- Optimisations can defined restricted to certain files, libraries or certain contexts.
- This will be fused with other phases such that the activity happens in parallel and there is minimal traversal of data structure, i.e., each data items is touched the minimum amount of time in the most optimal order.
- Support of super optimisations.
- Whole program or local optimisation depending on context. If locally optimised, whole program aware staged JIT compilation in programs with full code coverage and same path executed multiple times.
- Mnemonization of frequently evaluated code.
- In absence of non determination due to IO, compile time fully evaluated code.
- Save and reuse previous JIT compilations information.
Code Generation
- This will be fused with other phases such that the activity happens in parallel and there is minimal traversal of data structure, i.e., each data items is touched the minimum amount of time in the most optimal order.
Macro/Metaprogramming System
- The language will be minimal which can support
- Definition of parsing rules on string or both binery and text
- Definition of syntax
- Definition of type system
- Definition of proofs, checks and validation to ensure language and code soundness and correctness. I.e. the language can be checked to be sound and correct as well as programs written in the language can also be checked to be sound and correct.
- Define transformation. E.g. Code <--> AST <--> Intermediate or machine code.
- Define optimisations which can be language level or library or more granular level.
- Define AST interpreters and VMs.
Interpreting
- Interpretation will be achieved through DSL defined transformations which will define parsing, optimisation, code generation, JIT.
- Interpreter can run separately as a server which each interpreter delegates to so that the code translation cost for the interpreter is less.
- While using actuals values for interpretation can run parallelly on abstract values for better code generation.
- Analyse effects on the stack and heap and recreate then short circuiting code evaluation.
- Partial evaluation of code.
- Meta information lost in transformations is saved when needed for deoptimization
VM Runtime
- This is very much similar to what was mentioned for interpreters above.
Static Analysis
- Symbolic and abstract execution.
- Proof checking.
- String, graph and symbol/abstract transformations.
- Minimise numerical errors and loss precision.
Type System and Type Checking
- Type systems can be definable.
- Defined type system is checked with and type checking engine.
- This also will use JIT.
- Other optimisation techniques for fast type checking.
- Can start checking when complete code is not available incrementally filling in the missing pieces as more code becomes available.
- If feasible will use the same engine for proof checking and perhaps symbolic and abstract execution if feasible.
- Dynamic, gradual and static type checking paradigms will be supported.
- Will support type level computations
Patten Matching
Program Verification an Validation

Wish List for General Purpose Programming Language

Language Domains
- Language should be usable in:
- As standard features
  - General purpose programming
  - Embedded systems
  - Highly reliable systems
  - Array programming
- As extended, internal DSL or library features
  - Graphics and multimedia
  - Web development
  - VR, Games and UI development
  - Typesetting, typography and typesetting
  - Hardware definition
  - Parallel and accelerated computing
  - Hardware and software co-designed systems
  - Mathematical, scientific and statistical programming
  - Protocols, RPC definitions
  - Codec creation
  - File and data format definitions and parsing
Language features
- The language will have
- Types
- Functions
- Modules
- Symbols
- First class scope and environments
- Tags and annotations
  - This will be used to:
  - Extend language with new features:
    - Companion objects
  - Extend type system
    - Companion objects
    - Fine grain access modifiers
- Type system
- Extensible type system - annotation processing which can add extra type checking without conflicting with the language type checker
- Strongly typed by default
- Whole program type inference
- Nominative typing - with special notation e.g. type starts with @ to denote nominative.
- Structural typing
- Dependent typing
- Refinement typing
- Flow sensitive typing
- Explicit latent type assertions
- Explicit manifest type assertions - also used for type narrowing
- Sub-structural and uniqueness type checking
- Effect system
Wishlist of Key Features

Exact syntax and mechanics how these features are to be implemented is yet to be determined. The syntax discussed here is suggestive and not concreate.

Minimum Key Words

The number of key words will be limited. E.g. tp - type or shorthand ‘, fn - function, tm - term which is optional for variable definition Scopes take the name of the object and function. Modules also can be mimicked by singleton objects.

Evaluation Strategy

Evaluation strategy is lazy.

Nominative and Structural Types

Both nominative and structural types will be available. There will be:
- The tuples have nominative lable.
- Fields can be nominative lable.
E.g. In tp @Person(@age: int, @name: str) the @ symbol denotes nominative type. If there is no @ symbol, @Point(x: int, y: int), then lable is not used to type check as in the case of x and y where any int can occupy the these element in type checking.

If a nominative lable is present this is considered a subtype over a type without a nominative lable. E.g. tp @A(@a: X) <: tp A(@a: X) <: tp A(a: X) and tp @A(@a: X) <: tp @A(a: X) <: tp A(a: X) while tp A(a: X) = tp A(X) = (X) as there is no nominative lable.

Value Types

Value types would supported:
- Constant - this is the default and will use = for assignment in absence of mutation effects.
- Variable - mutation effects can be supported and will use := and =: for mutative assignment head and tail.
- Reactive - will use a dequeue which users to transfer variables using from-to (~>) and to-from (<~) using relevant effects.
Values are defined as tm a: int = 1 or a: int = 1 by omitting optional tm. If mutation effects are present a: int ::: MUT = 1. The definitions follow the pattern tp symbol-or-literal : types : scope :: refinement-predicates :.: values-ranges-and-state-trasitions :~: proxy :=: proof-checks :@: annotations.

Type Systems Operators

Following type operators would be available:

Holes, Anonymous Values and types

[_] (and [_][0], [_][1], [_][2], …, in order if there are many) refers to anonymous or unnamed values and [?] (and [?][0], [?][1], [?][2], …, in order if there are many) refers to and anonymous or unnamed types. Similarly [#] refers to anonymous or unnamed symbols with [#][0], [#][1], [#][2] if there many symbols. [->], [<-], [|->], [<-|], [=>], [<=], [|=>], [<=|] and [<->] (for invertible functions) can be used for anonymous and unnamed functions with [<-][0]/[->][0]/[<->][0]/…, [<-][1]/…, [<-][2]/… referring if there are many. Similarly [<>] refers to environment or scope which is anonymous or unnamed scopes can be referred as [<>][0], [<>][1], etc. [*] can used to refer to anything which is anonymous or unnamed.

[_][a <: X :: s"${a}".match("a*")] refers to find values which is a subtype of X and starts with “a”. [?][A <: X :: s"${A}".match("A*")] refers to find types which is a subtype of X and starts with “A”. [#][a :: s“${a}”.match("a*")] refers to find symbols starting with “a”. [->][f: s“${a}”.match("a*")] refers to a functions starting with “a”.

Symbols

Symbols are defined using sm or #.

Following a :: s"${#}".match("get*") := getX() ensures a can only be assigned from a variable or function starting with “get”.

Function

Functions are defined with fn taking 1 tuple or function and retuning a tuple or another function. A function can also have a context. Perhaps a function could be defined as in the following examples:fn f: (ctx: X |=> a: A -> (io = (x)? (x > 0): io |? (), mem = mem) -> ()) {...} where IO an effect is present given x > 0, fn f: a: A -> b: B {...}; fn f^: a: A <- b: B {...} or fn f^: b: B -> a: A {...} where inverse functions is also defined. If the inverse is algebraically solvable and invertible then the following can be used where the compiler derives the inverse fn f: a: A <-> b: B {...}.

|-> or <-| can be methods and extension methods. Method can be part of many objects (a: A :+: b: B) |-> ... or operate on many objects, e.g., (a: A, b: B) |=> ... with usage as (a, b).f or a: A |-> b: B |-> ... with usage as b.(a.f) or (a, b).f. If |=> or <=| is used the parameters are passed implicitly.

The parameters can be values, variables, types, functions, effects. Type parameters are defined as A => or <= A. In case A is of a specific type A: X => ... or ... <= A: X where X is a type.

Functions can be defined on values, e.g., fn f: (0..9 -> 10..19) :|: (20..29 -> 30..39). If invertible fn f: (0..9 <-> 10..19) :|: (20..29 -> 30..39).

As functions can be deduced by the presence of the arrows fn omitted if desired.

When a function is passed into a function then the execution environment will become also available unless limited by access control.

Types

Types are can be define as tm @A(a: X) or ‘@A(a: X). Types can be part of the tuple, e.g., tp A(tp X) or tp A('X) or ‘A('X).

Tuples

Tuples can have nominative labelling, e.g.,tp A(b: B, c: C). So can the fields, e.g.,tp @A(@b: B, @c: C). Nominative labelling can be mixed, e.g., tp @A(b: B, @c: C) or tp A(@b: B, c: C).

Also membership can be depend on a value tp A(b: B, (x)? (x > 0): c: C |? (_): ()).

Unions

Union can be both tagged and untagged. Untagged union and tagged union uses :|: and :+:, e.g., tp A(X :|: Y). tp A(x: X :+: y: Y), tp A(@x: X :+: y: Y), tp @A(x: X :+: y: Y).

The participation can be conditional on a value, e.g., tp A(x: X :+: (z)? (z > 0): y: Y |? (_): ()), or type.

Intersection

Intersection types can be both tagged and untagged and uses :&: and :*:. Participation can conditional on a value or type.

Conditional Types

Types can be defined using pattern matching. E.g. tp a: (z)? (z > 0): X |? (_): Y

Range Types

Data ranges are represented as types.

Annotations / Tags

Annotations are regular type of values which are associated with other types of values. Annotation are accessible to the meta programming facility which does the annotation processing.

Companion Objects

Companion objects can be implemented defining a singleton object, e.g., vr A(a: int). This similar to type definition but using vr instead of tp.

Effects

Effect definition will have the same definition syntax with as types and variables to reduce syntactical complexity. Effect can be conditional and dependently typed. They can be also 1st class and parameterised so that different effect handlers can be swapped, e.g., changing allocators. If viable effects may use the same syntax used to define singleton object.

Fine Grain Access Modifiers

Access modifier are not baked into the language but can be language defined. Using attributes fine grain access modifiers can be implemented. E.g. a :@: (x)? (true): read(allow = q"<: A :&: B") |? (false): read(allow = q"<: X :&: Y") any subtype of A :&: B can read a but not otherwise when x is true and any subtype of Z :&: Y can read a when x is false. The checking mechanism will prove that any usage is completely sound.

Modules and Packages

Packages and modules can use the same syntax to define singleton objects.

Expressions

Expressions evaluate to values or types and can be conditional on other types or values. Similarly function.

AOP Through Effects

AOP programming will be facilitated through the effect system.

Control Flow

Control flow is achieved through effects, but mimic for for fixed number of iteration, while for precondition based iteration, do while for postcondition based iteration, in the case of loops and if conditional statements. Control flow can be defined as effects and syntax can be enhanced through macros.

Other Control Mechanisms

Fist call continuations, generators, coroutines, exception will be supported through the effect system.

Memory Allocation

Memory allocation is also done through the effect system. Different allocators and allocation schemes can be defined the users also.

Based on the allocator used first class heaps and region based memory allocation schemes would be available.

The allocators as well as the whole system should ideally be implemented in the language itself without the need of any low level or foreign language code. Also the code should be deemed as safe code.

Threading and Parallelism

Parallel primitives and patterns as well as algorithmic skeleton with nesting and composition are defined through the effect system at the library level. The users can add additional items.

Similarly constructs to implement process calculi and algebras, cancel and error propagations, supervision structures can be provided and the users can extend them and also provide their own.

The transfer of the variable bindings to different scopes and access control also can should be modellable.

IO and FFI

IO and FFI calls are achieved through effect system. Any IO or FFI cannot be unsafe hence all exceptional situations much be handled and should not break the guarantees of the language and type system.

Typed State, Enforced States and State Transitions

Initial state and state transitions can be enforced, e.g., a :.: (1, 2, 3) -> (4, 5, 6) <-> (7, 8. 9), (4, 5, 6) -> (10, 11, 12). The transitions can be , or | or & separated, e.g., p: Point :.: (0, 0) -> (1, 0) | (0, 1) & p.x != p.y.

Also subtyping can be conditional or state dependent.

Syntactic Macros, Syntax Extension and Operators

Syntax to a certain level will be extensible. This also can be used to have custom operators with flexible associativity and precedence.

Sub and Super Typing

The type system can be very flexible where sub and sub types can be defined, e.g., tp A <: B define that A is a super type of B, similarly tp A :> B define that A is a super type of B. If a type cannot have a super type this can be enforced by quoted expressions (tp A :: q"$super(A)" == ()) or type queries (tp A :: [?][? :> A] == ()).

To have consistent memory layout the right most elements are placed before the elements in the left by default. Alternatively tp A :> B (B..., a: int) can be used to specify the layout.

Arrays

Arrays can be defined as [(int, int) -> int][3, 5]. Infinite sequences

First-class Code and Expressions

Code and expressions are 1st class. Also language is homoiconic. Code can be generated, manipulated and transformed like any other data structure.

structure-charger commented 3 years ago

Philosophy:

Try to find the exactly one method/library/structure to do the thing in one domain.

Some advices:

Add dag graph data structure for any language translation , any library analysis/refactory, try language synthesis in the same domain all in one library and in one syntax; strong/efficient tag system. use the previous library as many as possible. strong/efficient tag system.
May be we can find a recursion method for graph like data structure in language translation. Try to use some NLP methods for language translation(such as word2vec or others).

Reference：

https://docs.shiftleft.io/ocular/cpgql/reference-card (see also SPARQL(knowledge base) and Graphdbs usage for language analysis and translation)

http://harmonia.cs.berkeley.edu/papers/twagner-glr.pdf

codeviz(inspire from graphviz)

codeql

https://tree-sitter.github.io/tree-sitter/

language reference: haskell(project semantic) ATS2 Oberon2 or Oberon7(Pascal like) HSV (Bluespec; for VLSI design, haskell like)

jackfirth commented 3 years ago

There's a lot to unpack here. But first of all, thank you for your enthusiastic interest in the future of Racket!

I think it would be easier to engage with these ideas if you separated out what necessarily has to be done in Rhombus from what can be done in libraries and frameworks. That would make it easier to prioritize. Additionally, have you looked through much of the existing discussions and issues? Some of the ideas in this issue may be related and it would be helpful to see where they fit in with the other conversations going on.

mflatt commented 3 months ago

527