wenyan-lang / wenyan

文言文編程語言 A programming language for the ancient Chinese.
https://wy-lang.org/
MIT License
19.71k stars 1.1k forks source link

Syntax Considerations for a dependently typed Wenyan-style proof assistant #619

Open UltimatePea opened 4 years ago

UltimatePea commented 4 years ago

Syntax Considerations for a dependently typed Wenyan-style proof assistant

Motivation

I find wenyan to be the first language that elegantly expresses computational constructs in natural way in Chinese. I truly appreciate the authors of this language for opening a stage for real Chinese programming. However, I find that writing programs in wenyan is significantly unsatisfying due to its natural association with procedural programming and limited library exposure. I would like to propose a new language that hit the translation problem from a different perspective, a perspective of purely functional dependently typed programming language.

Following the state of art dependently typed programming language, I find that wenyan language is especially suitable for the concrete syntax for such a language. In this post, I would like to present a possible plan for implementing dependently typed purely functional programming language following the style of Agda programming language. I will call this language wyd, standing for wenyan-dependent.

https://agda.readthedocs.io/en/v2.6.0.1/language/index.html

Forms of judgement

There are two forms of expressions in wyd, the type judgement "expressione is of type T", written in Agda as (e : T), and “expression e1 is defined to be expression e2”, written e1 = e2, (moreover, in the second case, e could be a matched pattern. I suggest the following concrete syntax for the two judgement forms.

e : T
有物名e,实T之物也。
e1 = e2
e1者,e2也。

例如,

a : Nat
有物名甲,实数之物也。
a = 2
甲者,二也。

General Syntactic considerations

In Agda, there are only 7 characters that cannot appear in a name, they are .;{}()@“. Dot is used to delimit packages. {} is used for implicit named arguments, () is used for expression association. is used for quotes.

Following this tradition, I propose to use symbol 「」 both for named reference and for expression level quoting. That is, we interpret the innermost 「」 as quoting a name and 「」 which encloses further 「」 as parenthesis. As in wy, we use「「」」 for string construction. In the current language, (I shall abbreviate this version as wy below), 「」 is used to refer to names (so as to avoid clash with reserved keyword). I propose not to make this distinction. As reserved keywords can be handled in an unambiguous way in wyd.

Following this tradition, e1 = e2 can also be written as 「e1」者,「e2」也.

a : Nat
有物名「甲」,实「数」之物也。
a = 2
「甲」者,「二」也。

I would like to adopt Agda-style mix fix operators in the language so that we can avoid using 「」 as much as possible. To declare operator precedence, I suggest the following syntax for mixfix operators

infixl if_then_else_ 30
有左算子者,名「若〇者〇若非〇也」,排行三十也
infixr _->_  -1
有右算子者,名「若〇则〇」,排行负一也

We may stipulate that the default operator precedence is 20 when not explicitly stated.

In fact, we could define previous two judgement forms using this new operator.

infixl _:_ -2
有左算子者,名「有物名〇实〇之物也」,排行负二也
Infixl _=_ -2
有左算子者,名「〇者〇也」,排行负二也

However, different from useful types we consider below, the semantics of these types are handled by compilers directly.

The above is all we need when defining a programming language. The rest is just standard library.

Function types and their definitions

Although function types could be viewed as an inductive type as defined below, but in order to present the general syntax of inductive types, we need function types first. Function types are handled internally in Agda, I believe we could also handle function types directly in wyd. Since we haven’t provided an syntax for defining inductive types, in this section, I will assume the existence of two pre-defined types, Nat and Bool, . We will assume the existence

The function type are introduced by the operator ->,

infixr _->_ -1
有右算子者,名曰「若〇则〇」,排行负一也

The function application are in the style of a space. Since we cannot use in wyd, I suggest we use as the application operator with the precedence 0.

有左算子者,名曰「〇于〇」,排行零也

Then, Nat -> Nat -> Nat could be written as 若数则若数则数 or 若数则「若数则数」, and f x y z as 己于甲于乙于丙.

For example, to define a function that checks whether a particular number is zero, we could use the following syntax:

zero? : Nat -> Bool
zero? x = x == 0
有物名零否?,实若数则爻也。
零否?于甲者,甲等于零也。

Dependent Function types

Dependent function types are types that depend on value. They are also called Pi-types. In Agda, dependent function types are written using the following syntax (x:A) -> B. Since we’ve specified how operators _:_ and _->_ are transcribed in wyd, we may know how to transcribe the entire expression. Notice that since _:_ bind more loosely (binding -2) than _->_ (binding -1), we almost always need parenthesis (「」) around the colon expression.

Implicit arguments are arguments that can be omitted upon function application, written {x : A} -> B in Agda. We may prepend before to indicate implicit arguments. To indicate application of implicit argument, I suggest we use in place of .

Example:

zero’’ : (x : Bool) -> Nat
有物名零否第二,实若「有物名甲,实爻之物也」则数之物也。
zero’’’ : {x : Bool} -> Nat -> Nat
有物名零否第三,实若「有物名甲,实爻之物也」则若数则数之物也。
zero’’’ {x} y = y
零否第三盖甲于乙者,乙也。

We could also write elements of function type by the lambda notation λ x -> e, I suggest using 有x则e. Therefore, 零否第三 in the previous case could also be written as 零否第三盖甲者,有乙则乙也.

Discussion

Function types are the most crucial type to Agda-style proof assistant, but the naming conventions clash with the ancient Chinese wisdom. For there is no construction of 'if...else...' in classical chinese, but the concept of generation(), however, the syntax would look awkward when higher order functional arguments are present. But we should support the definition of (〇生〇 as 若〇则〇);

Syntax of inductive types

All types in wyd are inductive. I will introduce their general definition. An inductive definition gives a type possibly parametrized and indexed over other types, by enumerating ways to construct them. For instance, the type Bool is given by two constructors true and false, and the type Nat is given by Z and S. I suggest using the word for the keyword data, for a type usually expresses some proposition, which corresponds to the notion of in ancient Chinese.

The type resides in the universe. The universes are named Set_i in Agda for Natural number i. I suggest we use 天下 to indicate universe.

The general syntax of data definition is as follows. https://agda.readthedocs.io/en/v2.6.0.1/language/data-types.html#parametrized-datatypes

data Name {...args} (...args) : T -> Set where
    c1 : T1 -> Name ...
    c2 : T2 -> Name ...
    c3 : T3 -> Name ...
有道名Name盖...args于...args,实若T则天下之道也。
    有物名c1,实若T1则Name于...
    有物名c2,实若T2则Name于...
    有物名c3,实若T3则Name于...
    云云

For example, we may define Bool and Nat conveniently as follows.

data Bool : Set where
    true : Bool
    false : Bool
有道名爻,实天下之道也。
    有物名阳,实爻也。
    有物名阴,实爻也。
    云云

data Nat : Set where
    Z : Nat
    S : Nat -> Nat
有道名数,实天下之道也。
    有物名零,实数也。
    有物名进一,实若数则数也。
    云云

Discussion

The syntax here is optional. Depending on whether we want to use white space to delimit definitions. Also, the use of new line characters is assumed in the previous examples, but we could relax this restriction by observing that all sentences so far is in the form of 有物...也, 有道...云云, ...者...也.

For a more complex example, consider the type of List with length Vec.

data Vec {a} (A : Set a) : ℕ → Set a where
  []  : Vec A zero
  _∷_ : ∀ {n} (x : A) (xs : Vec A n) → Vec A (suc n)

We may transcribe it in wyd as follows.

有道名向量盖小甲于「有物名大甲,实天下于小甲之物也」,实若数则天下于小甲之道也。
    有物名空列,实向量于大甲于零也。
    有物名「于〇之首追加〇」,实
        若「或有物名原长,实数之物也」
        则若「有物名甲,实大甲之物也」
        则若「有物名众甲,实向量于大甲于原长也」
        则向量于大甲于「进一「原长」」也。
    云云。

Discussion

Notice that we've used both in application and names, this is ok because names are quoted.

So far is our core language defined. It is quite powerful that we can already encode a lot of computations, e.g. arithmetic operations on natural numbers, list operations on vectors, etc. For the rest of this issue, I will list all common types seen in Agda and propose their transcription in wyd.

Product types and dependent pair types

Product types in Agda are written A × B, and dependent products are generalizations of product types, and are written as Σ A (λx -> B). I suggest in the first case, we could transcribe it as 甲且乙, and in the second case, we could transcribe it as 甲又「有戊则乙」.

Sum types

Sun types are usually written A + B, but to differentiate with the addition operation, Agda uses the following notation A ⨄ B. Since there will be no ambiguity in wyg, we could use 甲或乙.

Identity Types

These are also called proposition equality types. I suggest using 〇等于〇 for this type. Notice the occurrence of in this case is ok because 〇等于〇 has a higher precedence. The element of _≡_ is called refl in Agda. We may call it 自反性 in wyd.

Example:

p : {x : Set } -> x ≡ x
p = refl
有物反诸自身者,实若「或有物名甲,实天下也」则甲等于甲也。
反诸自身者,自反性也。

Conclusion and final remarks

I've presented a proposal for concrete syntax of wyd, wenyan-dependent the syntax of possibly the first chinese proof assistant. I believe the present syntax is a viable way of expressing programming logic. I am always against the description of wenyan as an esoteric language, because I believe that a good programming language should be similar to natural languages. Please leave any comments below if you have suggestions to the proposal. Thank you for reading!

LingDong- commented 4 years ago

Hi @UltimatePea,

Wow thanks for the detailed proposal!

I don’t have any experience with proof assistants myself (though I've always been thinking about learning them), but I can see some really nice properties with the syntax you’re presenting. I’ll take a more careful look and try to digest it when I have time :P

Wondering what your plans are. If you would like to lead the development of this, I’m more than happy to add you to the wanyan-lang organization on Github where you can start a new repo named wyd.

Meanwhile we can keep this issue open for collecting comments.

Thanks

ghost commented 4 years ago

TL;DR 对于运算符的定义我认为可以以下面这种方式实现。(以下代码为Java)

// Example
// First replace all the whitespaces with empty string
input.replaceAll("\\s", "");
/* Then replace the operator. Here I use the example:
若_group1_则_group2_
->
if (_group1_){_group2_}
*/
String op = "若(.*?)则(.*?)";
Pattern pattern = Pattern.compile(op);
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
    return "if (" + matcher.group(1) + "){" + matcher.group(2) + "};"
}

到这里,转换就结束了。 对于运算符执行顺序的问题,可以调整转换顺序来实现。那么最后就会得到类似于这样的一段代码:

for (op in operators) {
    input.replaceCustomOperator(op);
}
return input;

没有试验过,说说想法而已。

UltimatePea commented 4 years ago

@LingDong- Thank you very much for your swift response. I will try to see if it is feasible to implement it. I will follow up here when I have more details.

UltimatePea commented 4 years ago

Sorry, this project is taking me longer than expected. I will update here when I have some progress on this issue. Please feel free to comment or make suggestions in the meantime.

UltimatePea commented 4 years ago

cf #613

UltimatePea commented 3 years ago

It took me a while to figure out. It seems that research community hasn't agreed upon on how to bring dependent types into actual programming languages. However, since there are abundant literature on non-dependent functional programming languages, I decided to implement a demo, which incorporates some features from the above discussion.

In particular, it supports mixfix parsing (that means very few keywords!), strong static typing. I also wrote a little interpreter for it.

Check this out: https://github.com/yuyan-lang/yuyan

nobodxbodon commented 3 years ago

@UltimatePea just FYI here's another active proof assistant project Aya.

UltimatePea commented 3 years ago

Thanks for the information.

Contrary to the title, I intend 豫言 to be a prototype for a commercially practical programming language (in Chinese) that doesn't have dependent types.

Proof assistants on the other hand always have dependent types. I am not doing proof assistants.

UltimatePea commented 3 years ago

I think it is better to put it in a separate thread so as to avoid confusions.