woodrush / lambda-8cc

x86 C compiler written in untyped lambda calculus
MIT License
617 stars 16 forks source link

Some questions #1

Closed atennapel closed 1 year ago

atennapel commented 1 year ago

Amazing work! I had some questions and I didn't know where to ask them, so I'll just leave them here:

woodrush commented 1 year ago

Thanks! I'm very happy that you found it interesting!

Also thanks for your questions. Here are what I understand about its answers:

the definition for nil (\nil. b) (\x y. y)

Although the plaintext is in the limelight, the code is actually aimed to be optimized for the binary lambda calculus (BLC) notation. In BLC, a variable with a De Bruijn index n has an encoding of length n, and nil has a constant length 000001 = 6. Therefore, by encoding nil with a variable with a De Bruijn index less than 6, you save code length in BLC notation.

However, This shouldn't affect the length of the PDF that much. Most of the PDF is packed with either the memory initialization listing or the instruction listing, and for both encodings, I think that the BLC-efficient encoding is also efficient in the plaintext encoding since it revolves around reusing terms by abstracting them. It's only partially optimized so I wonder what the true optimum version is though.

Optimization also extends to which register addresses you assign for each register in LambdaVM. This is equivalent to choosing a Huffman encoding that corresponds to either the number of register occurrences in the code or during runtime.

What is the level of "depth" of the normal form?

The deepest abstraction is created by the longest list. The longest list is either the program with the longest ELVM program counter (8cc or elc), or, the program counter with the longest number of instructions. In ELVM IR, each program counter has a variable number of instructions. Therefore, counting the number of PCs or instructions should give you the answer with a constant difference (the depth where the program is nested).

what is the largest DeBruijn index of the program? (probably same answer as previous question)

Interestingly, this is totally different from the previous value. This is because the largest abstraction is only created by the longest list. In lists, the abstraction f is mostly a proxy abstraction used as a value container, and it's referenced only once in its immediate environment and is never referenced again. Even the deepest location of the instructions, every lambda can be written using alphabets up to x,y,z,a,b,c,d,e.

On the other hand, LambdaVM has a lot of deep abstractions, where lowercase alphabets are not enough to handle the depth. My variable name assignment algorithm in LambdaCraft partially handles variable name shadowing so I don't know the exact value, but since LambdaVM only uses lower case + upper case alphabet, the largest De Bruijn index in the entire 18,506-page PDF should be less than 52=26*2.

atennapel commented 1 year ago

Thanks so much for the detailed answer! It's very interesting. The max deBruijn is surprisingly low.