larmel / lacc

A simple, self-hosting C compiler
MIT License

use for scripting language? #2

Closed assyrianic closed 2 years ago

assyrianic commented 7 years ago

Hello larmel, I'm Assyrianic and I'm looking to create a scripting language for my game engine. My intention is to use a real C compiler (LACC in this case) and modify it to produce VM bytecode rather than x86 assembler. That way, my developers do not need to create or learn a new language just to create content. Please do not take this the wrong way, but how stable is LACC?

larmel commented 7 years ago

Hi. This sounds like a cool project, and exactly the kind of thing where I envisioned lacc could provide some value.

I would say the compiler is stable in the sense that for well-formed input it will rarely produce the wrong output (or crash), due to very good test coverage from randomized testing. Invalid input, however, is not automatically tested, and there are probably many issues with validation.

But I guess you might be more interested in stability of the implementation itself; that internal data structures and functions stay more or less the same. I have no plans to fundamentally change anything about the implementation, but here I can make no promises.

assyrianic commented 7 years ago

Actually this sounds really good. Even if LACC miscompiles, the bad VM code will be exception'd off, so there's no real chance of any catastrophic code failure. Also, what would be the chance of having to create a linker for LACC, unless one has already been made? I haven't checked, because my manager hasn't yet approved LACC's use. The purpose of a linker is so we could provide our bytecode as a plugin API for inter-script communication.

LACC's code looks and compiles pretty stable on my end (64-bit Ubuntu 16.04), though. I was only asking about the stability of the code it produces. If it produces unstable code then it's no worries, but my team just wants to know the frequency of unstable output code.

If the frequency is low, we could probably tool & patch the critical areas.

larmel commented 7 years ago

Regarding a linker: there is currently no support for this in lacc. The compiler only considers single translation units and produces object files. I have thought about adding some basic linker functionality to be able to produce executable files directly, but that will not happen any time soon (if ever).

lieff commented 7 years ago

There is no need for a real linker for scripting. We only need an API that passes symbol addresses to LACC, and the application does the rest. For tcc I was even forced to strip out the internal linker and use only tcc_add_symbol. Also, I found some bugs in tcc that I can't fix myself for now, so I'm looking for a replacement.
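As a rough illustration of the "pass symbol addresses and let the application do the rest" idea, here is a minimal host-side symbol table in C. Everything in it (`symbol_table`, `symtab_add`, `symtab_lookup`, `MAX_SYMBOLS`) is a hypothetical name made up for this sketch; none of it is lacc or tcc API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side registry; names are illustrative only. */
#define MAX_SYMBOLS 64

struct host_symbol {
    const char *name; /* symbol name as the script would reference it */
    void *addr;       /* address supplied by the host application */
};

struct symbol_table {
    struct host_symbol entries[MAX_SYMBOLS];
    size_t count;
};

/* Register a name/address pair.
 * Returns 0 on success, -1 if the table is full or the name is taken. */
int symtab_add(struct symbol_table *t, const char *name, void *addr)
{
    size_t i;

    assert(t != NULL && name != NULL);
    if (t->count >= MAX_SYMBOLS)
        return -1;
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0)
            return -1;
    }
    t->entries[t->count].name = name;
    t->entries[t->count].addr = addr;
    t->count++;
    return 0;
}

/* Resolve a name to its registered address, or NULL if unknown. */
void *symtab_lookup(const struct symbol_table *t, const char *name)
{
    size_t i;

    assert(t != NULL && name != NULL);
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0)
            return t->entries[i].addr;
    }
    return NULL;
}
```

In this sketch the host would call `symtab_add` for each variable it wants to expose, and the code generator would call `symtab_lookup` when it meets an unresolved external reference.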

assyrianic commented 7 years ago

@lieff The purpose of having the linker is to provide script-wise communication; the linker can build a "lib" form of a compiled script. If each script is sandboxed, how would they communicate with other scripts to provide modular scripting? If I sound like I have no clue what I'm saying, it's because this is my first time having to build an entire scripting engine. @larmel My manager has approved use of LACC 🥇! No worries about the linker, I could probably pass that job on to someone else.

lieff commented 7 years ago

@assyrianic Yes, sandboxing is a second pain point if you need security. I usually use a ptrace sandbox or seccomp-bpf. It's not directly a linker/symbol API issue.

assyrianic commented 7 years ago

alright gentlemen, I've built a prototype of the scripting engine. I named it Crown because C is king hehe, anywho, I'm still working on it but it's my first time so be easy on me lol.

Right now, I'm more worried about having scripts/plugins be able to interact with each other. Since C programs are usually compiled into one executable, I'm worried about whether the plugin-scripts can pass data between one another in a similar fashion. I also do not want to compromise security.

Also, I just noticed it's been a month+ since the last message haha.

lieff commented 7 years ago

@assyrianic: I made small modifications to make it work:

diff --git a/vm.c b/vm.c
index 3d5fce0..523efba 100644
--- a/vm.c
+++ b/vm.c
@@ -74,7 +74,7 @@ void crown_load_script(CrownVM_t *restrict vm, uchar *restrict program)

            if( code->uiInstrCount ) {
                // TODO: have pInstrStream as calloc'd array and memcpy the program.
-               code->pInstrStream = calloc(code->uiInstrCount, sizeof(uchar)); //program;
+               code->pInstrStream = calloc(HEADER_BYTES+code->uiInstrCount, sizeof(uchar)); //program;
                printf("crown_load_script :: allocated instruction count: %u\n", code->uiInstrCount);
                if( !code->pInstrStream ) {
                    printf("crown_load_script :: failed to load script :: instruction array is NULL!\n");
diff --git a/vm.h b/vm.h
index a15da34..8d1ed38 100644
--- a/vm.h
+++ b/vm.h
@@ -107,7 +107,7 @@ union conv_union {
 enum typeflags {
    flag_void = 0x0,
    flag_uchar=0x01,
-   flag_char=
+   flag_char
 };

 typedef void *(*ExportFunc)();

Also take a look at http://www.clifford.at/embedvm/. Now comes the hard part: lacc->vm or lacc->memory JIT, to make it suitable for scripting.

assyrianic commented 7 years ago

@lieff Pushed fixes, thanks for checking up on that. What do you think of the VM so far? The 'v2' branch has it as close to C's capabilities as possible. The current branch has almost no opcodes because I'm still testing script capabilities.

wow that embedvm is sorta terrible for general applications scripting but I see why you're recommending it, thanks so much man! I'm not going to really use a JIT; if someone wants a JIT, I'll definitely help them with whatever they need of it but I have no interest in making one. CrownVM needs to be as simple as possible.

larmel commented 7 years ago

@assyrianic, interesting to see your VM. It should be feasible to modify lacc to have CrownVM as a backend target, just like dot and x86_64 ELF/asm are now. I would start by adding a CrownVM folder in src/backend, where you can implement translation from lacc IR to CrownVM bytecode. Then modify compile.c to call into your new implementation. Right now there is a lot of x86_64 codegen in compile.c, but this file should really be just a dispatch to different backend targets.
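To make the "just a dispatch to different backend targets" idea concrete, here is a small sketch of a table-driven dispatcher in C. It is not lacc's actual code: `enum target`, `struct definition_stub`, `backend_fn` and `compile_definition` are all hypothetical stand-ins, and each emitter only prints a placeholder line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* All names here are hypothetical stand-ins, not lacc's real types. */
enum target { TARGET_DOT, TARGET_X86_64_ASM, TARGET_CROWNVM, TARGET_COUNT };

struct definition_stub {
    const char *name; /* placeholder for a compiled function definition */
};

typedef int (*backend_fn)(FILE *out, const struct definition_stub *def);

static int emit_dot(FILE *out, const struct definition_stub *def)
{
    return fprintf(out, "digraph \"%s\" {}\n", def->name) < 0 ? -1 : 0;
}

static int emit_asm(FILE *out, const struct definition_stub *def)
{
    return fprintf(out, "\t.globl %s\n%s:\n", def->name, def->name) < 0 ? -1 : 0;
}

static int emit_crownvm(FILE *out, const struct definition_stub *def)
{
    /* A real backend would translate IR to bytecode here. */
    return fprintf(out, "; bytecode for %s would go here\n", def->name) < 0 ? -1 : 0;
}

/* One entry per target, indexed by enum target. */
static const backend_fn backends[TARGET_COUNT] = {
    emit_dot, emit_asm, emit_crownvm
};

/* Dispatch only: no target-specific code generation lives here.
 * Returns 0 on success, -1 on an unknown target or write error. */
int compile_definition(enum target t, FILE *out, const struct definition_stub *def)
{
    assert(out != NULL && def != NULL && def->name != NULL);
    if ((unsigned)t >= TARGET_COUNT)
        return -1;
    return backends[t](out, def);
}
```

With this shape, adding a target means adding one enum value, one emitter function, and one table entry; the dispatcher itself does not change.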

assyrianic commented 7 years ago

Excellent! I should say that I'm modifying the VM into a general scripting engine. You gents wouldn't happen to know of or have any resources on scripting engines? I still need scripts to be able to share data with one another.

assyrianic commented 7 years ago

@larmel I've upgraded the VM a whole lot since a month ago. I tried to modify the backend for LACC but I keep failing. When you're not busy, is it possible for you to help me with LACC's backend? The VM is also stack-based, and the instruction set is mostly complete for now unless I need to add another feature to support full C.

larmel commented 7 years ago

@assyrianic While your VM is stack based, lacc IR is modeled with assignments to temporary variables, which I guess makes it difficult to translate to push/pop etc. It would perhaps be easier to generate a stack-based IR straight from lacc parsing while traversing expressions, instead of trying to convert the CFG. In any case, I will not be able to contribute outside of improving lacc itself.
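The suggestion of emitting stack code while traversing expressions can be sketched as a post-order walk over an expression tree: operands are emitted first, then the operator. The node kinds, opcodes and function names below are invented for illustration; they are neither lacc's parser types nor CrownVM's instruction set. A tiny interpreter is included only to check the emitted sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree and opcodes, for illustration only. */
enum node_kind { NODE_CONST, NODE_ADD, NODE_MUL };

struct expr {
    enum node_kind kind;
    long value;                      /* used by NODE_CONST */
    const struct expr *left, *right; /* used by NODE_ADD / NODE_MUL */
};

enum opcode { OP_PUSH, OP_ADD, OP_MUL };

struct instr {
    enum opcode op;
    long operand; /* used by OP_PUSH */
};

/* Post-order walk: emit both operands, then the operator.
 * Appends to out[*n]; returns 0 on success, -1 if out is too small. */
int emit_expr(const struct expr *e, struct instr *out, size_t cap, size_t *n)
{
    assert(e != NULL && out != NULL && n != NULL);
    if (e->kind == NODE_CONST) {
        if (*n >= cap)
            return -1;
        out[*n].op = OP_PUSH;
        out[*n].operand = e->value;
        (*n)++;
        return 0;
    }
    if (emit_expr(e->left, out, cap, n) != 0)
        return -1;
    if (emit_expr(e->right, out, cap, n) != 0)
        return -1;
    if (*n >= cap)
        return -1;
    out[*n].op = (e->kind == NODE_ADD) ? OP_ADD : OP_MUL;
    out[*n].operand = 0;
    (*n)++;
    return 0;
}

/* Minimal interpreter used to sanity-check the emitted code. */
long run_stack(const struct instr *code, size_t n)
{
    long stack[32];
    size_t sp = 0, i;

    for (i = 0; i < n; i++) {
        if (code[i].op == OP_PUSH) {
            assert(sp < 32);
            stack[sp++] = code[i].operand;
        } else {
            long a, b;
            assert(sp >= 2);
            b = stack[--sp];
            a = stack[--sp];
            stack[sp++] = (code[i].op == OP_ADD) ? a + b : a * b;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

For the tree `(2 + 3) * 4`, the walk emits push 2, push 3, add, push 4, mul. No temporaries are named, which is the property that makes this form a closer match for a push/pop machine than an assignment-based IR.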

assyrianic commented 6 years ago

@larmel No problem. I would like to ask if you can provide an API reference for the backend. Also, even though my VM is stack-based, it can handle temporary variables easily. Just like the backend for x86-64, Tagha is 64-bit and the stack is 8-byte aligned, with access to different data size addressing, so a 12-byte struct easily fits into 16 bytes, which is essentially 2 stack spaces.

I can probably make the backend for Tagha work very well but I just require an API reference.
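The 12-byte struct arithmetic above is a round-up to the next multiple of the slot size. A minimal sketch in C, assuming 8-byte stack slots as described; `SLOT_BYTES`, `slots_needed` and `padded_size` are names invented for this example, not Tagha identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed slot size, taken from the 8-byte alignment mentioned above. */
#define SLOT_BYTES 8u

/* Number of 8-byte stack slots needed to hold size_bytes bytes. */
size_t slots_needed(size_t size_bytes)
{
    /* Guard against wrap-around in the round-up addition. */
    assert(size_bytes <= (size_t)-1 - (SLOT_BYTES - 1));
    return (size_bytes + SLOT_BYTES - 1) / SLOT_BYTES;
}

/* Size in bytes after padding up to a whole number of slots. */
size_t padded_size(size_t size_bytes)
{
    return slots_needed(size_bytes) * SLOT_BYTES;
}
```

Here `slots_needed(12)` is 2 and `padded_size(12)` is 16, matching the figures in the comment; a value that is already a multiple of 8 needs no padding.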

larmel commented 6 years ago

@assyrianic, the API reference would be in code comments. The backend interface is in compile.h, and is basically a single function taking a struct definition and producing some output based on that. To implement a new backend, you will need to work with definitions, blocks, statements, expressions and vars, as defined and documented in ir.h.

assyrianic commented 6 years ago

@larmel Alright, I have some great news. I've upgraded Tagha's VM to a register-based machine! Not only that, but it also works a little like x86/x64. It now has 11 general-purpose registers and 4 addressing modes!