vnmakarov / mir

A lightweight JIT compiler based on MIR (Medium Internal Representation) and C11 JIT compiler and interpreter based on MIR
MIT License

Shebang support for Mir files? #378

Open rempas opened 7 months ago

rempas commented 7 months ago

Would you consider supporting shebangs? I think this would make Mir files feel much more "native" and easier to use.

From what I understand, two things are needed: support for adding a shebang when generating the final "bmir" file, and support for skipping the shebang at the beginning of the file when it is given for interpretation (JIT or not).

Let me know what you think!

vnmakarov commented 7 months ago

Sorry for the delay in answering. I am usually very busy at the end of the year.

I never thought about this. I guess it is not hard to implement, so I am not against it. I have too much work on my plate to implement it myself, but I am ready to consider a corresponding PR.

IMHO shebangs are more oriented toward textual files. Since the MIR textual format uses # as a comment start, shebangs should already work for MIR textual files. Binary files, however, are much smaller.

rempas commented 7 months ago

Hello! Glad you are doing fine my friend, I haven't seen you in a while!

Well, with shebangs, Mir files could be executed directly from the shell or through the "execve" system call. It would make Mir files feel more "native" in my opinion, and it would make the project seem more professional overall.

I suppose it would be easy to implement. Unless I'm missing something, there should be two steps:

1. The final Mir executable file should contain the shebang.
2. When the Mir loader (is that what it is called?) is invoked on a file, it should check whether the first character is '#' and, if so, skip to the end of the line (which acts as the separator). Then it executes the file normally.
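
The second step could be sketched roughly as follows. This is only an illustration, not code from MIR: the function name `skip_shebang` is hypothetical, it checks for the two characters `#!` rather than just `#`, and it assumes the file is opened as a seekable stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of MIR): if the stream starts with "#!",
   consume everything up to and including the first newline and return 1.
   Otherwise restore the original position and return 0.
   Assumes a seekable stream, e.g. a regular file. */
int skip_shebang(FILE *f) {
  assert(f != NULL);
  long start = ftell(f);
  if (fgetc(f) == '#' && fgetc(f) == '!') {
    int c;
    /* Discard the rest of the shebang line. */
    while ((c = fgetc(f)) != EOF && c != '\n') {
    }
    return 1;
  }
  /* Not a shebang: go back so the reader sees the file from its start. */
  fseek(f, start, SEEK_SET);
  return 0;
}
```

A loader would call this right after opening the file and before handing the stream to whatever routine reads the binary MIR data.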

I would be glad to help if you could tell me which places I should look at and change, as I have never looked into Mir's code before, and I'm not exactly the best at reading other people's source code. By that I mean that I suck hard at it and have never done it before. So, on the positive side, it would be a nice first experience! ;)

I should also mention that I want to use Mir alongside LLVM for Nemesis, so the time has come where I don't tell you how I plan to use Mir, but I actually go and use it! I will start building the frontend soon so we're going to see some real world usage!

Wiguwbe commented 6 months ago

Hello there,

I think another option could be binfmt_misc, but we would still need a mir executable/loader, which shouldn't be too hard to do.

rempas commented 6 months ago

> Hello there,
>
> I think another option could be binfmt_misc, but we would still need a mir executable/loader, which shouldn't be too hard to do.

Hello and happy new year!!!

This looks very interesting! I have seen a video about RISC-V assembly on Linux, and at some point the author is able to run the program without invoking "qemu-riscv64" before the name of the program. I suppose this is what is used there!

Compared to shebangs, this has the advantage of being cleaner and not requiring extra work. The disadvantage is that it is not portable, and every system has to register it. That is a small disadvantage, however, if the loader is packaged for package managers and the entry is added automatically on installation (the same way it happens on Arch with the "qemu-user-binfmt" package for QEMU)!

As for the loader, c2m already has one built in, so I suppose a small change would be needed to "separate" the loader into its own project. I may be wrong, however, and it may require more work; I don't know the codebase and structure of Mir. It can even be tested NOW using c2m -eg as the loader!

Wiguwbe commented 6 months ago

I added some modifications to the c2mir driver. With the binfmt_misc line `:mir:M::MIR::/usr/local/bin/c2m-el:P`, I am able to run:

$ cat test.c
#include <stdio.h>

int main(int argc, char **argv)
{
  for(int i=0; i<argc; i++)
    puts(argv[i]);

  return 0;
}
$ c2m test.c -o test.bmir
$ chmod +x test.bmir 
$ ./test.bmir hello world
Code generation of function main:
  Code generation for main: 62 MIR insns (addr=7fc811ce20e0, len=256) -- time 0.19 ms
./test.bmir
hello
world
$ cat /proc/sys/fs/binfmt_misc/mir
enabled
interpreter /usr/local/bin/c2m-el
flags: P
offset 0
magic 4d4952
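
For readers unfamiliar with the rule syntax, the registration line above follows the kernel's `:name:type:offset:magic:mask:interpreter:flags` layout, with offset and mask left empty. The sketch below, with hypothetical helper names that are not part of MIR or c2m, builds such a line and mimics the magic check that the `mir` entry performs (the bytes `MIR`, hex 4d4952, at offset 0).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a magic-based ("M") binfmt_misc rule with
   empty offset and mask fields. Returns 0 on success, -1 if `out` is too
   small to hold the rule. */
int format_binfmt_rule(char *out, size_t out_size, const char *name,
                       const char *magic, const char *interpreter,
                       const char *flags) {
  assert(out != NULL && name != NULL && magic != NULL);
  assert(interpreter != NULL && flags != NULL);
  int n = snprintf(out, out_size, ":%s:M::%s::%s:%s", name, magic,
                   interpreter, flags);
  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: return 1 if the buffer starts with the 3-byte
   magic "MIR" used by the rule in this thread, 0 otherwise. */
int has_mir_magic(const unsigned char *buf, size_t len) {
  static const unsigned char magic[] = {0x4d, 0x49, 0x52};
  return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

The resulting string is what gets written to `/proc/sys/fs/binfmt_misc/register` (as root) to create the entry.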

rempas commented 5 months ago

> I added some modifications to the c2mir driver. With the binfmt_misc line `:mir:M::MIR::/usr/local/bin/c2m-el:P`, I am able to run:
>
> ... enabled interpreter /usr/local/bin/c2m-el flags: P offset 0 magic 4d4952

Damn, you guys make me feel jealous! I've been too lazy to do anything other than watch YouTube for the last month! 😭

This is basically the idea! Now all that remains is for Vladimir to "separate" the loader and make it its own executable. This would also allow "c2m" itself to be compiled with Mir (now that I think about it, you can compile it today, but you won't be able to use it afterwards, because the loader needs to be an ELF file and the loader is currently part of "c2m").

One thing to think about now is that the "binfmt_misc" entry hardcodes the option passed to the loader. In a real scenario (a release binary) we can forget about interpretation mode. However, there are still two JIT modes: lazy compilation (-el, the one we used) and full compilation (-eg), which compiles everything before it is called (or something like that; I don't know exactly how it works, tbh). I like the idea of the lazy mode, but I wonder whether it would be noticeably slower than the other mode.

Wiguwbe commented 5 months ago

Don't feel bad xD, we all go through "phases" of some sort.

Anyway, this was mostly a PoC. I used the c2mir subproject because it already had the symbol resolver and the functions needed to make it compile (the top-level API doesn't seem to have that?).

Regarding the hardcoded option, from the expected command line usage (some-command arg1 ...) we can't really pass the -ei/-eg/-el flag to MIR without "colliding" with the actual command arguments.

Instead, I'd suggest an environment variable, read by the MIR loader, to configure which execution mode to use (interp/JIT/lazy):

MIR_EXE_WAY=interp some-command arg1 ...
# or
export MIR_EXE_WAY=lazy
# ... Later on ...
some-command arg1...

(Here MIR_EXE_WAY is just an example name, and I'm open to suggestions for replacements XD. some-command is a MIR binary.)
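
A loader could read such a variable along these lines. This is a sketch under assumptions, not the actual mir-run code: the enum, the function names, and the accepted spellings ("interp", "jit", "lazy") are all hypothetical, and the caller decides which mode is the default.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical execution modes for the sketch. */
typedef enum { MODE_INTERP, MODE_JIT, MODE_LAZY } exec_mode;

/* Map a mode name to an exec_mode. A NULL value (variable not set) yields
   the fallback silently; an unknown value yields the fallback and prints
   a warning to stderr. */
exec_mode parse_exec_mode(const char *value, exec_mode fallback) {
  if (value == NULL) return fallback;
  if (strcmp(value, "interp") == 0) return MODE_INTERP;
  if (strcmp(value, "jit") == 0) return MODE_JIT;
  if (strcmp(value, "lazy") == 0) return MODE_LAZY;
  fprintf(stderr, "warning: unknown execution mode '%s', using default one\n",
          value);
  return fallback;
}

/* Read the mode from the environment variable `var_name`
   (for example "MIR_EXE_WAY"). */
exec_mode exec_mode_from_env(const char *var_name, exec_mode fallback) {
  assert(var_name != NULL);
  return parse_exec_mode(getenv(var_name), fallback);
}
```

Because the mode comes from the environment, the program's own arguments are passed through untouched, which avoids the collision with `-ei`/`-eg`/`-el` described above.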

In general you'd have to modify the binary, even with shebang, to specify the execution "way" to go for (I believe).

A new loader would indeed be advantageous here; it would also remove the need for the .bmir file extension.

I may be able to look into it next week.

rempas commented 5 months ago

> Don't feel bad xD, we all go through "phases" of some sort.

Thank you! I'm slowly getting over it ;)

> Regarding the hardcoded option, from the expected command line usage (some-command arg1 ...) we can't really pass the -ei/-eg/-el flag to MIR without "colliding" with the actual command arguments.
>
> Instead, I'd suggest an environment variable, read by the MIR loader, to configure which execution mode to use (interp/JIT/lazy):
>
> MIR_EXE_WAY=interp some-command arg1 ...
> # or
> export MIR_EXE_WAY=lazy
> # ... Later on ...
> some-command arg1...

That's a great idea! We could have a default mode (I suggest "lazy") when the environment variable is not set, and then set or modify the environment variable when you want to use a different mode.

The only thing I would change is to spell the "JIT" mode in lowercase ("jit"). "interp" could also be changed to the shorter "exec", which also makes sense, as this is practically what the interpreter does.

> (Here MIR_EXE_WAY is just an example name, and I'm open to suggestions for replacements XD. some-command is a MIR binary.)

Another name I thought of is MIR_TYPE, which is shorter and still makes sense.

> In general you'd have to modify the binary, even with shebang, to specify the execution "way" to go for (I believe).

And for that reason, binfmt_misc is better, as it's easier to modify, and we don't have to write more code!

> A new loader would indeed be advantageous here; it would also remove the need for the .bmir file extension.

Yeah, removing the extension would also make it feel better and more like a "system" file!

> I may be able to look into it next week.

Sounds awesome! Glad you are having fun with the project! :)

Wiguwbe commented 5 months ago

Hello,

I created the mir-run binary that is dedicated to running MIR binaries from the binfmt_misc interface.

Here is an example of running it:

[user@optiplex /tmp]$ cat t.c
#include <stdio.h>

int main(int argc, char**argv)
{
        for(int i=0;i<argc;i++)
                puts(argv[i]);
        return 0;
}
[user@optiplex /tmp]$ c2m t.c -o test
[user@optiplex /tmp]$ chmod +x test
[user@optiplex /tmp]$ sudo su
[root@optiplex tmp]# line=:mir:M::MIR::/usr/local/bin/mir-run:P
[root@optiplex tmp]# echo $line > /proc/sys/fs/binfmt_misc/register 
[root@optiplex tmp]# cat /proc/sys/fs/binfmt_misc/mir 
enabled
interpreter /usr/local/bin/mir-run
flags: P
offset 0
magic 4d4952
[root@optiplex tmp]# exit
[user@optiplex /tmp]$ ./test
./test
[user@optiplex /tmp]$ ./test hello world
./test
hello
world
[user@optiplex /tmp]$ MIR_TYPE=jit ./test hello world
./test
hello
world
[user@optiplex /tmp]$ MIR_TYPE=lazy ./test hello world
./test
hello
world
[user@optiplex /tmp]$ MIR_TYPE=not-valid ./test hello world
warning: unknown MIR_TYPE 'not-valid', using default one
./test
hello
world

Wiguwbe commented 5 months ago

Here is an example with shared libraries:

[user@optiplex /tmp/mir-with-libs]$ bat lib.h lib.c test.c
File: lib.h

#ifndef _TEST_LIB_H_
#define _TEST_LIB_H_

extern int print_integer(int integer);

#endif

File: lib.c

#include <stdio.h>

#include "lib.h"

int print_integer(int integer) {
    return printf("%d\n", integer);
}

File: test.c

#include "lib.h"

int main() {
    print_integer(666);
    print_integer(42);
    print_integer(-5);
    return 0;
}
[user@optiplex /tmp/mir-with-libs]$ gcc -c -fPIC lib.c
[user@optiplex /tmp/mir-with-libs]$ gcc -shared lib.o -o liblib.so
[user@optiplex /tmp/mir-with-libs]$ c2m test.c -o test -L. -llib
[user@optiplex /tmp/mir-with-libs]$ chmod +x test
[user@optiplex /tmp/mir-with-libs]$ MIR_LIB_DIRS=$(pwd) MIR_LIBS=lib ./test
666
42
-5
[user@optiplex /tmp/mir-with-libs]$

rempas commented 5 months ago

That's so awesome!!!! Your work is amazing! I'll leave this issue open until Vladimir merges your pull request!

Now we need system call support and the ability to create libraries, and MIR will be set as a general-purpose backend!!! You guys are awesome!!! You give me motivation to work on Nemesis and hopefully have it ready at some point!

Wiguwbe commented 5 months ago

Regarding libraries, or modules (which could probably be another issue), and assuming these libraries will also be output as MIR:

There are a couple of ways this could be implemented (off the top of my head), depending on how you decide to implement them in Nemesis.

One way would be to have a custom C function that takes the name of the library (lib1.bmir), reads the file (searching in some predefined location) into the MIR context, and links it. The function can be implemented directly in the loader (a variant of mir-run.c) or pre-loaded from a library (using the MIR_LIBS env var). The import_resolver should be able to find it if it is in MIR_LIBS; if the function is built into a modified mir-run, the resolver would need an extra `else if` branch for it.

I guess this could be the easier option as you can create a wrapper script for binfmt_misc that sets the environment variables and then calls mir-run.

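Then you have other options such as:

* an array of (BMIR) libraries to load, set as a module-level data item (array) that the custom loader shall read and link to MIR context;

* if the function/variable you're accessing from has the format `<package>.<package>.<library>.<funcname>`, a custom `import_resolver` could find the BMIR file from the prefix, read & link (a bit trickier I guess)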
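
To make the prefix-based idea a bit more concrete (a name of the form `<package>.<package>.<library>.<funcname>` resolved by a custom `import_resolver`), here is a sketch of only the name-splitting part. Everything in it is an assumption made for illustration: the function name is hypothetical, and so is the mapping of the dotted prefix to a relative path ending in `.bmir`. Reading and linking the file is left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split "pkg.sub.lib.func" at the last dot.
   Writes "pkg/sub/lib.bmir" into `path` and returns a pointer to "func"
   inside `symbol`. Returns NULL if there is no prefix, the function name
   is empty, or `path` is too small. */
const char *split_import_name(const char *symbol, char *path,
                              size_t path_size) {
  static const char ext[] = ".bmir";
  assert(symbol != NULL && path != NULL);
  const char *last_dot = strrchr(symbol, '.');
  if (last_dot == NULL || last_dot == symbol || last_dot[1] == '\0')
    return NULL;
  size_t prefix_len = (size_t) (last_dot - symbol);
  /* sizeof ext already counts the terminating NUL byte. */
  if (prefix_len + sizeof ext > path_size) return NULL;
  /* Turn package separators into directory separators. */
  for (size_t i = 0; i < prefix_len; i++)
    path[i] = symbol[i] == '.' ? '/' : symbol[i];
  memcpy(path + prefix_len, ext, sizeof ext);
  return last_dot + 1;
}
```

A resolver built on this would fall back to its normal lookup whenever the helper returns NULL, so plain names such as `printf` keep working.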

I hope that's useful, best of luck

rempas commented 5 months ago

> Regarding libraries, or modules (which could probably be another issue), and assuming these libraries will also be output as MIR:
>
> There are a couple of ways this could be implemented (off the top of my head), depending on how you decide to implement them in Nemesis.
>
> One way would be to have a custom C function that takes the name of the library (lib1.bmir), reads the file (searching in some predefined location) into the MIR context, and links it. The function can be implemented directly in the loader (a variant of mir-run.c) or pre-loaded from a library (using the MIR_LIBS env var). The import_resolver should be able to find it if it is in MIR_LIBS; if the function is built into a modified mir-run, the resolver would need an extra `else if` branch for it.
>
> I guess this could be the easier option as you can create a wrapper script for binfmt_misc that sets the environment variables and then calls mir-run.
>
> Then you have other options such as:
>
> * an array of (BMIR) libraries to load, set as a module-level data item (array) that the custom loader shall read and link to MIR context;
>
> * if the function/variable you're accessing from has the format `<package>.<package>.<library>.<funcname>`, a custom `import_resolver` could find the BMIR file from the prefix, read & link (a bit trickier I guess)
>
> I hope that's useful, best of luck

Thank you so much for the information. However, if I am to use libraries, I'll wait until (and if) Mir gets native support for them, similar to what ELF has. Until then, I'll keep working on the frontend and wait!