vnmakarov / mir

A lightweight JIT compiler based on MIR (Medium Internal Representation) and C11 JIT compiler and interpreter based on MIR
MIT License
2.24k stars, 145 forks

Could you support inline assembly using ("GNU as" && "objcopy") || NASM? #344

Closed: rempas closed this issue 5 months ago

rempas commented 1 year ago

Hello, how are you doing, my friend? I've seen in #87 that you consider inline assembly not worth the effort, given the amount of code that would have to be added (and, of course, the time it would take to implement).

Today I found out about something! My plan is to output assembly code and then use an assembler and a linker to produce the final binary. However, I was wondering how JIT compilation of functions could work this way. I could do the following:

  1. Create a new file containing the generated function
  2. Use GAS to create the object file
  3. Use a linker to create the final output as a shared library
  4. Use "dlopen" to open the newly created library and call the symbol
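For reference, steps 1 to 4 could look roughly like the C sketch below. It assumes x86-64 Linux with binutils (`as`, `ld`) on the PATH and a glibc where `dlopen` is available; the scratch paths under /tmp, the function name `add_ints` and the helper `jit_via_shared_lib` are hypothetical, chosen only for illustration. Every generated function pays for two process launches, a link and a `dlopen`, which is a large part of why this route is slow.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical scratch paths for this sketch; real code would use mkstemp(). */
#define SRC_PATH "/tmp/jit_demo.s"
#define OBJ_PATH "/tmp/jit_demo.o"
#define LIB_PATH "/tmp/jit_demo.so"

typedef int (*add_fn)(int, int);

/* The "generated" function in GAS (AT&T) syntax for x86-64 ELF:
   int add_ints(int a, int b) { return a + b; } under the System V ABI. */
static const char add_ints_asm[] =
    "    .text\n"
    "    .globl add_ints\n"
    "    .type add_ints, @function\n"
    "add_ints:\n"
    "    leal (%rdi,%rsi), %eax\n"
    "    ret\n"
    /* Request a non-executable stack; without this note, recent glibc
       versions may refuse to dlopen() the library. */
    "    .section .note.GNU-stack,\"\",@progbits\n";

/* Steps 1-4: write the source, assemble it with GAS, link a shared library,
   then dlopen() it and look up the symbol. Returns NULL on any failure. */
static void *jit_via_shared_lib(const char *asm_text, const char *symbol) {
    FILE *f = fopen(SRC_PATH, "w");                           /* step 1 */
    if (!f) return NULL;
    int write_failed = fputs(asm_text, f) == EOF;
    if (fclose(f) != 0 || write_failed) return NULL;

    if (system("as --64 " SRC_PATH " -o " OBJ_PATH) != 0)     /* step 2 */
        return NULL;
    if (system("ld -shared " OBJ_PATH " -o " LIB_PATH) != 0)  /* step 3 */
        return NULL;

    void *lib = dlopen(LIB_PATH, RTLD_NOW);                   /* step 4 */
    if (!lib) return NULL;
    return dlsym(lib, symbol); /* handle stays open while the code is in use */
}
```

Usage: `add_fn add = (add_fn)jit_via_shared_lib(add_ints_asm, "add_ints");` after which `add(2, 3)` returns 5.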

That would (probably) work, but generating code this way would be veeeeeeery slow! So I did some research and found another way!

NASM has the -f option, which selects the output format, and it can directly output a flat binary file! This file can then be loaded into memory and executed. I have a theory about how this would work, but it needs testing. Funnily enough, I tried opening the generated binary file, mapping it into memory with mmap, and executing that block of memory (I can create a repo with the code and share it if you are truly interested), but it doesn't work. Of course, it's low-level code, so there may be many things I'm doing wrong. But let's suppose we solve this!
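Loading a flat binary usually comes down to four steps: map fresh pages, copy the bytes in, make the pages executable, and call them through a function pointer with the right C signature. The sketch below does exactly that; it assumes x86-64 Linux and the System V calling convention, and the names `load_flat_binary` and `add_ints_bin` are hypothetical. The four bytes are a hand encoding of `lea eax, [rdi + rsi]` followed by `ret`. Two things worth checking in any such experiment (without knowing whether either explains the failure above): `nasm -f bin` produces 16-bit code unless the source says `bits 64`, and the code is loaded at whatever address mmap returns, so it must not depend on absolute addresses.

```c
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS in strict C modes (glibc) */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*add_fn)(int, int);

/* Bytes that `nasm -f bin` should produce for this x86-64 source:
       bits 64              ; without this, nasm -f bin emits 16-bit code
       lea eax, [rdi + rsi]
       ret
   i.e. int add(int a, int b) under the System V AMD64 calling convention. */
static const unsigned char add_ints_bin[] = { 0x8d, 0x04, 0x37, 0xc3 };

/* Copy a flat binary into fresh anonymous pages and make them executable.
   Returns NULL on failure; *size_out receives the mapping size for munmap(). */
static void *load_flat_binary(const unsigned char *code, size_t len,
                              size_t *size_out) {
    long page_l = sysconf(_SC_PAGESIZE);
    if (page_l <= 0) return NULL;
    size_t page = (size_t)page_l;
    size_t size = (len + page - 1) / page * page; /* mappings are whole pages */

    /* Map the pages writable (not yet executable) and copy the code in... */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return NULL;
    memcpy(mem, code, len);

    /* ...then switch them to read+execute before jumping there. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }
    *size_out = size;
    return mem;
}
```

Usage: `size_t size; add_fn add = (add_fn)load_flat_binary(add_ints_bin, sizeof add_ints_bin, &size);` and then `add(2, 3)` returns 5. Mapping writable first and executable second keeps the pages from ever being writable and executable at the same time, which some hardened systems require.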

One problem might be that (as far as I know, at least) NASM only supports x86 and x86-64. For other platforms, GNU as can be used. However, it cannot directly output a flat binary file. Don't worry, though, because objcopy comes to the rescue! It's a little slower than using NASM alone, but it's better than nothing!

The commands would be the following (suppose we have a file called "test.asm"):

For the NASM version: nasm -f bin test.asm -o test_bin

For the GNU AS and objcopy version: as --64 test.asm -o test.o -O2 --strip-local-absolute && objcopy -O binary test.o test_bin
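Whichever route produces test_bin, the file still has to be read into memory before it can be mapped and run. A minimal C helper for that step follows; `read_whole_file` is a hypothetical name. One caveat about the commands themselves: GNU as defaults to AT&T syntax and does not understand NASM directives such as `bits 64`, so in practice each route needs its own source file (GAS can switch to Intel syntax with `.intel_syntax noprefix`, but that is still not NASM syntax).

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file (e.g. test_bin) into a malloc'd buffer.
   Returns NULL on failure; on success *len_out holds the byte count and
   the caller must free() the buffer. */
static unsigned char *read_whole_file(const char *path, size_t *len_out) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    /* Find the size by seeking to the end, then rewind. */
    long n = -1;
    if (fseek(f, 0, SEEK_END) == 0) n = ftell(f);
    if (n < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    unsigned char *buf = malloc(n > 0 ? (size_t)n : 1); /* avoid malloc(0) */
    if (buf && fread(buf, 1, (size_t)n, f) != (size_t)n) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf) *len_out = (size_t)n;
    return buf;
}
```

The returned buffer is plain data; it still has to be copied into executable pages before it can be called.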

Again, I'm just wondering because I'm not exactly sure how MIR works under the hood. What do you think?

vnmakarov commented 1 year ago

Thank you for sharing your ideas.

The first approach you describe is basically what GCC JIT (libgccjit) uses. This is because GCC has no embedded assembler that can produce code in memory. I don't know about NASM, but GAS is designed to process big assembler files fast; for tiny files, GAS initialization will take most of the time. Also, the minimal space for a library is a machine page, which is pretty big.

Using an existing assembler simplifies the implementation of inline assembly, especially since new insns are constantly being added for new CPUs (e.g. x86-64 or aarch64).

Still, implementing inline assembly with an existing assembler is a big job. The inline assembly syntax is not simple: there are a lot of machine-specific operand constraints, which GCC/Clang add from time to time and which would have to be checked against MIR operands. I also have no clear picture of how inline assembly should look at the MIR level (if we use the GNU C inline assembly extension).
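To make "machine-specific operand constraints" concrete, here is a small GNU C extended-asm sketch (GCC or Clang, x86/x86-64 only; the function names are hypothetical). A generic constraint such as `r` just asks for a general-purpose register, while `c` is x86-specific and pins the operand to the C register, because x86 variable shifts take their count from `%cl`. Supporting inline assembly would mean understanding strings like these and mapping each operand onto MIR operands; how that mapping should work is the open question above.

```c
/* GNU C extended asm, x86/x86-64 only (GCC or Clang). */

/* "+r": x is both read and written, in any general-purpose register.
   "r":  y is an input in any general-purpose register.
   "cc": the instruction clobbers the condition flags. */
static unsigned add_u32(unsigned x, unsigned y) {
    __asm__("addl %1, %0" : "+r"(x) : "r"(y) : "cc");
    return x;
}

/* "c" is x86-specific: the operand must live in the C register (%cl here),
   because a variable shift count on x86 can only come from %cl. */
static unsigned shl_u32(unsigned x, unsigned char n) {
    __asm__("shll %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
}
```

Usage: `add_u32(2u, 3u)` returns 5 and `shl_u32(1u, 4)` returns 16.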

I am not rejecting the idea of implementing inline assembly. I am simply not ready to do it yet and have not decided which approach to use.

rempas commented 1 year ago

I'm glad that you continue to be a positive and cheerful person. Of course, everything will need some work, both in designing and in implementing it, but after that MIR will probably be the 3rd best IR (or the 1st, if you consider the compile-time to run-time ratio), and, most importantly, it will be usable by low-level languages like my Nemesis that don't want to use libc's functions. For that reason, it's going to get lots more attention and help from other users who know how to design and implement a compiler backend.

At this point, I'm working on the frontend of Nemesis, and I may not get to the backend for some time, so with some luck you will have started working on inline assembly by the time I truly need a backend, hehe!

When you become interested in working on it, you can of course let me help if you want! I can help with the design side of things (both the syntax and everything else). When that time comes, add a reply to this issue letting me know, and I'll give you my email so you can contact me!

edubart commented 5 months ago

I also think that an inline assembler is not aligned with MIR's goal of staying small. However, I think MIR could have a way to inject binary blobs of code, as I suggested in https://github.com/vnmakarov/mir/issues/195, which would skip the assembler entirely; it doesn't look like it would take much work. Maybe there is already a way to do this.

iacore commented 5 months ago

@rempas maybe you want to use Keystone. To MIR, all externally generated code is the same: if you generate machine code that follows the C calling convention, you can probably use it with MIR.
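This point can be illustrated without any assembler at all: once code follows the C calling convention, a JIT only needs its address, whatever produced the bytes. The sketch below uses a hypothetical name-to-address table (`struct extern_sym`, `resolve`) as a stand-in for import resolution, with an ordinary C function (`external_add`, also hypothetical) playing the role of externally generated code. It is not MIR's implementation; MIR's own API has `MIR_load_external` for binding an imported name to an address.

```c
#include <stddef.h>
#include <string.h>

/* Carrier type for any function pointer; C allows converting between
   function pointer types as long as calls use the original type. */
typedef void (*generic_fn)(void);
typedef int (*add_fn)(int, int);

/* Hypothetical import table: where the code came from (a C compiler,
   Keystone, nasm -f bin) does not matter, only its address and ABI. */
struct extern_sym {
    const char *name;
    generic_fn addr;
};

/* Linear lookup of an imported name; returns NULL if it is unknown. */
static generic_fn resolve(const struct extern_sym *table, size_t n,
                          const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0) return table[i].addr;
    return NULL;
}

/* Stand-in for externally generated code using the C calling convention. */
static int external_add(int a, int b) { return a + b; }
```

Usage: register `{ "external_add", (generic_fn)external_add }` in a table, then `((add_fn)resolve(table, 1, "external_add"))(2, 3)` returns 5.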

rempas commented 5 months ago

Thanks for the information!