scratchcpp / libscratchcpp

C++ library for building Scratch project players
Apache License 2.0
24 stars 8 forks source link

Drop bytecode VM and compile to LLVM IR #574

Open adazem009 opened 2 months ago

adazem009 commented 2 months ago

This player is going to be feature-complete soon, so I should start thinking about what to do with the current bytecode based virtual machine. It wasn't the best idea because it isn't fast. Scratch code can be compiled to LLVM IR instead which is then compiled to the architecture specific machine code by LLVM.

If anyone has experience with this, feel free to comment with any ideas :)

Notes:

How will it work?

Progress:

adazem009 commented 2 months ago

@Arcnor Just letting you know I've opened the issue to work on this.

Arcnor commented 2 months ago

@Arcnor Just letting you know I've opened the issue to work on this.

Hey @adazem009 , that's fantastic, and thanks for keeping me posted :). It makes sense not wanting to write a JIT, it's not an simple task and very easy to fall down the rabbit hole :D. But there are some implementations you can use IIRC. There are also other targets that are not LLVM that might help you as well, there are things like LibJIT that provide you a big base to construct upon. Finally LLVM itself has ORC, and the tutorial covers this, although I've never used it (nor the predecesor, can't remember its name) and it's not an actual JIT implementation but "hooks" to write your own.

Anyway, I wish you luck!

adazem009 commented 2 months ago

@Arcnor Just letting you know I've opened the issue to work on this.

Hey @adazem009 , that's fantastic, and thanks for keeping me posted :). It makes sense not wanting to write a JIT, it's not an simple task and very easy to fall down the rabbit hole :D. But there are some implementations you can use IIRC. There are also other targets that are not LLVM that might help you as well, there are things like LibJIT that provide you a big base to construct upon. Finally LLVM itself has ORC, and the tutorial covers this, although I've never used it (nor the predecesor, can't remember its name) and it's not an actual JIT implementation but "hooks" to write your own.

Anyway, I wish you luck!

Thanks. I mostly think it doesn't make sense to write a JIT because (probably) all Scratch projects compile very quickly with the current compiler, only JSON parsing (to generate AST) can take a long time. I just hope compiling with LLVM won't slow it down.

I'll most likely use LLVM as long as I can call C++ functions from the IR and write unit tests for blocks easily (easier than it is now, with the complicated bytecode VM).

davidtheplatform commented 2 months ago

(reply to a discord message, but I'm putting it here since its relevant) The way I set up threespeed is theres a python script that converts a project.json file to c++. This conversion is basically 1:1, ie. each scratch block becomes a single function call to the corresponding C++ implementation. This makes writing the compiler pretty easy. I also decided that each block stack would get its own native thread. This was probably a bad idea since most performance-intensive projects are mostly single threaded anyways and it comes with all the normal multithreading problems.

If you decide to compile the code instead of making a bytecode interpreter, the main problem you'll run into is how scratch handles reentrancy. If you have a long-running event handler and it gets called again, the first instance immediately stops. With a bytecode interpreter this is easy to handle, for native code it isn't since this isn't how most languages handle reentrancy. You might be able to solve this in an elegant way if you don't just generate function calls, although it sounds like you're going down a similar path as what I did.

As for compile time, the sb3 -> c++ compiler written in python is much faster than gcc. The project.json is already an AST and you have to parse at most 5mb of data for most projects.

adazem009 commented 2 months ago

(reply to a discord message, but I'm putting it here since its relevant) The way I set up threespeed is theres a python script that converts a project.json file to c++. This conversion is basically 1:1, ie. each scratch block becomes a single function call to the corresponding C++ implementation. This makes writing the compiler pretty easy. I also decided that each block stack would get its own native thread. This was probably a bad idea since most performance-intensive projects are mostly single threaded anyways and it comes with all the normal multithreading problems.

If you decide to compile the code instead of making a bytecode interpreter, the main problem you'll run into is how scratch handles reentrancy. If you have a long-running event handler and it gets called again, the first instance immediately stops. With a bytecode interpreter this is easy to handle, for native code it isn't since this isn't how most languages handle reentrancy. You might be able to solve this in an elegant way if you don't just generate function calls, although it sounds like you're going down a similar path as what I did.

As for compile time, the sb3 -> c++ compiler written in python is much faster than gcc. The project.json is already an AST and you have to parse at most 5mb of data for most projects.

These things (thread/script management) are currently handled by the Engine class (https://github.com/scratchcpp/libscratchcpp/blob/537ff52a7500d17d07df7ea31c7023358824e344/src/engine/internal/engine.cpp). I just need to make the native code be able to stop, so it can resume later. That's probably the trickiest part. I'm not sure how that's going to work yet.

adazem009 commented 2 months ago

I found this, but I'm not sure it's related: https://llvm.org/docs/Coroutines.html It seems I can make a coroutine suspend and then resume, but I don't know if I can do that with the main function as well.

adazem009 commented 2 months ago

@Arcnor After some experiments with LLVM, I think I've figured out how to be able to suspend and resume compiled code (read @davidtheplatform's comment). See the "How will it work?" section of the issue description.