remko / waforth

Small but complete dynamic Forth Interpreter/Compiler for and in WebAssembly
https://mko.re/waforth
MIT License

Declaring more locals when using the `CODE` word #50

Open nbonfils opened 1 year ago

nbonfils commented 1 year ago

Hi remko,

First off, thank you for this very nice piece of software. I struggled for weeks to get anything like an emscripten/LLVM setup running on my OpenBSD computer, without any success. I then came across your project, and a simple npm install let me develop with Forth directly in the browser without any issues! Just for that I am grateful!

Background

Anywho, I am currently porting the uxn VM to the web using WAForth, and so far I've pretty much succeeded. One issue is that as soon as I run slightly more involved uxn programs, performance becomes a big problem: something like the Game of Life on a 64x64 grid takes more than 2 seconds to compute. My Forth code probably isn't the most optimized, but my idea was to leverage the CODE word to write the performance-critical pieces in raw wasm.

Issue

I've had success creating small words that loop and fetch parts of memory, but now I want to write a slightly more involved word that requires more than 1 local (which seems to be the default with CODE and :), and I am not sure how to do that. I checked the WAT code for WAForth, and I guess you do something with the $firstTemporaryLocal and $lastLocal globals. Modifying those might give me access to more locals, but I have 2 issues with this:

  1. I have no idea what the indices of those globals are.
  2. Even if I could modify them, I think I'll only have more i32 locals, what if I want a v128 local?

Note: I am aware the CODE word is highly experimental. I just found myself really needing it for my project, so if I can help figure out how to provide a way to write raw wasm with WAForth, it'll be my pleasure!

Note 2: If you want to see the kind of performance issue we are talking about: https://demo.blazebone.com (warning: this will use a lot of CPU, as it's pretty inefficient).

Thanks again for the work you've been putting into WAForth!

Cheers, n

remko commented 1 year ago

Hi n,

Your demo looks awesome, I'm definitely interested to hear more about what you're doing (and how).

The better your Forth style (many small words), the worse your performance probably gets. The main culprit for the performance problem is probably the indirect calls between compiled words. I hope to one day add a step to waforthc that replaces all indirect calls with direct calls, which should hopefully help (I still need to do some timing tests to confirm that). Inlining is also theoretically possible, but trickier (since the locals have to be offset etc.).
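To make the "indirect to direct" idea concrete, here is a minimal sketch in Python. It is not how waforthc works, and everything in it is an assumption for illustration: instructions are modelled as plain tuples rather than real WebAssembly bytes, each word call is assumed to be compiled as `i32.const <table slot>` followed by `call_indirect`, and the hypothetical `table` mapping from table slot to function index is assumed to be known and frozen at ahead-of-time compile time.

```python
# Hypothetical instruction model: (opcode_name, operand). Not WAForth's real format.

def devirtualize(code, table):
    """Rewrite `i32.const k; call_indirect` pairs into `call table[k]`.

    `table` maps a function-table slot to a function index. Pairs whose
    slot is not in `table` are left untouched.
    """
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (
            ins[0] == "i32.const"
            and nxt is not None
            and nxt[0] == "call_indirect"
            and ins[1] in table
        ):
            # The target is statically known: emit a direct call instead.
            out.append(("call", table[ins[1]]))
            i += 2
        else:
            out.append(ins)
            i += 1
    return out


body = [
    ("i32.const", 7), ("call_indirect", 1),   # known slot: becomes a direct call
    ("i32.const", 42),                        # ordinary literal: kept
    ("i32.const", 9), ("call_indirect", 1),   # unknown slot: kept as-is
]
rewritten = devirtualize(body, {7: 23})
```

A rewrite like this is only sound if the table entries can no longer change after compilation, which is why it fits an ahead-of-time step better than the interactive interpreter.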

Anyway, the issues you mention about CODE are the reason why it's still experimental: I haven't really thought it through yet (although there may not be that much to think through). Your use case is definitely a good driver for thinking about this.

Apart from the top of stack (local 0) and, depending on the type of word, the data pointer (not applicable for CODE words), locals and their counters are only used by the compiler for control flow. So, if you don’t use any Forth control flow in your CODE word (which is probably the case), just setting the number of locals should work. I’m not sure whether such a helper word in the core should set a watermark of locals (so every assembly word that changes a local just needs to pass the index), or set the actual value (in which case either there needs to be logic that maintains the maximum local used in these assembly helper words, or the CODE word needs to start with a declaration of how many locals there are).
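The two bookkeeping options can be contrasted with a small Python model. This is only a sketch of the trade-off, not WAForth code: the class names, the `use` method, and the choice to treat index 0 as always present (for the top of stack mentioned above) are all assumptions made for illustration.

```python
class WatermarkLocals:
    """Option 1: every assembly helper reports the index it touches,
    and the highest index seen so far determines the local count."""

    def __init__(self, reserved=1):
        # Assumption: index 0 (top of stack) always exists.
        self.count = reserved

    def use(self, index):
        # Raise the watermark if this index is beyond it.
        self.count = max(self.count, index + 1)
        return index


class DeclaredLocals:
    """Option 2: the CODE word declares its local count up front,
    and helpers only check indices against that declaration."""

    def __init__(self, count):
        self.count = count

    def use(self, index):
        if index >= self.count:
            raise IndexError(f"local {index} not declared (count={self.count})")
        return index


watermark = WatermarkLocals()
for idx in (0, 3, 1):          # locals touched by some hypothetical CODE body
    watermark.use(idx)

declared = DeclaredLocals(4)   # the same body, with an explicit declaration
for idx in (0, 3, 1):
    declared.use(idx)
```

The watermark variant needs no declaration but only knows the final count once the whole body has been assembled; the declared variant knows it immediately and can reject a stray index.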

As for other types of locals such as v128, I think this just means there needs to be an entry for other types of locals in the WebAssembly word header, and corresponding helper words for setting this value.
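For reference, the WebAssembly binary format already stores a function's locals as a vector of (count, value type) groups, so supporting `v128` comes down to emitting one more group. The Python sketch below encodes such a vector. The value-type bytes and the LEB128 encoding follow the WebAssembly specification; the function names are made up for this example, and how WAForth's word header would actually track the per-type counts is left open.

```python
# Value-type bytes from the WebAssembly binary format.
I32, I64, F32, F64, V128 = 0x7F, 0x7E, 0x7D, 0x7C, 0x7B


def uleb128(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_locals(groups):
    """Encode the locals vector of a function body.

    `groups` is a list of (count, valtype) pairs; empty groups are dropped.
    """
    groups = [(count, valtype) for count, valtype in groups if count > 0]
    out = bytearray(uleb128(len(groups)))
    for count, valtype in groups:
        out += uleb128(count) + bytes([valtype])
    return bytes(out)


# Three i32 locals followed by one v128 local.
encoded = encode_locals([(3, I32), (1, V128)])
```

Locals are indexed in declaration order after the function's parameters, so in this example the `v128` local comes after the three `i32` locals.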

I put this on my top of stack to look at, but I can’t say when I’ll get to it.