Dudeplayz closed this issue 1 year ago
That's a good question!
One challenge with AssemblyScript is that the code can no longer be easily extended or modified from the JavaScript realm, and communication between JS and the AssemblyScript code can be quite costly, so we'll want to keep it to a minimum. For starters, I'd suggest trying to create a version of the demo project that works with AssemblyScript, so we can compare the performance.
If AssemblyScript does bring a notable performance improvement that can't be achieved by tuning the TS code, we can probably find a way to include an AssemblyScript binary in the releases, either in this repo or a dedicated repo. Once we have more data about the performance we can consider the best course of action.
This sounds like a good plan. I will take a look at it!
Thanks!
Another idea that I had would be to come up with an AVR → WebAssembly compiler, that is, to convert the raw AVR binary into WebAssembly code that does the same, so we don't have to pay the overhead of decoding each instruction as the program is executing, and perhaps the JIT will be able to do a better job at optimizing the generated code.
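As a purely illustrative sketch of that direction (the function name and the assumption that the AVR registers sit at the start of WASM linear memory are invented here, not anything existing in avr8js): each program word would be decoded once, at translation time, and emitted as equivalent WAT.

```typescript
// Hypothetical sketch of static AVR -> WAT translation. Decoding the
// opcode happens once, here, instead of on every executed instruction.
// Assumes the 32 AVR registers occupy addresses 0..31 of linear memory.
function translateAdd(opcode: number): string {
  // ADD encoding: 0000 11rd dddd rrrr
  const d = (opcode & 0x1f0) >> 4;
  const r = (opcode & 0xf) | ((opcode & 0x200) >> 5);
  return [
    `(i32.store8 (i32.const ${d})`,
    `  (i32.and (i32.add (i32.load8_u (i32.const ${d}))`,
    `                    (i32.load8_u (i32.const ${r})))`,
    `           (i32.const 255)))`,
  ].join('\n');
}
```

For example, `translateAdd(0x0c01)` emits the store for `ADD r0, r1`; the SREG flag computation (and everything else) is left out of this sketch.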
Oh wow. This sounds really interesting! The question would be, how to integrate the peripherals.
That's a good question. I'd imagine having a bitmap or so that will indicate which memory addresses are mapped to peripherals. Whenever you update a memory address, you'd check the bitmap. If there is a peripheral mapped to it, then you'd call a WebAssembly function that resembles the `writeData()` function that we currently have...
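A minimal sketch of that idea (all names here are hypothetical, not the actual avr8js API): one bit per address marks peripheral-mapped locations, so the common case stays a plain memory store.

```typescript
// Hypothetical sketch: one bit per data-memory address marks whether a
// peripheral is mapped there. The hot path (plain RAM write) costs only
// a single bit test; the peripheral bridge is called on the rare path.
const MEM_SIZE = 0x900; // example size, roughly the ATmega328p data space

const peripheralBitmap = new Uint8Array(MEM_SIZE >> 3);

function mapPeripheral(addr: number): void {
  peripheralBitmap[addr >> 3] |= 1 << (addr & 7);
}

// Stand-in for the writeData()-like bridge into the peripheral code.
type WriteHook = (addr: number, value: number) => void;

function memoryWrite(
  data: Uint8Array,
  addr: number,
  value: number,
  hook: WriteHook
): void {
  data[addr] = value;
  if (peripheralBitmap[addr >> 3] & (1 << (addr & 7))) {
    hook(addr, value); // only mapped addresses pay the call overhead
  }
}
```

In the WebAssembly scenario, `hook` would be the imported function crossing back into JS land, so the expensive JS↔WASM boundary stays off the hot path.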
That should work. Will the peripherals be in WebAssembly too? In that case, everything would run in WebAssembly, except the visuals, which have to stay in JavaScript. But then we again have the problem that all peripherals have to be converted to WebAssembly :/.
We could also mix-and-match. I believe stuff like the timers, which have to run constantly (in some cases after every CPU instruction or so), will have to live in WebAssembly land. But some other, less frequent peripherals could possibly be bridged over to JavaScript land.
There's definitely much to explore here...
The thing is, it always depends on the workload. So yes, it is really interesting, but we need to start exploring at some point. Maybe we will reach a point where we can say we are already faster than expected. On the whole, the CPU doing the simulation will always be a lot faster than the simulated MCU; the gain from optimization is mainly for slow end devices.
So, should we create a plan where to start with testing and exploring the possibilities each approach has?
Yes, and as you say, it's good to have some baseline to compare to.
Right now, the JS simulation has some things that can already be improved (e.g. the lookup table for instructions), and it runs pretty okay on modern hardware, achieving between 50% speed on mid-range mobile phones and 160%+ speed on higher-end laptops.
However, lower end devices (such as Raspberry Pi) only achieve simulation speed of 5% to 10%.
So there is definitely room for improvement, especially if we consider the use case of simulating more than one Arduino board at the same time (e.g. two boards communicating with each other).
Ok yes. For some playgrounds etc. it would be really fun and interesting to have multiple boards running at once. So we only need to improve it if this use case would otherwise be locked to higher-end devices. I also had some interesting ideas about simulating such boards with Node.js and other JavaScript runtimes.
I am actually not familiar with the benchmark code. Is it possible to run the benchmark under all the mentioned approaches, or do we need a new benchmark to make a meaningful comparison?
The current benchmark is pretty minimal - it runs a single-instruction program many, many times to compare different approaches for decoding.
I think a better benchmark would need:
I'd probably start with just the 1st, simpler benchmark, to get a feeling if the direction seem promising, and if it is, then we can devise a more extensive benchmark that will allow us to do a comprehensive comparison.
What do you think?
Starting with the 1st should be the best approach. Should I start looking at AssemblyScript or at WebAssembly directly? With WebAssembly directly, we also have to decide between AVR instruction translation and full interpretation (like the JavaScript one) in WebAssembly.
Maybe we can focus on a few of the most-needed assembly instructions, to reduce the number of initial instructions and focus the benchmark on those. That way we can see first results faster and then decide where to dig deeper, or whether there is already a clear winner?
I believe that a WebAssembly interpreter (written in C or Rust) wouldn't be much different from AssemblyScript, but it's pretty easy to write one or two instructions, as you suggest, and compare the generated WAT (WebAssembly Text) between the different implementations. Ideally, if there is no significant difference, using AssemblyScript means we can probably keep one code base, which is preferable.
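The shared source being compared can be as small as a single function. AssemblyScript uses TypeScript syntax with concrete WebAssembly value types; the `i32` alias below is only there so the identical snippet also runs as plain TypeScript.

```typescript
// AssemblyScript shares TypeScript's syntax; it mainly swaps `number`
// for WebAssembly value types such as i32. Aliasing i32 lets the same
// source compile as plain TypeScript too (AssemblyScript has it built in).
type i32 = number;

function add(a: i32, b: i32): i32 {
  return a + b;
}

add(2, 3); // → 5
```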
Here are some useful resources:
@gfeun (author of #19), you may find this discussion interesting :)
Hey, I've been summoned. Yeah, I'm following along :smile:
Yes, I agree. If there is no real breaking point that requires translating the TS code, we should stay with TS and use the AssemblyScript compiler.
The WebAssembly Studio looks great. I remember finding it before in my research. If you can wait, I will try to create a first comparison between the Rust, C and AssemblyScript approaches in the next 2 days. These languages are not my preferred ones, so I have to get familiar with the tooling 😅. I will try to create some sharable WebAssembly Studio workspaces and post them here. (And yes, the generated wasm code is visible.)
If we get to the point of translating the AVR binaries to WebAssembly, we can overcome the 30 WASM instructions. I think this project has some really interesting possibilities to get the best out of it. If we can get a good combination of all these possibilities, we get the opportunity to support a really wide range of end devices and possible runtimes! Also, 1000 MIPS is no problem for a modern CPU, and an RPi 4 could maybe run at something around 70-100% or even more (measured on some synthetic benchmark).
And hello @gfeun. Nice to meet you!
Yes, definitely, sounds like a good plan. It's going to be interesting :)
Hey, I've created 3 WebAssembly Studio (WAS) workspaces for C, Rust and AssemblyScript: C - https://webassembly.studio/?f=u627pgs1r5q Rust - https://webassembly.studio/?f=h80i9yrgjxa AssemblyScript - https://webassembly.studio/?f=j5dlh1kn5s
The code and folder structure is based on the empty template workspace for each language. I've tried to bring them all together in one workspace, but this is more complicated.
For a start, I have implemented the same basic functions in each of them, with an identical file structure. All `wat` files (WAS transforms them transparently to and from `wasm`) look basically the same, except for some different orderings and the funny point that Rust and C swap the variables for the `add` function.
Surprisingly, the AssemblyScript version is the smallest, measured by line count. Rust and C have the additional line `(table $T0 1 1 anyfunc)`, which tells me nothing because I currently can't understand the file scheme. There is also a difference between Rust and AssemblyScript in the line `(memory $memory (export "memory") 17))`, where AssemblyScript has `0` instead of `17`.
C also has a few lines more, which are maybe only meta information that the others leave out.
I know, the example functions are too simple to leave much room for the implementations to differ. So can you take a look at it and tell me what you think? What would be a good example function to implement to find out something more meaningful?
Currently, my opinion is that even if AssemblyScript turns out to have a serious performance gap at some point, we would always have the option to implement that specific feature in a different language.
(To see the `wasm`/`wat` code, run Build in the WebAssembly Studio.)
For reference, see https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format.
Oh, wow. I just realized that all compilers have pre-evaluated the method calls in the `main` function 😅.
Yes, the compiler seems to be very good at optimizing: it inlines the functions and then also precalculates the result of the expression. Pretty smart!
So far, it seems like for basic arithmetic we get roughly the same number of opcodes. What I'd try next is to implement a complete opcode: define an array to hold the program data, and then implement an opcode that also reads and updates the data memory, such as `ADD` (the registers reside in the data memory, and it also updates the SREG register, so that would be a memory write).
Does that make sense?
I would say it makes sense 😄. I will try it first with the AssemblyScript version, doing some copy and paste of the original code. After that, I will convert it to the other two.
So for correct understanding:
Indeed, we'd need some basic opcode decoding to extract the target registers out of `ADD`. Also, the existing TS code can probably be used without too many changes...
Ok, I will do so. Should I create an additional repo/project for this testing code? I have also moved from WebAssembly Studio to IntelliJ (or any other IDE).
I had tonight another idea: Would it be possible to add some async command prefetching to reduce the time for opcode decoding?
Yes, we can either create a different repo (if the code is entirely different), or a new branch here, in case the code is still the same (or auto-generated from the current code, like I did with the benchmark).
As for IDE, that makes sense. I use VSCode for this repo, so if you open it with VSCode you should get a list of recommended extensions (prettier, eslint, etc).
What do you mean by async command prefetching?
I currently mean it only for the testing code, i.e. the example AssemblyScript, Rust and C code. For the "real" implementation, I would prefer a separate branch with the goal of bringing it to master.
I will try to get comfortable with VSCode for dev purposes ;D.
We recently discovered the problem with the big if-else statement. The idea would be to evaluate this statement for the next opcode asynchronously. But this requires bringing the code into a format where this is possible.
Yes, it makes sense to create a new repo for experiments. Would you prefer to have it under your github user or here (under the wokwi organization)?
I'm not sure how you'd go about evaluating the next opcode asynchronously; maybe I need to see an example to understand?
I think that because it is related to this project, it should stay under wokwi - also because I may copy some code over. And maybe there will be more experiments in the future.
Ok, I think I see the misunderstanding. I do not mean the full evaluation, only the "prefetching" of the next operation to execute. Currently, all operations live under their specific if-statement; a refactor into their own methods would be required. Then we could evaluate the lookup of the next command asynchronously and, for example, save the resolved function in a variable, so the "running" code only needs to call that function instead of evaluating the big if-else block first and executing afterwards. It would be something like a 2-stage command pipeline.
But this would only bring an improvement if the decoding stays the same. If the decoding is rebuilt with lookup tables or something else, this would bring only a (I think) small improvement.
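A rough sketch of that two-stage idea (all names invented; real AVR control flow such as jumps would invalidate the prefetch and is ignored here): the handler for the next opcode is looked up while the current one "executes", so the run loop never re-enters the big if-else.

```typescript
// Sketch of a 2-stage pipeline: stage 1 resolves the next opcode to its
// handler ahead of time; stage 2 just calls the prefetched handler.
type InstructionFn = (opcode: number) => void;

const trace: number[] = []; // records executed opcodes, for illustration

// Stand-in for the big if-else decoder: maps an opcode to a handler.
function decode(opcode: number): InstructionFn {
  return (op) => trace.push(op);
}

function run(progMem: Uint16Array, steps: number): void {
  let pc = 0;
  let next: InstructionFn = decode(progMem[pc]); // stage 1: prefetch
  for (let i = 0; i < steps; i++) {
    const execute = next;
    const opcode = progMem[pc];
    pc = (pc + 1) % progMem.length;
    next = decode(progMem[pc]); // decode the following opcode ahead of time
    execute(opcode);            // stage 2: run the prefetched handler
  }
}
```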
That makes sense! What would you like to call the new repository?
Refactoring the code into separate functions is already done by the benchmarking code: if you run `npm run benchmark:prepare`, it creates a new `instructions-fn.ts` file that contains one function per instruction and a lookup table at the end. Here is an excerpt from that file:
export function instADD(cpu: ICPU, opcode: number) {
  /* 0000 11rd dddd rrrr */
  const d = cpu.data[(opcode & 0x1f0) >> 4]; // destination register value
  const r = cpu.data[(opcode & 0xf) | ((opcode & 0x200) >> 5)]; // source register value
  const R = (d + r) & 255; // 8-bit result
  cpu.data[(opcode & 0x1f0) >> 4] = R;
  let sreg = cpu.data[95] & 0xc0; // SREG lives at data[95]; keep I and T flags
  sreg |= R ? 0 : 2; // Z: zero flag
  sreg |= 128 & R ? 4 : 0; // N: negative flag
  sreg |= (R ^ r) & (R ^ d) & 128 ? 8 : 0; // V: two's complement overflow
  sreg |= ((sreg >> 2) & 1) ^ ((sreg >> 3) & 1) ? 0x10 : 0; // S: N xor V
  sreg |= (d + r) & 256 ? 1 : 0; // C: carry flag
  sreg |= 1 & ((d & r) | (r & ~R) | (~R & d)) ? 0x20 : 0; // H: half carry
  cpu.data[95] = sreg;
  cpu.cycles++;
  if (++cpu.pc >= cpu.progMem.length) {
    cpu.pc = 0;
  }
}
Good question :D. Something like WebAssembly-(Test/Comparison/Examples), or something more abstract and related to avr8js? Do you have any suggestions?
Ah ok, that looks great. So is there already a lookup table now? Did you refactor this by hand, or is there a script that does it for you?
How about `avr8js-perf`, `avr8js-wasm`, or `avr8js-research`?
Yes, this script does the magic, a lot of text parsing :-)
There is also a preliminary implementation of the lookup table, I just never got around to measuring how it performs and integrating this into the main code.
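For illustration, the lookup-table direction could look roughly like this (a simplified sketch, not the actual generated table; the SREG flag update is omitted): every program word is resolved to its handler once, at load time, so stepping becomes a single array index instead of an if-else cascade.

```typescript
// Minimal sketch of the lookup-table idea (hypothetical, simplified).
interface MiniCPU {
  data: Uint8Array;
  cycles: number;
}
type InstructionFn = (cpu: MiniCPU, opcode: number) => void;

function instNOP(cpu: MiniCPU): void {
  cpu.cycles++;
}

// ADD matches the bit pattern 0000 11rd dddd rrrr,
// i.e. the top six opcode bits are 000011.
function isADD(opcode: number): boolean {
  return (opcode & 0xfc00) === 0x0c00;
}

// Simplified ADD: updates the destination register, skips the SREG flags.
function instADD(cpu: MiniCPU, opcode: number): void {
  const d = (opcode & 0x1f0) >> 4;
  const r = (opcode & 0xf) | ((opcode & 0x200) >> 5);
  cpu.data[d] = (cpu.data[d] + cpu.data[r]) & 255;
  cpu.cycles++;
}

// Resolve each program word to its handler once, up front.
function buildLookup(progMem: Uint16Array): InstructionFn[] {
  return Array.from(progMem, (opcode) => (isADD(opcode) ? instADD : instNOP));
}
```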
`avr8js-wasm` and `avr8js-research` both sound great, but I would prefer research, because `avr8js-wasm` sounds like an alternative implementation of the current avr8js; for such an implementation, `avr8js-wasm` would be the better match.
Ok, nice. Genius to solve it this way!
So the baseline is standing.
Let's start with `avr8js-research` then; we can always rename it later if we decide to.
You can join the new repo here: https://github.com/wokwi/avr8js-research/invitations I made it public, is that ok?
Yes, it's absolutely fine 👍.
Should I try to get all three languages into one project? Then additional comparison code would be possible.
Awesome! I'd suggest starting by establishing the baseline in AssemblyScript, and then, if we feel like there's room for improvement, we'll see how to add either Rust or C++, so we don't have the overhead of setting up a project that supports all the languages right now.
Yes, that was my intention. I'm currently trying to get the code exported from WebAssembly Studio running in my IDE, but there are some pitfalls. I will try to set up a clean and empty project. The reason for keeping the WebAssembly Studio overhead was to be able to share it there, but with this repo I think it can be safely dropped.
Yes, I think the main advantage of the Web Assembly Studio is that you can get quick results, just like you did, and then, once you decide to go in a specific direction, spend the time needed for the local setup.
On a different topic, I don't know if you noticed, but we have a small AVR assembler in this repo, which can be quite useful when writing small programs for the benchmark. I originally created it for the TWI interface tests, so I could test the example code from the datasheet and quickly tweak the program under test without having to manually translate the instructions...
Yes, I agree.
I noticed it, but was wondering what its purpose is. So the assembler is just a simple compiler? I will take a look at it. For starting with the simple `ADD` instruction, it should be too much :D.
I have made an initial commit to the repo. The question now is: how do we want to run the tests? With some visual output, or only with Node?
Yes, the assembler just translates the instructions from something readable into AVR machine code. I also started using it for the single instruction tests, as it makes things more readable.
I don't think visual output is a requirement right now. As long as we have the numbers, we can always put them in Excel later and produce some visuals if we want.
Yes, this sounds reasonable.
Ok, so I will set up a node environment.
Here is an impressive demo of some WebAssembly compilations. If you turn up the passes, you will see that AssemblyScript performs the best of all!
Where do we want to track incompatibilities and experiences with AssemblyScript? For the incompatibilities, we will later need separate issues to fix them. We could also track them in the research project, either with separate issues or, for a start, with a collection issue.
What do you mean by incompatibilities?
I have already found some things which do not work with the current implementation, for example the explicit declaration of types, and the problem that interfaces are not handled correctly, e.g. the ICPU in CPU.
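To illustrate the kind of change that implies (a simplified, hypothetical example, not the actual avr8js classes): AssemblyScript does not support TypeScript interfaces, so code typed against an `ICPU` interface has to target a concrete class instead.

```typescript
// Hypothetical, simplified: the interface-typed parameter becomes a
// concrete class so the same code compiles under AssemblyScript.
class CPU {
  data: Uint8Array;
  pc = 0;
  cycles = 0;
  constructor(sramBytes: number) {
    this.data = new Uint8Array(sramBytes);
  }
}

// Instruction handlers then accept the class instead of the interface:
function instNOP(cpu: CPU): void {
  cpu.cycles++;
  cpu.pc++;
}
```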
I see. Let's have a different branch that makes the necessary changes, so we can later turn it into a PR if we decide to go down this path.
Sounds good. I will continue with benchmarking and testing before we change anything in the project. Now, back to the question of where to store these experiences?
By experiences, you mean a log of your findings? If so, I think a GitHub issue can be a good place, or we can also use the GitHub Wiki for that.
Yes, correct. In this repo or the other?
Hey Uri, we have already spoken about the possibilities of speeding up the simulation. I am interested in adding support for AssemblyScript. I have followed the recent discussions and speed-ups. The question is whether you still think it can bring performance gains, even after the enhancements that were made?