Add support for AssemblyScript

Dudeplayz commented 4 years ago

Hey Uri, we have already spoken about ways to speed up the simulation. I am interested in adding support for AssemblyScript. I followed the recent discussions and speed-ups. Do you still think it can bring performance gains, even after the enhancements that were made?

urish commented 4 years ago

I think this repo makes more sense, to keep all the documentation together

Dudeplayz commented 4 years ago

OK, I think for implementation purposes we should use an issue, and later we can document it in the wiki. We should start with a collecting issue, and later, when we implement the individual items, create related issues. We could also write them in this issue, but that would bloat it considerably.

Dudeplayz commented 4 years ago

Hi, I am sorry. I only want to let you know that I am on it. My time is currently, well, I would say, disorganized and a bit scarce. I will try to come up with the next results by the end of the week.

urish commented 4 years ago

Thanks for the update!

I'm also looking at some optimizations. For instance, #38 will probably be able to reduce the .tick() calls for peripherals, and I also have some thoughts on optimizing the timers, so their .tick() calls won't have to run after every single instruction (it doesn't make sense, if you have a large prescaler, for instance). I will probably create a new issue about this over the next few days
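One way to picture the timer idea: instead of ticking after every instruction, the counter can be derived from the elapsed CPU cycles only when it is actually needed. The following is a minimal sketch under stated assumptions, not the avr8js timer code: `LazyTimer` and `sync` are hypothetical names, the timer is 8-bit, and compare/overflow interrupts are ignored.

```typescript
// Hypothetical sketch: an 8-bit timer whose counter is computed on demand
// from elapsed CPU cycles, instead of being ticked after every instruction.
class LazyTimer {
  private prescaler: number;
  private lastCycle = 0; // CPU cycle count at the last sync
  private counter = 0;   // timer counter value (0..255) at the last sync

  constructor(prescaler: number) {
    this.prescaler = prescaler;
  }

  // Bring the counter up to date for the given CPU cycle count.
  sync(cpuCycles: number): number {
    const elapsedTicks = Math.floor((cpuCycles - this.lastCycle) / this.prescaler);
    if (elapsedTicks > 0) {
      this.counter = (this.counter + elapsedTicks) & 0xff; // 8-bit wraparound
      // Advance by whole prescaler periods only, so partial periods are not lost.
      this.lastCycle += elapsedTicks * this.prescaler;
    }
    return this.counter;
  }
}
```

With a prescaler of 1024, such a timer would only need to be touched when the program reads the counter or when the next compare/overflow event is due, rather than after each instruction.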

Dudeplayz commented 4 years ago

Sounds promising and should reduce some overhead 👍.

Dudeplayz commented 4 years ago

@urish Currently, all numeric types refer to `number`. On a real AVR there would be overflows etc. Is this handled properly at the moment, or isn't it even possible with the current implementation?

urish commented 4 years ago

In specific cases we do have specific handling to make sure we don't overflow, e.g. here, and also in some places in the timers code, if I'm not wrong.
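For illustration, the kind of explicit handling meant here usually comes down to masking the result to the register width. The helpers below are a minimal sketch with hypothetical names (`add8`, `sub8`, `inc16`); they are not taken from the avr8js source.

```typescript
// JavaScript numbers are doubles, so 8-bit and 16-bit register arithmetic
// has to be wrapped explicitly. Illustrative helpers, not avr8js code.
function add8(a: number, b: number): number {
  return (a + b) & 0xff; // keep only the low 8 bits
}

function sub8(a: number, b: number): number {
  return (a - b) & 0xff; // a negative result wraps into 0..255
}

function inc16(value: number): number {
  return (value + 1) & 0xffff; // e.g. a 16-bit counter or pointer register
}

// Typed arrays wrap on store as well: writing 256 into a Uint8Array stores 0.
const registerFile = new Uint8Array(1);
registerFile[0] = 256;
```

Values stored into a `Uint8Array` are truncated automatically, so explicit masking mostly matters for intermediate results that are used before being written back.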

Dudeplayz commented 4 years ago

OK, is it necessary to do some extra research into whether such overflows can occur?

Dudeplayz commented 4 years ago

And another question. Because adding WebAssembly support is part of my bachelor thesis, I have to design and implement it by myself. This is necessary to be sure that it is my own work. Talking about interfaces and the integration is fine, but the code has to be written by me. After I have completed that, there is no problem with modifying it. I hope you will not have a problem with that?

urish commented 4 years ago

@Dudeplayz sounds good, are you okay with releasing your work under the MIT license (same license as AVR8js)?

Dudeplayz commented 4 years ago

Yes, this is no problem. Thanks for your understanding 😄.

urish commented 4 years ago

About the overflows - I don't think we need to pay special attention to it right now. The tests should cover the common cases, so if the code passes the tests we can assume we're good

Dudeplayz commented 4 years ago

@urish Hey Uri, I am sorry that I haven't replied for such a long time. I had some major changes in my personal life, and there wasn't room or time to proceed with the work. I have now sorted things out and will start to complete the WebAssembly target. My interest in the project hasn't diminished and I want to get this done. My aim is to make fast progress now. Sorry again, and I hope you're fine with that.

urish commented 4 years ago

Great to have you back @Dudeplayz, and thanks for the update!

urish commented 3 years ago

Closing for now. @Dudeplayz if at any stage you feel like picking it up again, just ping me. If not, fine too!

Dudeplayz commented 3 years ago

Hi @urish. I'm currently in the process of restarting my thesis, so I will soon have information on how to continue. A planned feature is some debug functionality, e.g. with ELF files. Do you have anything like this planned, or have you already thought about it?

urish commented 3 years ago

Hi Dario! Welcome back :-)

There's a debug functionality with GDB that runs in the browser, though a way to read the debug information from the ELF files directly (without having to go through the whole GDB setup) would definitely be nice!

Dudeplayz commented 3 years ago

Thanks :) !

I've read your blog post. First of all, great work! I can't imagine how much time you have already spent on this whole project. If I understand you right, you mean to enable debugging without the need for GDB? Something like this would be really user friendly, because it would allow debugging directly in the source code editor.

urish commented 3 years ago

Thanks Dario! Yes, that'd be amazing. The ELF files contain something called DWARF, which contains the debugging information. As far as I remember, there's an instruction set that maps each symbol to a memory location depending on the current program position.

So for instance, a variable value may live in a register at one place, then be copied to the stack a few instructions later. So DWARF includes instructions that tell the debugger how to find the value of the variable at each relevant location.
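To make the idea concrete, a location list can be thought of as a set of program-counter ranges, each with its own description of where the variable lives. The sketch below is a simplified model with hypothetical type and function names (`LocationEntry`, `locateVariable`); real DWARF encodes each location as an expression for a small stack machine, which is not modelled here.

```typescript
// Simplified model of a DWARF location list: each entry says where a
// variable lives while the program counter is inside [lowPc, highPc).
// The types and field names are illustrative, not a real DWARF parser.
type VariableLocation =
  | { kind: 'register'; register: number }
  | { kind: 'stack'; offset: number };

interface LocationEntry {
  lowPc: number;  // first address covered by this entry (inclusive)
  highPc: number; // end address (exclusive)
  location: VariableLocation;
}

function locateVariable(entries: LocationEntry[], pc: number): VariableLocation | undefined {
  for (const entry of entries) {
    if (pc >= entry.lowPc && pc < entry.highPc) {
      return entry.location;
    }
  }
  return undefined; // the variable is not available at this program position
}
```

A debugger front end would call something like `locateVariable` with the current program counter and then read the register or stack slot it returns.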

Dudeplayz commented 3 years ago

Hi Uri!

Thanks for the hints and infos.

The roadmap stands. Firstly, I will retry my efforts to compile it to WebAssembly. The next thing is debugging (which could cause some trouble if compiled to WebAssembly). And the last thing is the integration into the university system.

urish commented 3 years ago

Sounds like a plan. I'm here for anything.

Just a quick note about Web Assembly:

A few months ago I ran a quick experiment: I wrote a piece of code that translates AVR machine code directly into JavaScript.

I tested it on the following "blink" program:

int main() {
  DDRB = 0xff;
  PORTB = 0x20;
  while(true) {
    for (volatile long i = 0; i < 1000000; i++);
    PINB = 0xff;
  }
}

which was translated to translated-led-demo.js.

It runs ~2.5 times faster compared to the current simulation engine (at least on my machine):

https://user-images.githubusercontent.com/892318/123870474-0b406700-d93b-11eb-83ee-23f794be8ad1.mp4

To compare, standard avr8js simulation of the same program runs here at ~480%.

It doesn't scale however. For larger programs such as Arduino's Blink, the generated program gets enormous (as in 60,000 LoC), and then the V8 turbofan engine no longer optimizes it, and we get the simulation speed down to 3%.

But at least it shows that for this specific case, even translating the AVR code into JS without applying any special tricks makes it run considerably faster. And WASM will probably be faster (and maybe also without the function size limit).

Then, I found an amazing article which explains how we can tame the control flow to optimize this even further: https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2

Dudeplayz commented 3 years ago

Ok, sounds great. So you mean compiling the AVR Code directly to WebAssembly? I was always thinking of the way to compile the AVR8js simulator to WebAssembly to accelerate it. But the other way is also a very interesting idea.

urish commented 3 years ago

> So you mean compiling the AVR Code directly to WebAssembly?

Exactly. I believe this is where the real speed-up lies, since this means that the browser can then translate that code into native machine code. So you indirectly get the AVR code translated into native machine code...

Dudeplayz commented 3 years ago

OK, now we are talking about the same thing :D. This would be the unbeatable option. The other problematic thing is how to communicate with the outside world and the peripherals in an efficient way. I need to connect some protocols to the peripherals for remote control. It's also necessary for visualization.

urish commented 3 years ago

> The other problematic thing is how to communicate with the outside world and the peripherals in an efficient way.

True. I believe the most performance-intensive peripheral is the timer - in some cases, it will run on every single clock cycle. So it might make sense to try to compile it using AssemblyScript to WASM, and therefore eliminate the need to go back-and-forth between WASM code and JavaScript, at least for this peripheral. But the only way to truly know is to profile.

Or did you ask about something different?

Dudeplayz commented 3 years ago

> So it might make sense to try to compile it using AssemblyScript to WASM

A hybrid approach would be good. Then nearly everything can be converted to WASM.

> But the only way to truly know is to profile.

Yes, trying and profiling.

> Or did you ask about something different?

The simulated controller should control real-life models. Therefore the Remote Lab has its own protocol to handle the real-life signals and the simulated signals. To transfer the correct pin changes, these pins must be accessible in some way. I don't know if network access is available in WASM. If it is, this could be solved by also converting this protocol to WASM. Otherwise this ping-pong between WASM and JS would also happen.

urish commented 3 years ago

> Otherwise this ping-pong between WASM and JS would also happen.

The ping pong would be necessary anyway - WASM in general doesn't have any access to the outside world without help from JS. It can't do networking, UI, or use any of the many browser APIs.

But I don't think this would be a huge problem - as long as the ping-pong doesn't happen too often. Mixing WASM and JS used to be slow in the past, but at least in Firefox, this is no longer the case: https://hacks.mozilla.org/2018/10/calls-between-javascript-and-webassembly-are-finally-fast-%F0%9F%8E%89/

Dudeplayz commented 3 years ago

> The ping pong would be necessary anyway - WASM in general doesn't have any access to the outside world without help from JS

OK, that was also my latest understanding.

> as long as the ping-pong doesn't happen too often

This is again a thing of testing :D

> Mixing WASM and JS used to be slow in the past, but at least in Firefox, this is no longer the case

Ok, wow. These are stats to work with!

Dudeplayz commented 3 years ago

Hey Uri. How is it actually handled if the simulation runs faster than in reality? This would be a problem if it is compiled directly to WASM because it can't be controlled easily.

urish commented 3 years ago

Hi Dario!

This is a "good problem to have", and in fact - it happens now too (you can see it if you run the demo project). However, there's an easy workaround: just run the simulation for ~250k simulation cycles for every frame (at 60fps). This seems to do the trick pretty well at wokwi.com.

Dudeplayz commented 3 years ago

Ok, so the limiting happens at the framerate, which is 1/60s, with 250k cycles. So how is this done and can this also be achieved with WASM?

urish commented 3 years ago

This is the basic algorithm:

deadline ← cpu.cycles + 250000 (or a similar number)
while cpu.cycles < deadline:
  execute next cpu instruction, updating cycles
  check if we need to run any peripheral callback

So WASM can export a function that gets the amount of cycles, and runs the code for that many cycles (more or less, it doesn't have to be super accurate), then yields control back to JS.
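The algorithm above can be sketched in TypeScript as follows. This is a minimal sketch, assuming a hypothetical `SimCpu` shape with a `cycles` counter and two caller-supplied callbacks; the names `runForCycles`, `step`, and `servicePeripherals` are illustrative and not part of the avr8js API.

```typescript
// Hypothetical minimal CPU shape: only the cycle counter matters here.
interface SimCpu {
  cycles: number;
}

// Run the simulation for roughly `budget` cycles, then return to the caller.
// `step` executes one instruction and must advance cpu.cycles;
// `servicePeripherals` runs whatever peripheral callbacks are due.
function runForCycles(
  cpu: SimCpu,
  budget: number,
  step: (cpu: SimCpu) => void,
  servicePeripherals: (cpu: SimCpu) => void
): number {
  const deadline = cpu.cycles + budget;
  while (cpu.cycles < deadline) {
    step(cpu);
    servicePeripherals(cpu);
  }
  // May overshoot the deadline by the length of the last instruction.
  return cpu.cycles;
}
```

In a browser, a function like this would be called once per animation frame with a budget of about 250,000 cycles, which keeps the simulated clock from running ahead of real time. A WASM build could export the same loop and take the budget as its argument.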

Dudeplayz commented 3 years ago

OK. So the HEX files are compiled to WASM, resulting in a WASM program containing all the instructions of the program. Then a run loop (the above) executes these instructions and runs any peripheral callbacks.

Is this correct? I feel like I'm stuck somewhere.

urish commented 3 years ago

> Is this correct? I feel like I'm stuck somewhere.

Yes. Then, this may open the door for further optimizations (e.g. skipping costly flag calculation if the next instruction discards the flags anyway), but let's first see if we can get the basic thing going and how much faster it is.
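As an illustration of the flag optimization mentioned above, a translator could walk a straight-line block of instructions backwards and emit flag calculations only where a later instruction actually reads them. The sketch below is a simplification under stated assumptions: `DecodedInstruction` and `flagsNeeded` are hypothetical names, the status register is treated as a single unit (real AVR instructions update different subsets of flags), and control flow and interrupts are ignored.

```typescript
// Hypothetical per-instruction summary for a straight-line block of code.
interface DecodedInstruction {
  name: string;
  readsFlags: boolean;  // e.g. a conditional branch or ADC
  writesFlags: boolean; // e.g. ADD, SUB, CP
}

// Walk the block backwards and decide, for each instruction, whether its
// flag update has to be emitted because something later reads the flags.
function flagsNeeded(block: DecodedInstruction[]): boolean[] {
  const needed = new Array<boolean>(block.length).fill(false);
  // Be conservative: assume the code after the block may read the flags.
  let liveAfter = true;
  for (let i = block.length - 1; i >= 0; i--) {
    const ins = block[i];
    if (ins.writesFlags) {
      needed[i] = liveAfter;
      liveAfter = false; // the flags are overwritten at this point
    }
    if (ins.readsFlags) {
      liveAfter = true; // the flags must be valid before this instruction
    }
  }
  return needed;
}
```

For a sequence such as ADD, SUB, BRNE, the ADD flags are never observed, so their calculation could be skipped in the generated code.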

Dudeplayz commented 3 years ago

Ok. Does this way need a rework of the CPU class? I think the first thing should also be trying to compile the project to WASM and then adding the compiling of the HEX files. Or do you think another way is better?

urish commented 3 years ago

> I think the first thing should also be trying to compile the project to WASM

This may need a lot more changes than just emitting WASM code directly.

You may find the AVR to JS compiler experiment useful as a reference. If I remember correctly, I started from instruction.ts and did some massive editing there.

The code that decodes the opcodes and their args will probably stay the same, but the part where you generate the code will be different. Probably something along the lines of:

  ...
  } else if ((opcode & 0xfe08) === 0xf800) {
    /* BLD, 1111 100d dddd 0bbb */
    const b = opcode & 7;
    const d = (opcode & 0x1f0) >> 4;

    /* something that will generate the WASM equivalent of 
        data[d] = (~(1 << b) & data[d]) | (((data[95] >> 6) & 1) << b);

        i.e. convert the following pseudo-code into WASM 
        temp1 ← data[d]
        temp1 ← temp1 & ~(1 << b)
        temp2 ← data[95]
        temp2 ← temp2 >> 6
        temp2 ← temp2 & 1
        temp2 ← temp2 << b
        data[d] ← temp1 | temp2

        where each line in the above code is probably a single or two WASM instructions, and `~(1 << b)` is actually a constant (because we know the value of b at compile time)
    */;
  ...

I hope this is helpful!
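The pseudo-code above could be turned into WebAssembly text (WAT) instructions roughly as follows. This is a sketch under stated assumptions, not the code from the experiment: `emitBld` is a hypothetical name, and the AVR data space is assumed to live in WASM linear memory starting at offset 0, so that `data[d]` is the byte at address `d` and SREG is the byte at address 95.

```typescript
// Sketch: emit WAT instructions for one decoded BLD opcode.
// Assumes the AVR data space is WASM linear memory starting at offset 0.
function emitBld(opcode: number): string[] {
  const b = opcode & 7;
  const d = (opcode & 0x1f0) >> 4;
  const clearMask = ~(1 << b) & 0xff; // constant, known at translation time
  return [
    `i32.const ${d}`,         // address operand for the final store
    `i32.const ${d}`,
    'i32.load8_u',            // temp1 = data[d]
    `i32.const ${clearMask}`,
    'i32.and',                // temp1 = temp1 & ~(1 << b)
    'i32.const 95',
    'i32.load8_u',            // temp2 = data[95] (SREG)
    'i32.const 6',
    'i32.shr_u',              // temp2 = temp2 >> 6
    'i32.const 1',
    'i32.and',                // temp2 = temp2 & 1 (the T flag)
    `i32.const ${b}`,
    'i32.shl',                // temp2 = temp2 << b
    'i32.or',                 // temp1 | temp2
    'i32.store8',             // data[d] = result
  ];
}
```

Each line of the pseudo-code maps to one or two stack instructions, and the mask is folded into a single `i32.const` because `b` is known when the opcode is translated.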

Dudeplayz commented 3 years ago

Thank you. I will take a deeper look at it later this week.

Dudeplayz commented 2 years ago

Hi @urish, I hope you are fine! I have made some progress in compiling the library into WASM. Unfortunately, the compiled code does not behave the same as the vanilla library. I'm currently struggling because I can't find the reason for this problem. During my work, I had to change some of the code because of missing types or the use of some higher-order functions which are not available in AssemblyScript. I also had some trouble with the cross-coupling of the classes and with multiple classes in a single .ts file. My goal is to use the compiled WASM code plus the glue code as a drop-in replacement for the current library, so it's a zero-cost performance boost for existing programs. So my question is whether you can refactor the code with regard to these compatibility issues, so that future updates don't break the WASM code.

Maybe you have some time so that we can talk about it in an online meeting?

Best regards!

urish commented 2 years ago

Hi Dario! Great to read that you have made some progress!

Feel free to book some time on my calendar at https://urish.org/calendar

urish commented 2 years ago

@Dudeplayz are you joining the call?

Dudeplayz commented 2 years ago

@urish I am on the way, 3 min. Sorry!

urish commented 2 years ago

Alright, I'm waiting :-)

Dudeplayz commented 2 years ago

@urish thanks for the talk this week. I have found the reason for the failing opcodes. It is mainly caused by an implicit type cast from u16 to u32, where the u16 value wraps around and is then cast to u32, which doesn't wrap around again because it can hold larger numbers. The discussion in the AssemblyScript repository can be found here: https://github.com/AssemblyScript/assemblyscript/issues/2131. It seems that I did find a bug in AssemblyScript. I have already fixed the instructions by typing the opcode directly as u32 and not u16. The test program now executes correctly. This still has to be tested with some larger programs. Maybe you have some which you have already used. Are all instructions covered by the unit tests?
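The general class of problem can be reproduced in plain TypeScript by emulating the fixed-width casts. This is only an illustration with hypothetical helpers (`asU16`, `asU32`) and an arbitrary example value; the exact trigger in AssemblyScript is described in the linked issue and is not reproduced here.

```typescript
// Plain-TypeScript emulation of fixed-width unsigned casts.
const asU16 = (x: number): number => x & 0xffff; // wraps at 65536
const asU32 = (x: number): number => x >>> 0;    // wraps at 2^32

const opcode = 0x940c;        // an example 16-bit opcode value
const shifted = opcode << 4;  // 0x940c0, no longer fits in 16 bits

const viaU16 = asU32(asU16(shifted)); // wrapped to 16 bits first, then widened
const direct = asU32(shifted);        // widened without wrapping

console.log(viaU16.toString(16), direct.toString(16));
```

The two results differ because widening a value that has already wrapped does not bring the lost upper bits back, which is why an intermediate result that is silently typed as u16 can change the outcome of an instruction.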

Dudeplayz commented 2 years ago

And here is a link for the portability which you have mentioned. The typecasting is then a little bit different.

Dudeplayz commented 2 years ago

I got it! The test program is now running without discrepancies. I also had to update the instruction.ts file, which I had skipped on the assumption that it hadn't changed, but that was wrong. Here is the instruction.ts file with the applied fixes using the portability approach. If you like, we can merge it into the main project soon. The changes can be found here: https://github.com/wokwi/avr8js-research/commit/42f0b3934203134a51cc89e87a1869a117a3483e. The only thing I haven't tested yet is how to import the portability AS library in normal TS.

I am now trying to get the jest unit tests running, which are throwing some errors due to my node.js environment I think.

urish commented 2 years ago

Hi Dario!

Congrats on spotting the issue. Most of the instructions are covered by the unit tests, but not all of them. But I think, if the unit tests pass and Blink also eventually works, that's already a very good starting point.

Thanks for sharing the modified instruction.ts. I merged your changes into a working branch, as-interop. I added an implementation of the u16/u32 functions, so the code still compiles/runs correctly with TypeScript. See commit b81a21d. If it looks good to you, I'm ready to merge it into master.

Dudeplayz commented 2 years ago

I have some trouble getting jest running with the assemblyscript/loader. I'm working on getting the unit tests to run. Whether I get it working or get stuck, I will then try to get the timer working for testing the blink program.

I had a look, and the only thing I am not sure about is the import of the types file. The docs don't describe well how to get portability working. They throw some statements at you and mention some projects to look at. Maybe we should wait with merging the AS things until I have some more parts finished. I think we have to extend the AssemblyScript library or import it. I had a look at the portability class, and the functions it provides are designed to transform any number so that it behaves the same as in AS/WASM (overflows, trimming, etc.). So if we start with that, it means that WASM would be the preferred/limiting target. For now, you could merge it as it is, and we can see later whether we can substitute the types file with the portability version.

It would also be nice if you could mention my contribution somewhere. At the moment it wouldn't be clear, as you do the commits. I hope that is OK :)

urish commented 2 years ago

> I have some trouble getting jest running with the assemblyscript/loader

If you get hung on it for too long, remember you can always run the tests without jest. You'd need to create some kind of expect function, and then you can strip "describe" and replace "it" with a console.log that prints the test name. Some work, but it might be a quick way to get a good feel about the tests.
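A minimal version of such stand-ins might look like the sketch below. The names `simpleExpect` and `simpleIt` are hypothetical (chosen so they do not clash with the jest globals); only `toBe` and `toEqual` are covered, and `toEqual` compares via JSON, which is an assumption that only holds for plain data.

```typescript
// Minimal stand-ins for jest's expect and it, for running tests as plain scripts.
// Only two matchers are sketched; a real suite may need more.
function simpleExpect(actual: unknown) {
  return {
    toBe(expected: unknown): void {
      if (!Object.is(actual, expected)) {
        throw new Error(`Expected ${String(expected)}, got ${String(actual)}`);
      }
    },
    toEqual(expected: unknown): void {
      // JSON comparison is enough for numbers, strings, and plain arrays/objects.
      const a = JSON.stringify(actual);
      const e = JSON.stringify(expected);
      if (a !== e) {
        throw new Error(`Expected ${e}, got ${a}`);
      }
    },
  };
}

// Prints the test name and runs the body; a failed expectation throws.
function simpleIt(name: string, body: () => void): void {
  console.log(name);
  body();
}
```

In a copy of a test file, `expect` and `it` could then be aliased to these functions and the `describe` wrappers removed, so the tests run as an ordinary script against the compiled WASM module.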

> Maybe we should wait with merging the AS things until I have some more parts finished.. For now, you could merge it as it is ...

I'm not sure - so do you advise to merge or wait?

In general, there are two parts which are very sensitive to performance: instructions and the count() function in the timers. If we introduce code that uses the compatibility library, we need to make sure it doesn't impact performance. My u8...u32 implementations are naïve, but they work here because the code doesn't assume that they affect the number in any way.

> It would also be nice if you could mention my contribution somewhere. At the moment it wouldn't be clear, as you do the commits. I hope that is OK :)

Of course. I added a comment with your name at the top of instruction.ts. And if you feel like making the commit under your name - then sure, go for it. Then I can merge your commit in place of mine.

Dudeplayz commented 2 years ago

> If you get hung on it for too long, remember you can always run the tests without jest.

Thanks for the hint! I will do this now. I was very busy the last 2 weeks so I had to pause a bit.

> I'm not sure - so do you advise to merge or wait?

If you don't plan to work on the instructions in the next few weeks, we can merge. Otherwise, we could run into the problem that you can't test the compatibility with AS until we have copied/merged it into the research project to see if the compiler is fine with it.

> Of course. I added a comment with your name at the top of instruction.ts. And if you feel like making the commit under your name - then sure, go for it. Then I can merge your commit in place of mine.

OK, that would be nice, so I will do my own commit. I still have the problem that I am not familiar with the GitHub merge process 😅

So let's wait a bit until I get the tests running, and then I will try to create a merge request.

urish commented 2 years ago

> So let's wait a bit until I get the tests running, and then I will try to create a merge request.

Sounds like a plan!

In general, merge can be done in a few ways. The most straightforward one is when your branch places all the commits on top of the last commit in master, then these commits are simply copied over. Otherwise, there are a few options when I merge:

Create a merge commit
Squash
Rebase

There's a nice book from @pascalprecht that explains this, in case you want to better understand the process: https://rebase-book.com/

wokwi / avr8js

Add support for AssemblyScript #35