Separate compiler from VM

hiperiondev commented 1 year ago

For embedded applications it would be good to separate the LD and IL compilers from the VM and allow loading a precompiled file.

kalamara commented 1 year ago

this is actually one of the first things i came across when i considered porting to FreeRTOS. the LD and IL compilers are already separated in the sense that you can load a .IL file. the IL compiler is really just a deserializer: it encodes the IL operands and operations and loads them into a struct in RAM, but there is no compilation; the instruction list in memory is the same one that is in the file. the real problem is that, as it is now, you need a filesystem to load a file from, and that is not always available in embedded systems. it's not so simple: you would need to somehow store the instruction list, already binary encoded, in flash memory and load it from there. maybe make a plc_load_from_memory and use that one instead of plc_load_file. if you implement it i am interested to see.
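A minimal sketch of what loading from memory could look like, assuming a hypothetical fixed-size binary record per instruction. The names demo_instr_t and demo_load_from_memory and the 4-byte record layout are invented for illustration; they are not the project's structs or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte record; the real instruction struct in the project differs. */
typedef struct {
    uint8_t operation;
    uint8_t operand;
    uint8_t byte;
    uint8_t bit;
} demo_instr_t;

/* Decode a binary image (for example an array placed in flash) into RAM.
 * Returns the number of instructions loaded, or -1 if the image length is
 * not a multiple of the record size or does not fit into `out`. */
int demo_load_from_memory(const uint8_t *image, size_t image_len,
                          demo_instr_t *out, size_t max_out)
{
    assert(image != NULL && out != NULL);

    if (image_len % 4 != 0) {
        return -1;
    }
    size_t count = image_len / 4;
    if (count > max_out) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        out[i].operation = image[4 * i];
        out[i].operand   = image[4 * i + 1];
        out[i].byte      = image[4 * i + 2];
        out[i].bit       = image[4 * i + 3];
    }
    return (int)count;
}
```

No filesystem call is involved: the caller only passes a pointer and a length, so the image can live in a const array linked into the firmware.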

hiperiondev commented 1 year ago

Yes, indeed; ESP-IDF is a framework built on top of FreeRTOS. I'll try to run some tests with this enhancement.

hiperiondev commented 1 year ago

On the ESP32 I can use internal flash or an external SD card. Other microcontrollers can use an external SD card. You can see an example of internal flash use with LittleFS at: https://github.com/hiperiondev/esp32-berry-lang

hiperiondev commented 1 year ago

To start with, I think the codeline_t code field in the rung should be optional. It consumes a lot of memory.

hiperiondev commented 1 year ago

I have decoupled the compiler and made the parsers and the rung code optional. Now the compiler can be built standalone. It doesn't do anything yet, but it is the first step :-)

hiperiondev commented 1 year ago

It now parses .ld and .il files. It does not generate a VM file yet.

To build the compiler:
$ Config.sh
$ cmake --build build-compiler

To use it: build-compiler/librelogic-compiler -i <input file> -o <output file>

It works with ladder but does not parse IL; I still don't know why.

https://github.com/hiperiondev/librelogic

hiperiondev commented 1 year ago

Found the problem: dumping a rung does not produce compilable IL. It does not add % to the arguments, and ( modifiers are separated by a space.

hiperiondev commented 1 year ago

Finished compiler. Can you test it?

kalamara commented 1 year ago

> Finished compiler. Can you test it?

nice! i will try to find some time in the weekend, or monday

hiperiondev commented 1 year ago

I was thinking about a modification of the VM to generate a more compact code and faster execution. When I have a little time I'll try it, but the idea is the following:

// VM INSTR
//        [        INSWORD2 (25)          ]
//              [      INSWORD1 (21)      ]
//                     [   INSWORD0 (16)  ]
// [INSBYTE0][INSBYTE1][INSBYTE2][INSBYTE3]
// [IIIIICNP][RVXOOOOO][BBBBBBBB][TTTTTTTT]
// I: il instruction
// C: conditional
// N: negate
// P: push
// R: return
// V: byte:bit/var
// X: (not used)
// O: operand
// B: byte
// T: bit

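#include <stdint.h>

// Every macro argument is parenthesized and cast, so expressions and signed values are safe.
#define INSBYTE0(x)  ((((uint32_t)(x)) & 0xFF000000u) >> 24)
#define INSBYTE1(x)  ((((uint32_t)(x)) & 0x00FF0000u) >> 16)
#define INSBYTE2(x)  ((((uint32_t)(x)) & 0x0000FF00u) >> 8)
#define INSBYTE3(x)  (((uint32_t)(x)) & 0xFFu)
#define INSWORD0(x)  (((uint32_t)(x)) & 0xFFFFu)
#define INSWORD1(x)  (((uint32_t)(x)) & 0x1FFFFFu)
#define INSWORD2(x)  (((uint32_t)(x)) & 0x1FFFFFFu)

// Getters: masks follow the [IIIIICNP][RVXOOOOO] layout and match the setters below.
#define IL(x)        (((uint32_t)(x)) >> 27)
// O is the low 5 bits of byte 1; 20 is added on read (SET_OPERAND stores the raw field).
#define OPERAND(x)   ((INSBYTE1(x) & 0x1Fu) + 20)
#define COND(x)      (INSBYTE0(x) & 0x04u)
#define NEGATE(x)    (INSBYTE0(x) & 0x02u)
#define PUSH(x)      (INSBYTE0(x) & 0x01u)
#define RETURN(x)    (INSBYTE1(x) & 0x80u)
#define VAR(x)       (INSBYTE1(x) & 0x40u)

// Setters return the updated word; i is the instruction, v is the field value.
#define SET_IL(i, v)        ((i) | (((uint32_t)(v)) << 27))
#define SET_OPERAND(i, v)   ((i) | (((uint32_t)(v)) << 16))
#define SET_COND(i)         ((i) | 0x4000000u)
#define SET_NEGATE(i)       ((i) | 0x2000000u)
#define SET_PUSH(i)         ((i) | 0x1000000u)
#define SET_RETURN(i)       ((i) | 0x800000u)
#define SET_VAR(i)          ((i) | 0x400000u)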
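
As a cross-check of the bit layout in the diagram, the same packing can be sketched with plain functions and a round trip. The names demo_fields_t, demo_pack and demo_unpack are hypothetical, and this sketch stores the operand as a raw 5-bit value with no offset, which is an assumption rather than something stated in the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* One decoded instruction, following [IIIIICNP][RVXOOOOO][BBBBBBBB][TTTTTTTT]. */
typedef struct {
    uint8_t il;       /* I: 5 bits, bits 31..27 */
    uint8_t cond;     /* C: bit 26 */
    uint8_t negate;   /* N: bit 25 */
    uint8_t push;     /* P: bit 24 */
    uint8_t ret;      /* R: bit 23 */
    uint8_t var;      /* V: bit 22 (bit 21 is unused) */
    uint8_t operand;  /* O: 5 bits, bits 20..16 */
    uint8_t byte;     /* B: bits 15..8 */
    uint8_t bit;      /* T: bits 7..0 */
} demo_fields_t;

/* Pack the fields into one 32-bit instruction word. */
uint32_t demo_pack(const demo_fields_t *f)
{
    assert(f->il < 32 && f->operand < 32);
    return ((uint32_t)f->il << 27)
         | ((uint32_t)(f->cond != 0) << 26)
         | ((uint32_t)(f->negate != 0) << 25)
         | ((uint32_t)(f->push != 0) << 24)
         | ((uint32_t)(f->ret != 0) << 23)
         | ((uint32_t)(f->var != 0) << 22)
         | ((uint32_t)f->operand << 16)
         | ((uint32_t)f->byte << 8)
         | (uint32_t)f->bit;
}

/* Unpack a 32-bit instruction word using only shifts and masks. */
demo_fields_t demo_unpack(uint32_t w)
{
    demo_fields_t f;
    f.il      = (uint8_t)(w >> 27);
    f.cond    = (uint8_t)((w >> 26) & 1u);
    f.negate  = (uint8_t)((w >> 25) & 1u);
    f.push    = (uint8_t)((w >> 24) & 1u);
    f.ret     = (uint8_t)((w >> 23) & 1u);
    f.var     = (uint8_t)((w >> 22) & 1u);
    f.operand = (uint8_t)((w >> 16) & 0x1Fu);
    f.byte    = (uint8_t)((w >> 8) & 0xFFu);
    f.bit     = (uint8_t)(w & 0xFFu);
    return f;
}
```

For example, il = 3 with C, P and V set, operand = 5, byte = 2 and bit = 7 packs to 0x1D450207, and unpacking that word gives the same fields back.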

hiperiondev commented 1 year ago

Boring Sunday, new vm :-)

Here are the compiler (assembler), the instruction decoder and a VM skeleton: https://github.com/hiperiondev/librelogic_newvm

kalamara commented 1 year ago

interesting, i haven't thought about optimizations at all, as i haven't made any meaningful memory usage or cpu profiling measurements yet. maybe your standalone compiler can be useful for that :)
is it working? i get the general idea but would like to see exactly how much better it performs, in numbers. the goto loop with the labels is pretty cryptic to me. i normally avoid gotos and use switch / return statements instead; it performs the same but makes the code easier to understand and maintain, i think.

kalamara commented 1 year ago

> > Finished compiler. Can you test it?
>
> nice! i will try to find some time in the weekend, or monday

i tried it out and it works (if the program is in the same folder; otherwise it crashes). i like the addition of the .vm file format. maybe you could add a flag to compile directly from .ld to .vm.

have you considered GNU getopt for easier command-line option handling instead of custom code? no need to reinvent and maintain the wheel, i think. this project has made this mistake many times in the past. overall it's a good addition i think, kudos :)

hiperiondev commented 1 year ago

> interesting, i haven't thought about optimizations at all, as i haven't made any meaningful memory usage or cpu profiling measurements yet. maybe your standalone compiler can be useful for that :) is it working? i get the general idea but would like to see exactly how much better it performs, in numbers. the goto loop with the labels is pretty cryptic to me. i normally avoid gotos and use switch / return statements instead; it performs the same but makes the code easier to understand and maintain, i think.

There is a lot of literature and discussion about the advantages of computed goto and threaded code (and much more flame). It really depends on the architecture and the compiler (gcc has flags to optimize this technique). I did several tests and there is not much difference in a PC CPU but there is usually a lot in microcontrollers. Many VMs use it for that reason (Python uses it optionally with the HAVE_COMPUTED_GOTOS flag). It's not a huge modification but it can have a big impact on embedded architectures.
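
To make the two dispatch styles concrete, here is a toy comparison. The three opcodes and both function names are made up for illustration and are not the librelogic VM. The second function relies on the labels-as-values extension of GCC and Clang, so it is not standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_INC, OP_DBL, OP_HALT };

/* Portable dispatch: a switch inside a loop, returning on HALT.
 * Unknown opcodes return -1 (the accumulator itself never goes negative). */
int demo_run_switch(const uint8_t *code)
{
    int acc = 0;

    assert(code != NULL);
    for (size_t pc = 0;; pc++) {
        switch (code[pc]) {
        case OP_INC:  acc += 1; break;
        case OP_DBL:  acc *= 2; break;
        case OP_HALT: return acc;
        default:      return -1;
        }
    }
}

/* Computed-goto dispatch: each handler jumps straight to the next handler
 * through a table of label addresses. This sketch does not validate
 * opcodes, so the program must contain only the three opcodes above. */
int demo_run_goto(const uint8_t *code)
{
    static void *const table[] = { &&do_inc, &&do_dbl, &&do_halt };
    int acc = 0;
    size_t pc = 0;

    assert(code != NULL);
#define DISPATCH() goto *table[code[pc++]]
    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_dbl:
    acc *= 2;
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
}
```

Both functions compute the same result for the same program. Whether the second one is actually faster depends on the target and compiler, which is why measuring on the real microcontroller matters.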

On the other hand, having an instruction format closer to a real CPU greatly reduces fetch times, and op decoding only depends on operations that are resolved by hardware (and, or, shift), so it is very efficient. The code is much more compact and allows some peephole optimizations.

The compiler (actually an assembler) and the instruction decoder work correctly. The VM is just a skeleton, I have to add the real functionality of the rest of the project to it. But I already have experience in other CPU type VMs, it's not a complex problem.

hiperiondev commented 1 year ago

> > Finished compiler. Can you test it?
>
> > nice! i will try to find some time in the weekend, or monday
>
> i tried it out, it works (if you have the program in the same folder, otherwise it crashes), i like the addition of .vm file format, maybe you could somehow add a flag to compile directly from .ld to .vm.

It is already possible to compile from .ld to .vm. You can try it and see how it works. I can use a path for the file and it does not crash for me. Can you give me an example?

> have you considered GNU getopt for easier command line options handling instead of custom code? no need to reinvent & maintain the wheel i think. this project has had this mistake many times in the past. overall it's a good addition i think, kudos :)

Yes, of course. I used this very simple library so I could concentrate on functionality.

kalamara commented 1 year ago

here is the console output, crashing:
./librelogic-compiler -i ../../plclite/program.il
input file: ../../plclite/program.il
Loading code from ../../plclite/program.il...

LD %i0/2 AND( %i0/1 ) OR( %i0/0 AND( %i0/2 ) OR( %i0/1 AND( %i0/0 ) ) ) ST %Q0/0

Parsing IL code...
rungs: 1

LD %i0/2 AND( %i0/1 ) OR( %i0/0 AND( %i0/2 ) OR( %i0/1 AND( %i0/0 ) ) ) ST %Q0/0

saving VM: /.vm
rung 0: 12 instructions
Segmentation fault (core dumped)

something is going on with the filename parsing, i think.
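
The cause of the crash is not established here, but the empty name in "saving VM: /.vm" suggests the output name is lost when the input path has directory components. One possible way to derive the output path is sketched below; demo_vm_path is a hypothetical helper, not the code used in the compiler.

```c
#include <assert.h>
#include <string.h>

/* Build "<input without its extension>.vm" in `out`.
 * Only a dot inside the last path component counts as an extension, so the
 * dots in "../.." are left alone. Returns 0 on success, -1 if the result
 * does not fit into `out`. */
int demo_vm_path(const char *input, char *out, size_t out_size)
{
    assert(input != NULL && out != NULL);

    const char *slash = strrchr(input, '/');
    const char *name = slash != NULL ? slash + 1 : input;
    const char *dot = strrchr(name, '.');

    /* Keep everything up to the extension dot; a leading dot (hidden file)
     * or no dot at all means the whole input is the stem. */
    size_t stem_len = (dot != NULL && dot != name) ? (size_t)(dot - input)
                                                   : strlen(input);

    if (stem_len + sizeof ".vm" > out_size) {
        return -1;
    }
    memcpy(out, input, stem_len);
    memcpy(out + stem_len, ".vm", sizeof ".vm");  /* copies the NUL too */
    return 0;
}
```

With this helper, ../../plclite/program.il maps to ../../plclite/program.vm, and a bare name such as program maps to program.vm.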

i did see it produces .vm output as well when the input is .ld, i overlooked that before

hiperiondev commented 1 year ago

> here a console output, crashing:
> ./librelogic-compiler -i ../../plclite/program.il
> input file: ../../plclite/program.il
> Loading code from ../../plclite/program.il...
>
> LD %i0/2 AND( %i0/1 ) OR( %i0/0 AND( %i0/2 ) OR( %i0/1 AND( %i0/0 ) ) ) ST %Q0/0
>
> Parsing IL code...
> rungs: 1
>
> LD %i0/2 AND( %i0/1 ) OR( %i0/0 AND( %i0/2 ) OR( %i0/1 AND( %i0/0 ) ) ) ST %Q0/0
>
> saving VM: /.vm
> rung 0: 12 instructions
> Segmentation fault (core dumped)
>
> something is going on with the filename parsing, i think.
>
> i did see it produces .vm output as well when the input is .ld, i overlooked that before

I will fix it this week.

Question: why does every rung have its own stack and accumulator? Is it really necessary?

kalamara commented 1 year ago

> Question: why does every rung have its own stack and accumulator? Is it really necessary?

the idea is that each rung could be processed in a different thread, so this decision felt natural. I haven't implemented such logic yet, as you can see; it was something for the future. i haven't given much thought to whether it is really necessary.

hiperiondev commented 1 year ago

mmm... every rung must be executed in order, because the result of one can be used in another. I don't think rungs can be processed in different threads.

kalamara commented 1 year ago

as the whole thing is right now this is true, but what i ultimately wanted to do is implement Sequential Function Chart (i know, it's ambitious), where the right order of rungs is defined. is it a problem for each rung to have its own stack? FYI the next thing i am planning to work on is a Structured Text compiler.

hiperiondev commented 1 year ago

Not really a big problem, but using more memory on constrained devices is not a good thing.

hiperiondev commented 1 year ago

maybe you might be interested in this:

https://github.com/shadowofneptune/threaded-code-benchmark

kalamara commented 1 year ago

Thank you, that was quite an interesting read; i was completely unaware of this specific technique. So this benchmark tells us that the computed GOTO technique is a little faster than tail-call optimization, which is a little faster than switch-return.

My take on this, and please don't take it as a flame, is that the whole point of writing in C (or any programming language) and not assembly is to make programs portable and easy for a human to read and debug. Therefore I would only resort to using GOTOs if the performance was completely unacceptable; otherwise i would sacrifice a little performance for a little readability, especially if the optimization was targeted towards a specific hardware platform. As i have mentioned before, i haven't made any performance measurements for this project yet, but this is definitely something to do soon(ish). However, given that PLC cycles are normally a few milliseconds, i don't expect it to stress the CPU. If i was going to optimize the VM i would try the switch-return method first, unless i felt adventurous and had the time to rewrite it to do tail recursion. I have used tail recursion in a different language (Scala) and it surely produces interesting results in terms of how the code reads.

If you want to make a custom VM implementation that is optimal to your platform i have no issue. I can not accept replacing my implementation with it though as that will make it hard for me to support in the future.

But i would advise that you first port the existing one, profile it, see if it performs "good enough" and then decide if it is worth optimizing, instead of falling prey to premature optimization as per the famous quote :). Thanks again

hiperiondev commented 1 year ago

I don't take your comment as a flame; on the contrary, I agree with you. First the project must be ported as it works now (already done) and then optimized. Separating the compiler is the first optimization; I think the second is to fix any memory leaks. Maybe the third could be to use a more compact code format, as I proposed (initially converting it directly to the current internal format so as not to touch the rest of the project).

After all this you could think about optimizations to the VM (also optional, as Python does).

hiperiondev commented 1 year ago

Ok, now let's go back to the compiler. I have already fixed the path problem. Next I'm going to implement loading of the precompiled file.

hiperiondev commented 1 year ago

Could you add a test to load a program and run it? I don't see any test that does it.

kalamara commented 1 year ago

will do, it might be a good opportunity for a little refactoring. no time this weekend unfortunately.

hiperiondev commented 1 year ago

No problem. Please do not refactor before I finish the standalone compiler, that would force me to rebuild a lot of what is already working

hiperiondev commented 1 year ago

I am finishing work on a project that could be integrated into this project: https://github.com/hiperiondev/iec61131lib It is a complete implementation of the data types defined in IEC61131-3, with dynamic assignment and type promotion. Some functions are still missing but it is almost complete.

kalamara commented 1 year ago

About the memory leaks: I did check them and fixed the most important ones. I wouldn't sweat over using valgrind on the test_vm program, that one is expected to produce many errors as it checks all the degenerate situations, uninitialized variables and the like. I have added a C implementation of a minimal PLC application (the same one that i have in python, plclite) to use for this kind of testing. Maybe this covers the need for the test you asked about loading files, too. I think you can use your standalone compiler to check memory errors as well. I will create a new release tag with these fixes, thanks for triggering this!

kalamara commented 1 year ago

A question: if you had an API that accepts a FILE pointer instead of a file name to load the program file, say plc_load_program(FILE *f, plc_t p); would that be useful for your FreeRTOS port? I was thinking of adding this in order to decouple the file name from the file pointer, but i am not sure if it is really meaningful. we can move this discussion to email if you like.
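
A rough sketch of the idea, assuming a loader that only reads lines from whatever stream it is given. demo_program_t and demo_load_program are illustrative stand-ins; the real plc_t and loader are not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_LINES 16
#define DEMO_LINE_LEN  64

/* Hypothetical container for the raw program lines. */
typedef struct {
    char lines[DEMO_MAX_LINES][DEMO_LINE_LEN];
    int count;
} demo_program_t;

/* Read non-empty lines from an already opened stream.
 * The caller decides where the stream comes from, so the loader never sees
 * a file name. Returns the number of lines stored, or -1 if the program has
 * more than DEMO_MAX_LINES lines. Lines longer than the buffer are split. */
int demo_load_program(FILE *f, demo_program_t *p)
{
    char buf[DEMO_LINE_LEN];

    assert(f != NULL && p != NULL);
    p->count = 0;
    while (fgets(buf, sizeof buf, f) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';  /* strip the line ending */
        if (buf[0] == '\0') {
            continue;                      /* skip blank lines */
        }
        if (p->count == DEMO_MAX_LINES) {
            return -1;
        }
        strcpy(p->lines[p->count++], buf);
    }
    return p->count;
}
```

The stream could come from fopen on a desktop, or from a memory buffer on systems that provide POSIX fmemopen; whether a given FreeRTOS toolchain offers stdio streams at all would have to be checked.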

hiperiondev commented 1 year ago

> About the memory leaks: I did check them and fixed the most important ones. I wouldn't sweat over using valgrind on the test_vm program, that one is expected to produce many errors as it checks all the degenerate situations, uninitialized variables and the like. I have added a C implementation of a minimal PLC application (the same one that i have in python, plclite) to use for this kind of testing. Maybe this covers the need for the test you asked about loading files, too. I think you can use your standalone compiler to check memory errors as well. I will create a new release tag with these fixes, thanks for triggering this!

This weekend I will try to return to the project

hiperiondev commented 1 year ago

> A question: if you had an API that accepts a FILE pointer instead of a file name to load the program file, say plc_load_program(FILE *f, plc_t p); would that be useful for your FreeRTOS port? I was thinking of adding this in order to decouple the file name from the file pointer, but i am not sure if it is really meaningful. we can move this discussion to email if you like.

It would be interesting. Perhaps we could close this issue and continue by mail or a new one.

hiperiondev commented 1 year ago

Hey! I'm missing in action... I know, sorry. After a lot of work I have finished a related project (https://github.com/hiperiondev/il_parser).

Soon I'll be back in this project.

kalamara / librelogic

Separate compiler from VM #8