jingoro2112 / wrench

practical embedded script interpreter
MIT License

Dynamic code loading + other questions #21

Closed ohordiichuk closed 4 months ago

ohordiichuk commented 4 months ago

Hello,

first of all, thank you for developing this library! It's really amazing: easier to set up than Lua and very well optimized for RAM/ROM. Basically all our requirements are satisfied, and we are embedding wrench into a commercial product running on a Cortex-M4 :)

My main question: is it possible to dynamically execute bytecode on top of an existing WRContext? Something similar to what eval() in JavaScript and dostring() in Lua do, so that the following would work:

const char *code1 = "var test = 1;";
const char *code2 = "print(test);";
unsigned char *byteCode1, *byteCode2;
int size1, size2;

wr_compile(code1, strlen(code1), &byteCode1, &size1); // outputs are passed by address
wr_compile(code2, strlen(code2), &byteCode2, &size2);

WRContext* context = wr_run(w, byteCode1, size1);

// will be compilation error, because variable test is not in the "scope":
wr_run(w, byteCode2, size2); 

// instead it would be great if something below would be possible
pseudo_code_continue_run(context, byteCode2, size2); // prints "1" 

Our use case requires downloading a "dynamic" script (bytecode) from a server and executing it on top of a "default" script. That "dynamic" script should be able to access the functions and variables in the scope of the "default" script.

Apart from this question, I have a few other, less important questions/points:

  1. Do you plan to add "custom allocator" support? We use a custom allocator, which lets us strictly control memory consumption, and it's really not easy to replace the standard allocator behind new/delete.
  2. Should calls into your code always be wrapped in try/catch? Enabling exceptions in C++ adds some ROM overhead, I believe. On the other hand, how else can we detect that the heap has no memory available and handle it properly instead of "crashing" the app?
  3. How safe is calling wr_run? Can it overwrite memory? Is there stack overflow protection (I assume yes, because of the stack size, but double-checking)? What happens if corrupted bytecode is executed?
  4. On my system (Zephyr OS, GLIB) I can completely disable LibCPP. In that case wrench fails to compile because of malloc/free (to fix it I replaced all C++ headers (cstdlib) with their C equivalents), and there are problems with static constructors (I simply moved them to file scope). It saved me a few additional KB of ROM during the tests (though the tests were not built with -Os). So if you're looking for optimizations, it might be a good place to start :)

jingoro2112 commented 4 months ago

Very flattered you are considering wrench!

My main question: is it possible to dynamically execute bytecode on top of an existing WRContext? Something similar to what eval() in JavaScript and dostring() in Lua do, so that the following would work:

well.. no. One of the keys to wrench's speed and small size is statically linking all the symbols. "text" turns into "offset 0x1", so fetching its value is just context[0x1], which is lightning fast and super small.

Having said that, it would sure be useful to load new code into an existing environment, and I have some brainstorms on how to do that. Wrench also links dynamically so it can import run-time functionality; theoretically it could have an option to keep everything dynamic. It would be a little slower and a little larger (only a little) but get the job done.
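To make the trade-off concrete, here is a minimal sketch contrasting the two lookup styles. It is not wrench's internal code: `Value`, `StaticContext`, `DynamicContext`, `hashName` and `kSlots` are hypothetical names, and FNV-1a is only a stand-in for whatever name hash a real implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical value slot; wrench's real WRValue is more involved.
struct Value { int32_t i; };

const size_t kSlots = 8;

// Static linkage: the compiler has already replaced each name with a slot
// index, so a lookup at run time is a single array access.
struct StaticContext
{
    Value globals[kSlots] = {};

    Value& get( uint8_t offset )
    {
        assert( offset < kSlots ); // debug-only check; the offset is otherwise trusted
        return globals[offset];
    }
};

// FNV-1a, used here only as a stand-in 32-bit name hash.
inline uint32_t hashName( const char* s )
{
    uint32_t h = 2166136261u;
    while ( *s ) { h ^= static_cast<uint8_t>( *s++ ); h *= 16777619u; }
    return h;
}

// Dynamic linkage: names survive as hashes and are searched for at run time.
// Each access costs a hash plus a probe, but later code can still find the symbol.
struct DynamicContext
{
    struct Entry { uint32_t hash; Value value; bool used; };
    Entry table[kSlots] = {};

    // Returns the slot for 'name', creating it on first use; nullptr when full.
    Value* find( const char* name )
    {
        const uint32_t h = hashName( name );
        for ( size_t i = 0; i < kSlots; ++i )
        {
            Entry& e = table[(h + i) % kSlots];
            if ( !e.used ) { e.hash = h; e.used = true; return &e.value; }
            if ( e.hash == h ) return &e.value;
        }
        return nullptr;
    }
};
```

In the static version a second script has no way to ask for "test" because the name no longer exists after compilation; in the dynamic version it can, at the cost of the extra table and the per-access search.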

Also I have added debug functionality, which embeds all the data necessary to reconstruct the static linkage. It might be possible to use that. Let me think on it; this might be something I can add.

  1. Do you plan to add "custom allocator" support? We use a custom allocator, which lets us strictly control memory consumption, and it's really not easy to replace the standard allocator behind new/delete.

hadn't planned on it but it's easy. The compiler uses full c++ stack but the virtual machine (WRENCH_WITHOUT_COMPILER) does not use new/delete at all. pretty sure I removed all cases of it, using only malloc/free. There is no way I could do a custom allocator for the compiler but the VM? easy peezy. I'll throw that together.

  2. Should calls into your code always be wrapped in try/catch? Enabling exceptions in C++ adds some ROM overhead, I believe. On the other hand, how else can we detect that the heap has no memory available and handle it properly instead of "crashing" the app?

exceptions are .. well.. I hate them. not because they are bad, but because they solve a few problems really well while getting overused where they should not be.

Throwing an exception on "out of memory" is a good example of where exceptions shine. If I add a custom allocator then you could do them yourself, yes? I'll see what I can whip up. I come from a server/high-availability world where stopping because "hey this string does not parse to an int!" is NOT okay. Log and continue.
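As a rough illustration of handling "out of memory" yourself once an allocator hook exists, here is a sketch of an allocator pair that enforces a fixed budget and records failure in a flag instead of throwing. Every name in it (`g_budget`, `g_used`, `g_outOfMemory`, `Header`, `budgetAlloc`, `budgetFree`) is hypothetical, and the 4096-byte budget is an arbitrary example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// All names below are hypothetical; nothing here is part of wrench.
static const size_t g_budget = 4096; // bytes the interpreter may hold at once
static size_t g_used = 0;
static bool g_outOfMemory = false;   // sticky flag the application can poll

// Size header stored in front of each block so the free side can return
// the bytes to the budget. The union keeps the payload maximally aligned.
union Header { size_t size; std::max_align_t align; };

void* budgetAlloc( size_t size )
{
    // g_used never exceeds g_budget, so the subtraction cannot wrap
    if ( size > g_budget - g_used ) { g_outOfMemory = true; return nullptr; }
    Header* h = static_cast<Header*>( std::malloc( sizeof(Header) + size ) );
    if ( !h ) { g_outOfMemory = true; return nullptr; }
    h->size = size;
    g_used += size;
    return h + 1; // payload starts right after the header
}

void budgetFree( void* ptr )
{
    if ( !ptr ) return;
    Header* h = static_cast<Header*>( ptr ) - 1;
    assert( g_used >= h->size );
    g_used -= h->size;
    std::free( h );
}
```

The application can check `g_outOfMemory` after a script call and decide whether to log and continue or tear the interpreter down, without enabling C++ exceptions.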

  3. How safe is calling wr_run? Can it overwrite memory? Is there stack overflow protection (I assume yes, because of the stack size, but double-checking)? What happens if corrupted bytecode is executed?

as safe as any c++ function, which is to say "not very". executing bad bytecode is catastrophic, offsets are trusted for speed. That's why bytecode is CRC checked before wr_run will touch it:

WRContext* wr_newContext( WRState* w, const unsigned char* block, const int blockSize )
{
    // CRC the code block, at least is it what the compiler intended?
    uint32_t hash = READ_32_FROM_PC( block + (blockSize - 4) );
    if ( hash != wr_hash_read8(block, (blockSize - 4)) )
    {
        w->err = WR_ERR_bad_bytecode_CRC;
        return 0;
    }

. . .

You'd have to feed it very well-formed garbage to kill it, otherwise it will kick out any bytecode before even looking at it.
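The same idea, a hash of the payload stored in the last four bytes and re-checked before use, can be sketched in isolation. This is an illustration only: `hash32` (FNV-1a) stands in for wrench's own `wr_hash_read8`, the little-endian layout in `read32`/`write32` is an assumption, and `sealBlock`/`blockLooksIntact` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in hash (FNV-1a); the real library uses its own hash function.
inline uint32_t hash32( const unsigned char* data, size_t len )
{
    uint32_t h = 2166136261u;
    for ( size_t i = 0; i < len; ++i ) { h ^= data[i]; h *= 16777619u; }
    return h;
}

// Little-endian 32-bit read/write (assumed byte order for this sketch).
inline uint32_t read32( const unsigned char* p )
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

inline void write32( unsigned char* p, uint32_t v )
{
    p[0] = (unsigned char)( v );
    p[1] = (unsigned char)( v >> 8 );
    p[2] = (unsigned char)( v >> 16 );
    p[3] = (unsigned char)( v >> 24 );
}

// Producer side: store the hash of the payload in the last 4 bytes.
inline void sealBlock( unsigned char* block, size_t size )
{
    assert( size >= 4 );
    write32( block + size - 4, hash32( block, size - 4 ) );
}

// Consumer side: refuse the block unless the stored hash matches the payload.
inline bool blockLooksIntact( const unsigned char* block, size_t size )
{
    if ( size < 4 ) return false;
    return read32( block + size - 4 ) == hash32( block, size - 4 );
}
```

A check like this catches truncated downloads and flipped bits. It does not stop someone who deliberately builds a bad block and recomputes the trailer, which matches the "very well-formed garbage" caveat above.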

  4. On my system (Zephyr OS, GLIB) I can completely disable LibCPP. In that case wrench fails to compile because of malloc/free (to fix it I replaced all C++ headers (cstdlib) with their C equivalents), and there are problems with static constructors (I simply moved them to file scope). It saved me a few additional KB of ROM during the tests (though the tests were not built with -Os). So if you're looking for optimizations, it might be a good place to start :)

Not sure I understand this part.

If dynamic code loading into an existing environment is a show-stopper then it's not likely wrench is suited to your task. I'll try some things and see if I can code around it; give me a few days to bang out some ideas.

-Curt


ohordiichuk commented 4 months ago

Thank you for the fast response and detailed answers!

Also I have added debug functionality, which embeds all the data necessary to reconstruct the static linkage. It might be possible to use that, let me think on it this might be something I can add.

Thank you, please let me know!

the virtual machine (WRENCH_WITHOUT_COMPILER) does not use new/delete at all. pretty sure I removed all cases of it, using only malloc/free. There is no way I could do a custom allocator for the compiler but the VM? easy peezy. I'll throw that together.

Highly appreciated!

The example of throwing an exception on "out of memory" is a good example of where exceptions shine, if I add a custom allocator then you could do them yourself, yes? I'll see what I can whip up. I come from a server/high-availability world where stopping because "hey this string does not parse to an int!" is NOT okay. Log and continue.

I agree that exceptions are not suitable here. They have a big overhead and don't really fit the embedded world. If this were a C library, my suggestion would be straightforward: every function that can fail must report the error through its return value, so the user can handle it in code, same as in any C library. That covers memory allocation errors, but not only those; it could also be a bad-parameter error in a function call. But this is a C++ library and designed differently, so that approach will not work as-is. Instead I can suggest the following:

  1. For functions that return pointers - return NULL in case of error
  2. For functions that return void - change return type to bool and return "false" in case of error
  3. For debugging or readable error strings implement const char* WRState::GetLastError(). Can be disabled by default for ROM savings.

For example, a similar approach is used in the "nanopb" library.
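The three suggestions can be sketched on a toy class. Nothing below is wrench API: `DemoStack`, `DemoError`, `lastErrorCode` and the `DEMO_ERROR_STRINGS` switch are made-up names, and only the `GetLastError()` name is borrowed from the suggestion above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch: set to 0 to drop the strings and save ROM.
#ifndef DEMO_ERROR_STRINGS
#define DEMO_ERROR_STRINGS 1
#endif

enum DemoError { DEMO_OK = 0, DEMO_BAD_INDEX, DEMO_FULL };

class DemoStack
{
public:
    // Suggestion 2: a formerly void function returns bool, false on failure.
    bool push( int v )
    {
        assert( m_count <= kCapacity ); // internal invariant, not a user error
        if ( m_count == kCapacity ) { m_err = DEMO_FULL; return false; }
        m_data[m_count++] = v;
        m_err = DEMO_OK;
        return true;
    }

    // Suggestion 1: a pointer-returning function returns nullptr on failure.
    const int* at( size_t index )
    {
        if ( index >= m_count ) { m_err = DEMO_BAD_INDEX; return nullptr; }
        m_err = DEMO_OK;
        return &m_data[index];
    }

    DemoError lastErrorCode() const { return m_err; }

    // Suggestion 3: readable text for debugging, compiled out when disabled.
    const char* GetLastError() const
    {
#if DEMO_ERROR_STRINGS
        switch ( m_err )
        {
            case DEMO_BAD_INDEX: return "index out of range";
            case DEMO_FULL:      return "stack is full";
            default:             return "no error";
        }
#else
        return "";
#endif
    }

private:
    static constexpr size_t kCapacity = 2;
    int m_data[kCapacity] = {};
    size_t m_count = 0;
    DemoError m_err = DEMO_OK;
};
```

Keeping a numeric code alongside the optional string means callers can still branch on the failure reason when the strings are compiled out.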

On the other hand, all error checks in the code have their "price" in ROM and potential execution overhead. That's why the Lua library makes error checking optional: "bad parameter" checks are disabled by default. Different situations call for different trade-offs:

  1. Execution speed & minimum ROM usage at all possible cost - in this case all error checking should be disabled.
  2. Necessity to debug non-working code - in this case "bad params" errors could be enabled for the user's convenience
  3. Minimum-overhead error handling in order to ensure safety of your code - bad memory allocations (every alloc/re-alloc is checked for NULL) + VM critical errors <-- this would be our preferred use case

These are just my thoughts & suggestions, I hope it's something useful.
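One common way to get these three tiers is a compile-time level that turns check macros on or off. The sketch below assumes made-up names throughout (`DEMO_CHECK_LEVEL`, `DEMO_CHECK_CRITICAL`, `DEMO_CHECK_PARAM`, `demoDuplicate`, `demoUsage`) and is not how wrench or Lua actually implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// 0 = no checks, 1 = critical only (allocation failures), 2 = also parameter checks
#ifndef DEMO_CHECK_LEVEL
#define DEMO_CHECK_LEVEL 1
#endif

#if DEMO_CHECK_LEVEL >= 1
#define DEMO_CHECK_CRITICAL( cond ) do { if ( !(cond) ) return false; } while (0)
#else
#define DEMO_CHECK_CRITICAL( cond ) ((void)0)
#endif

#if DEMO_CHECK_LEVEL >= 2
#define DEMO_CHECK_PARAM( cond ) do { if ( !(cond) ) return false; } while (0)
#else
#define DEMO_CHECK_PARAM( cond ) ((void)0)
#endif

// Copies 'len' bytes into a freshly allocated buffer.
// Returns false if an enabled check fails; *out is then left untouched.
inline bool demoDuplicate( const unsigned char* src, size_t len, unsigned char** out )
{
    DEMO_CHECK_PARAM( src != nullptr && out != nullptr );
    unsigned char* copy = static_cast<unsigned char*>( std::malloc( len ? len : 1 ) );
    DEMO_CHECK_CRITICAL( copy != nullptr );
    for ( size_t i = 0; i < len; ++i ) copy[i] = src[i];
    *out = copy;
    return true;
}

// Usage: the caller tests the return value instead of catching an exception.
inline void demoUsage()
{
    const unsigned char src[3] = { 7, 8, 9 };
    unsigned char* copy = nullptr;
    if ( demoDuplicate( src, 3, &copy ) )
    {
        assert( copy[0] == 7 );
        std::free( copy );
    }
}
```

With the default level of 1 the allocation check stays in while the parameter check costs nothing in ROM, which corresponds to the third option in the list.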

bytecode is catastrophic, offsets are trusted for speed. That's why bytecode is CRC checked before wr_run will touch it:

Thanks. In that case I guess we will run the code under a memory protection unit (MPU). But it's very good that it has CRC32 protection.

Not sure I understand this part.

In our environment (Zephyr) we can have minimal C++ support: new/delete, classes and static global initializers. The C++ standard library is optional at link time (in our case it's the GNU C++ library). Please see this link for a detailed explanation: https://docs.zephyrproject.org/latest/develop/languages/cpp/index.html

So if we disable the standard library, the wrench VM will not compile and link. I made the following changes to the code:

in wrench.h (it fixes "include not found error"):

#if !defined(ARDUINO) && \
    (__arm__ || WIN32 || _WIN32 || __linux__ || __MINGW32__ || __APPLE__ || __MINGW64__ || __clang__ || __GNUC__)
#include <fcntl.h>
#include <memory.h>
#include <stdio.h>
#include <stdlib.h> // added stdlib instead
#include <sys/stat.h>
#include <sys/types.h>
// #include <cstdlib>
// #include <cstring>
#endif

in wrench.cpp (made static variables in global scope to avoid linkage errors):

//------------------------------------------------------------------------------
static WRValue temp1;

WRValue& WRValue::singleValue() const {
    if ((temp1 = deref()).type > WR_FLOAT) {
        temp1.ui = temp1.getHash();
        temp1.p2 = INIT_AS_INT;
    }

    return temp1;
}

//------------------------------------------------------------------------------
static WRValue temp2;

WRValue& WRValue::deref() const {
    if (type == WR_REF) {
        return r->deref();
    }

    if (!IS_ARRAY_MEMBER(xtype)) {
        return const_cast<WRValue&>(*this);
    }

    temp2.p2 = INIT_AS_INT;
    unsigned int s = DECODE_ARRAY_ELEMENT_FROM_P2(p2);

    if (IS_RAW_ARRAY(r->xtype)) {
        temp2.ui = (s < (uint32_t) (EX_RAW_ARRAY_SIZE_FROM_P2(r->p2))) ? (uint32_t) (unsigned char) (r->c[s]) : 0;
    } else if (s < r->va->m_size) {
        if (r->va->m_type == SV_VALUE) {
            return r->va->m_Vdata[s];
        } else {
            temp2.ui = (uint32_t) (unsigned char) r->va->m_Cdata[s];
        }
    }

    return temp2;
}

These 2 minor adjustments (plus excluding the C++ standard library) saved me 8172 bytes of ROM with -Os on my system, which I believe is a lot :)

If dynamic code loading into an existing environment is a show-stopper then it's not likely wrench is suited to your task. I'll try some things and see if I can code around it; give me a few days to bang out some ideas.

It's not a show-stopper. We found a way to avoid needing this: saving/loading state with functions and running the dynamic part as a separate script. So we have a workaround. But since this feature is available in other languages and really useful, I decided to ask whether it's built in or easy to add.

jingoro2112 commented 4 months ago

Okay, added the custom allocator. I was able to remove all new/delete from the entire codebase using placement new, so you now have:

// by default wrench uses malloc/free but if you want to use your own
// allocator it can be set up here
// NOTE: this becomes global for all wrench code!
typedef void* (*WR_ALLOC)(size_t size);
typedef void (*WR_FREE)(void* ptr);
void wr_setGlobalAllocator( WR_ALLOC wralloc, WR_FREE wrfree );

under the hood it's dead simple:

WR_ALLOC g_malloc = &malloc;
WR_FREE g_free = &free;
//------------------------------------------------------------------------------
void wr_setGlobalAllocator( WR_ALLOC wralloc, WR_FREE wrfree )
{
    g_malloc = wralloc;
    g_free = wrfree;
}

Then I removed all instances of new/delete and replaced them with malloc/free and a placement new where necessary. was fun :P
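For readers who have not used the pattern, this is roughly what replacing new/delete with an allocator call plus placement new looks like. It is a standalone sketch with hypothetical names (`DemoAlloc`, `DemoFree`, `demoSetAllocator`, `Node`, `createNode`, `destroyNode`, and the counting allocator), modelled on the declarations above rather than copied from the wrench sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new> // placement new

typedef void* (*DemoAlloc)( size_t size );
typedef void (*DemoFree)( void* ptr );

// Default to the C allocator, swappable at run time.
static DemoAlloc g_demoAlloc = &std::malloc;
static DemoFree g_demoFree = &std::free;

void demoSetAllocator( DemoAlloc a, DemoFree f ) { g_demoAlloc = a; g_demoFree = f; }

struct Node
{
    int value;
    explicit Node( int v ) : value( v ) {}
    ~Node() {}
};

// Replacement for "new Node(v)": get raw memory, then construct in place.
Node* createNode( int v )
{
    void* mem = g_demoAlloc( sizeof(Node) );
    return mem ? new (mem) Node( v ) : nullptr;
}

// Replacement for "delete n": run the destructor by hand, then release the memory.
void destroyNode( Node* n )
{
    if ( !n ) return;
    n->~Node();
    g_demoFree( n );
}

// Example custom allocator that counts live blocks.
static int g_liveBlocks = 0;

void* countingAlloc( size_t size )
{
    void* p = std::malloc( size );
    if ( p ) ++g_liveBlocks;
    return p;
}

void countingFree( void* ptr )
{
    if ( !ptr ) return;
    assert( g_liveBlocks > 0 );
    --g_liveBlocks;
    std::free( ptr );
}
```

Because the constructor runs on memory the allocator returned, a failed allocation shows up as a null pointer the caller can test, rather than as a `std::bad_alloc` exception.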

I also included your idea of removing the C++ standard library, didn't realize how little of it I used.

Working on the dynamic code adding to the environment.. I feel like I'm a good night's sleep and a couple of showers away from a solution.

-Curt


ohordiichuk commented 4 months ago

Thank you for the update! Great ROM size reduction in 4.0.1:

[Screenshot 2024-05-24 at 13.24.20]

By the way regarding dynamic code execution, it will allow people to implement modules with wrench, which is an awesome feature to have. E.g.:

import("mylib");

mylib::myfunc(1, 2, 3);

jingoro2112 commented 4 months ago

The work requested by this issue has been completed in 4.0.1 :)

ohordiichuk commented 4 months ago

@jingoro2112 Can you please let me know if the dynamic code feature is something you plan to do? Or is it too complicated to be worthwhile?

jingoro2112 commented 4 months ago

Yes I do. I have an implementation I am debugging right now, and it will be part of the next update, which will be out as soon as it works :)

The bad news is that global variables will not be shared across import (yet), just functions. A workaround is to create an accessor, i.e.:

------------------ base.w
var a = someFunc(2); // won't work

lib::import( "has_someFunc.w" ); // how the above WILL work
someVar = 22; // compilation error, someVar (from the above import) is not in this namespace

var b = getSomeVar(); // 'b' is now 50

----------------- has_someFunc.w

var someVar = 50;
function someFunc( var b ) { return b + 3; }

function getSomeVar() { return ::someVar; }


When? I have the base implementation done and figured out; I just have to debug and test it. A week at the most, more likely 2 or 3 days.

-Curt


ohordiichuk commented 4 months ago

Great news, and thank you! The global variables limitation is OK, I guess. Besides, in languages such as JavaScript (Node.js) the scope of a file is accessible only to that file.

But what about enums & structs - could they be imported as well? I mean their declaration:

// wrench1.w

enum Test {
   A, B, C
}

// wrench2.w

print(A);

jingoro2112 commented 4 months ago

No for enums; they are just syntactic sugar for variable declarations.

Structs? They are units, same as functions, so yes.. I THINK .. it depends on how you use them, but in principle yes, since they can be found the same way the imported functions are found. Give me a basic sample case and I'll make sure it can work.


ohordiichuk commented 4 months ago

Got it about enums, thanks!

An example of structs usage:

wrench1.w

var defaultColor = 123;

struct TextAttributes {
    var x;
    var y;
    var color;
    var bgColor;
    var fontIndex;
    var fontSize;

    x = 0;
    y = 0;
    color = ::defaultColor;
}

wrench2.w

import("wrench1.w");
var textAttr = new TextAttributes();

print(textAttr.x);

jingoro2112 commented 4 months ago

yeah 99% sure this will work. give me a day or so to work out the wrinkles, I am tracking down an issue in yield() right now.
