the-cold-dark / genesis

The Cold server: a dynamic, object-oriented language implemented on top of an object database, well suited for virtual environments and online servers (such as games).

Panic when enabling `USE_BIG_FLOATS` and `USE_BIG_NUMBERS` #15

Open · lisdude opened this issue 2 years ago

lisdude commented 2 years ago

When both options are defined, coldcc panics:

Compiling (99.9%) $conditional_group...[15 Nov 2022 10:54:21] PANIC: emalloc(FFFFFFFFFFFFFFFA) failed.
[15 Nov 2022 10:54:21] doing binary dump...[15 Nov 2022 10:54:21] Syncing lookup name cache...
Done
[15 Nov 2022 10:54:21] Current task:
[15 Nov 2022 10:54:21] Stack trace for task 1:
[15 Nov 2022 10:54:21] ---
[15 Nov 2022 10:54:21] Suspended tasks:
[15 Nov 2022 10:54:21] Paused tasks:
[15 Nov 2022 10:54:21] Creating Core Image...
fish: Job 1, './bin/coldcc' terminated by signal SIGABRT (Abort)

This is an otherwise default setup with the latest commits of Genesis and ColdCore. The same behavior occurs on Gentoo with gcc and on macOS with clang.

waywardmonkeys commented 2 years ago

Thanks for the bug report. I think this is related to an issue that I haven't filed or fixed yet, where we use SIZEOF_FLOAT instead of the correct value based on USE_BIG_FLOATS ... but it might be something else as well!
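As a rough illustration of the kind of mismatch described above (only SIZEOF_FLOAT and USE_BIG_FLOATS come from the comment; the other names are hypothetical, not Genesis source), the stored size should track whichever float type the build actually selects:

```c
/* Hypothetical sketch, not Genesis source: the storage size should follow
 * the float type selected by USE_BIG_FLOATS, not a fixed SIZEOF_FLOAT. */
#ifdef USE_BIG_FLOATS
typedef double cfloat_t;   /* hypothetical type name */
#else
typedef float cfloat_t;
#endif

/* Suspect pattern: always uses the small-float size. */
#define FLOAT_STORE_SIZE_BUGGY   SIZEOF_FLOAT

/* Corrected pattern: size tracks the selected type. */
#define FLOAT_STORE_SIZE         sizeof(cfloat_t)
```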

waywardmonkeys commented 2 years ago

The issue I just described is now filed as #16. That is not the cause of this issue, so I'll need to debug this one further. (I'm about to land the fix for #16 now and will look at this one more after I sleep.)

waywardmonkeys commented 2 years ago

I looked into this more instead of going to bed.

I've got a local fix: one of my earlier changes to this code was wrong and needs to be reverted. But since I already made one mistake reading this code, I want to look at it further to make sure I'm not getting anything else wrong.

So I'll still push a fix tomorrow, but at least now I know what it is.

The wrong commit was fab9893b56e24ef93671c87bb6148bf14c152d9c.

waywardmonkeys commented 2 years ago

Okay, so this happens because write_long, read_long, and size_long choke once values grow past 32 bits.

When using both USE_BIG_FLOATS and USE_BIG_NUMBERS, say you have code in a method that uses the value 1.0. This gets emitted into the method's opcodes as ... FLOAT, 0x3ff0000000000000, .... Without the USE_BIG_NUMBERS flag, opcodes are int32_t, so we break that float into two 32-bit values and store them. But in the commit referenced above, I said "Hey, since the opcodes are 64 bit, let's just store it immediately."
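To make the shape of that concrete, here is a small standalone sketch (not Genesis code) showing the raw bits of 1.0 and the two 32-bit halves that the 32-bit-opcode path would store:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double val = 1.0;
    uint64_t bits;
    memcpy(&bits, &val, sizeof bits);            /* reinterpret the double's bits */

    uint32_t hi = (uint32_t)(bits >> 32);        /* 0x3ff00000 */
    uint32_t lo = (uint32_t)(bits & 0xffffffffu);/* 0x00000000 */

    printf("bits = 0x%016llx\n", (unsigned long long)bits);
    printf("hi = 0x%08x, lo = 0x%08x\n", hi, lo);

    /* With 64-bit opcodes (USE_BIG_NUMBERS), the commit above stored `bits`
     * as a single opcode word instead of the hi/lo pair -- fine until the
     * integer compression code sees a value larger than it was written for. */
    return 0;
}
```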

Everything is fine there until we get to the integer compression code when writing out the opcodes or reading them back in. Since that value is bigger than anything we'd emitted before, it blows up.

This probably also happens with just USE_BIG_NUMBERS if you have a sufficiently large integer, since that would trigger the same bug in the integer compression code.

It'll take me longer to figure out what's going wrong in the integer compression code. (Or I could just replace it with something more modern, like protobuf's zigzag-encoded integers...)

waywardmonkeys commented 2 years ago

I haven't looked at that code in 20 years, since @braddr and I were working on it back then...

waywardmonkeys commented 1 year ago

This is because num_bytes gets stored as a 3-bit field in the leading byte of the value, and in 64-bit mode we can need more bytes than that field can encode, so the code goes wrong.
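A minimal sketch of that failure mode (the layout here is illustrative, not the actual Genesis encoder; only the 3-bit num_bytes field comes from the comment above): 3 bits can count at most 7 bytes, but a full 64-bit value can need 8, so the stored length wraps.

```c
#include <stdint.h>

/* Illustrative only -- not Genesis source. */
static unsigned bytes_needed(uint64_t v)
{
    unsigned n = 1;
    while (v >>= 8)
        n++;
    return n;                          /* 1..8 for a 64-bit value */
}

static uint8_t leading_byte(uint64_t v)
{
    unsigned n = bytes_needed(v);
    /* A 3-bit field holds 0..7: when n == 8 it wraps to 0, and the
     * decoder reads the wrong number of bytes. */
    return (uint8_t)((n & 0x7u) << 5);
}
```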

I'll fix this by switching the encoding mechanism: I've started implementing ULEB128 encoding, with ZigZag encoding to handle signed values.
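For reference, a minimal self-contained sketch of that scheme (ZigZag as used by protobuf, plus ULEB128 with a continuation bit in each byte); this is the general technique, not the Genesis implementation:

```c
#include <stdint.h>
#include <stddef.h>

/* ZigZag: map signed values to unsigned so small negatives stay small.
 * Assumes arithmetic right shift for negative int64_t, as protobuf does. */
static uint64_t zigzag_encode(int64_t v) {
    return ((uint64_t)v << 1) ^ (uint64_t)(v >> 63);
}

static int64_t zigzag_decode(uint64_t v) {
    return (int64_t)(v >> 1) ^ -(int64_t)(v & 1);
}

/* ULEB128: 7 payload bits per byte, high bit set while more bytes follow.
 * Returns the number of bytes written (at most 10 for a 64-bit value). */
static size_t uleb128_encode(uint64_t v, uint8_t *out) {
    size_t n = 0;
    do {
        uint8_t byte = v & 0x7f;
        v >>= 7;
        if (v)
            byte |= 0x80;              /* more bytes follow */
        out[n++] = byte;
    } while (v);
    return n;
}

static uint64_t uleb128_decode(const uint8_t *in, size_t *consumed) {
    uint64_t v = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = in[n++];
        v |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    *consumed = n;
    return v;
}
```

A signed value would round-trip as uleb128_encode(zigzag_encode(x), buf) on the way out and zigzag_decode(uleb128_decode(buf, &len)) on the way back, with no fixed-width length field to overflow.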