willox / auxtools

Rust library for low-level interfacing with BYOND's virtual machine. Includes a remote debugger for the BYOND DreamMaker language.
MIT License
27 stars 32 forks source link

extremely high suspended procs force proc array reallocation #87

Open checkraisefold opened 2 months ago

checkraisefold commented 2 months ago

platform: windows byond ver: 515.1633 hello, I am experiencing periodic crashes while debugging (both breakpoints/runtime errors) i decided to spend some time debugging; the crash is apparently caused by get_line_number.

more specifically, the get_misc_by_id call caused by proc.bytecode(). get_misc_by_id is called with an ID of 0x007F7A5C by auxtools; after checking, this byondcore function checks it against what seems to be a maximum, which at that time was 0x0001665F; an infinitely more reasonable value.

i can't consistently repro, but I was just debugging runtime errors on that specific byond version for https://github.com/Foundation-19/Foundation-19 when I experienced this issue

crash point; https://github.com/willox/auxtools/blob/master/debug_server/src/server.rs#L210 caused by; https://github.com/willox/auxtools/blob/master/auxtools/src/raw_types/misc.rs#L152

checkraisefold commented 2 months ago

Seems to be a wider issue with random invalid proc references, only on some procs.

checkraisefold commented 2 months ago

poking @Absolucy for this since primary maintainer of 515 support

checkraisefold commented 2 months ago

Ah. After many.. many a time spent debugging, I have finally found what's happening. When populate_procs is called, the proc has a valid entry. Perfectly valid!

At some point, by the time the breakpoint is hit, the entry pointer is now completely invalid. Incredibly exciting. After adding some absolutely cursed debug code (photos attached), it turns out that the pointer to proc entry is magically changed by the game some time after the initial proc population. This debug code was able to fix the crash. image

Breakpointed this to confirm; proc entry did indeed change. devenv_Y9tXeRs0ch

checkraisefold commented 2 months ago

It's being realloc'd by the function at byondcore.dll+0x207E50 on 515.1633 byondcore.dll, hope that helps a bit. This actually happens multiple times during server initialization; I have no clue what's prompting it either. That function is being called by byondcore.dll+0x1F0A80.

checkraisefold commented 2 months ago

Got a working (albeit poorly made) patch to proc.rs that fixes the issue. Other possible solution is hooking the proc that BYOND reallocates the proc entry array on and only checking for reallocation then, but that requires maintaining another signature.

Patch ``` diff --git a/auxtools/src/proc.rs b/auxtools/src/proc.rs index 29864d4..b243d76 100644 --- a/auxtools/src/proc.rs +++ b/auxtools/src/proc.rs @@ -4,6 +4,8 @@ use std::{ fmt }; +use lazy_static::lazy_static; + use ahash::RandomState; use fxhash::FxHashMap; @@ -166,14 +168,25 @@ impl fmt::Debug for Proc { } } +struct ArrayStartWrapper { + pub ptr: *mut raw_types::procs::ProcEntry +} + thread_local!(static PROCS_BY_NAME: RefCell, RandomState>> = RefCell::new(HashMap::with_hasher(RandomState::default()))); thread_local!(static PROC_OVERRIDE_IDS: RefCell> = RefCell::new(FxHashMap::default())); +thread_local!(static PROC_ENTRY_ARRAY_START: RefCell = RefCell::new(ArrayStartWrapper {ptr: std::ptr::null_mut()})); fn strip_path(p: String) -> String { p.replace("/proc/", "/").replace("/verb/", "/") } pub fn populate_procs() { + unsafe { + PROC_ENTRY_ARRAY_START.with_borrow_mut(|a| { + assert_eq!(raw_types::funcs::get_proc_array_entry(&mut a.ptr, raw_types::procs::ProcId(0)), 1); + }) + }; + let mut i: u32 = 0; loop { let proc = Proc::from_id(raw_types::procs::ProcId(i)); @@ -210,6 +223,20 @@ pub fn clear_procs() { } pub fn get_proc_override>(path: S, override_id: u32) -> Option { + let mut start_has_changed = false; + PROC_ENTRY_ARRAY_START.with_borrow(|a| { + let mut new_start: *mut raw_types::procs::ProcEntry = std::ptr::null_mut(); + let current_start = a.ptr; + unsafe { + assert_eq!(raw_types::funcs::get_proc_array_entry(&mut new_start, raw_types::procs::ProcId(0)), 1); + start_has_changed = new_start != current_start; + } + }); + if start_has_changed { + clear_procs(); + populate_procs(); + } + let s = strip_path(path.into()); PROCS_BY_NAME.with(|h| match h.borrow().get(&s)?.get(override_id as usize) { Some(p) => Some(p.clone()), ```

Edited this to fix obvious nested borrowing mistake causing it to panic

willox commented 2 months ago

do we know why a reallocation is happening?

willox commented 2 months ago

it seems that the best solution would be to store the index in to the array inside our Proc type instead of the pointer

checkraisefold commented 2 months ago

it seems that the best solution would be to store the index in to the array inside our Proc type instead of the pointer

good solution, except you have the overhead of grabbing with get_proc_array_entry every time. it might be faster to actually store the proc array and do pointer arithmetic to get the proc entry instead of calling into it in that case

do we know why a reallocation is happening?

no idea, random byond code that I didn't look into a whole lott

checkraisefold commented 2 months ago

okay I finally figured it out i think

the proc I was crashing on was a breakpoint in /Initialize() for some wall subtype. right around initialization of SSatoms was when the reallocations were happening. when I looked at the call stack after applying the patch, I was able to see the Holy Grail of light power_change. image

after looking at the code, the reasoning was very evident (this is called on every light in every area on /area/initialize) VSCodium_zc3TNddgqZ

changing the light code to not use a spawn() and changing back to the original spacemandmm instead of my custom compiled one (so it actually grabs the auxtools bundle) proves this theory, as the crash/reallocation no longer happens. i guess its something to do with too many suspended procs forcing a reallocation; this is a problem primarily on older codebases tbh

ZeWaka commented 2 months ago

goon probably has this issue due to the massive amount of spawns we use