Closed vintagepc closed 3 years ago
That crash is interesting. It tries to execute the detour trampoline on the heap, but that part of the heap isn't executable. It should be though, so maybe it's some flaw with setting the memory permission flags on Centos.
The first thing to try is using the size
parameter, but 0x0bebec40
isn't right at a page boundary, so this bug doesn't seem relevant to your crash. The memory maps in the breakpad dump of the last crash ID you posted in that other thread show that this address is right inside the rw- heap. Not sure why it's not executable.
SourceHook does https://github.com/alliedmodders/metamod-source/blob/80d3f9c14d5a78c3e93b9822e983eace41981d4f/core/sourcehook/sh_memory.h#L195
Interesting. Here's a few more dumps with different IDs and different fault addresses while I was experimenting with enabling/disabling different combos of detours. I'm guessing it's the same underlying issue?
VHIS-MTG5-OIN2 JNID-X2YW-2BWB
The extra weird thing is the systems that work fine have not (to my knowledge) been configured differently from those that do not. Just applying the provider's Centos image (same provider for all hosts), install required libraries for SRCDS, and then deploy the server instance.
Do you have ssh access to the Centos instance? What does getconf PAGESIZE
on the commandline give you? I'd expect 4096.
Yes, I have full control.
addons]$ getconf PAGESIZE
4096
(both for working and non-working instances)
I've added a fail check to the mprotect call. Can you try this build and monitor the server console during boot? It's compiled against SM1.10.
Sure thing, I will do that after I finish work.
After looking at the logic again, there is no good reason to use the heap at all in this code. It's left over from the original DynamicHooks library, but since we use SourcePawn's allocator for all other code chunks already, we can use it for the trampoline as well.
This should avoid your issue all together. Please have a try with this version, it contains the previous mprotect+perror check as well. dhooks.ext.zip
It might still be interesting to see if the first build prints any errors of mprotect failing on heap addresses, so would be great if you could test both builds even though I'm pretty sure the last one fixes the issue.
:bell: :bell: :bell: WE HAVE A WINNER
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
mprotect: Permission denied
followed by the expected crash on player connect.
Second plugin - Looks good. I am running a custom version of the offending plugin with extra logging in its detours. For any given detour pair , e.g:
DHookEnableDetour(hook, false, CTFPlayerShared_RemoveCond);
DHookEnableDetour(hook, true, CTFPlayerShared_RemoveCondPost);
I would previously see the primary detour get called. Post detours were never getting called and I would see one, maybe two before the crash occurred.
With the fixed version, the Post detour is firing correctly and I had a few pages of log lines entering/exiting the function go by without any issues.
I'll leave it running with some bots for a bit to see if anything else crops up.
Thanks for getting this turned around so quickly!
Great! 🎉
Do you use SELinux by any chance? Check the output of getsebool allow_execheap
which might be your offender. You should leave it disallowed, but this might be a good thing to check for in your audit logs for other crashes you experience in the future.
selinuxuser_execheap --> off
for the problematic instances.
We no longer have the "working" VPS to verify if it was indeed allowed or disabled there though.
That's probably it. Very strange that by default it would be enabled on one VPS deploy and not another from the same provider though. Ah well shrug
Not sure if this is related to #11 but I too am seeing segfaults both on Centos 7 and 8 relating to Detours. Some systems will do it consistently, others (with the same set of srcds server files copied via rsync -a) will have no issues whatsoever.
This is on TF2Classic (SDK2013) with the TF2C.smx plugin. (Additional discussion here: https://github.com/Scags/TF2Classic-Tools/issues/6 )
Disabling all DHookEnableDetour calls in the plugin stops the crashes from happening, and changing which ones I leave enabled or comment out changes the crash address consistently.
There's a linked crash ID there and I can provide more if they would be helpful.
Any thoughts? Observed with SM1.10 and 11, 2.2.0 detours 14a thru 16.