Closed cryeprecision closed 2 weeks ago
Thank you so much!
For starters, the PR itself and the code style looks absolutely excellent, but I'm not super confident about the logic (which could actually be my own fault). Let me think it out loud.
I agree that checking protection once per page is sufficient, and of course the 512 slots that we are checking (512*8=4096 bytes) are going to either fit in exactly one page or at most straddle two pages so that's a great optimization. And we can be pretty confident that, being IDXGISwapChain3
a struct of some sorts, all of its data space will be readable.
The caveat is here. We are dereferencing the pointer to get the pointer to the ID3D12CommandQueue
. So I wonder if my original intention was to check the protection of that page where the command queue is, actually.
For background, my PR comes out of an attempt of running a heuristic that would assess whether a given pointer had a COM vtable in it to then try to construct a IUnknown::from_raw
and try to cast it to ID3D12CommandQueue
, this would have avoided adding a hook and synchronizing stuff. In that context, I would need to check the protection flags for each of those pointers.
| 0x00000000 | 0x12341234 |
| 0x00000008 | 0x43214321 |
| 0x00000010 | 0x11111111 |
...
| 0x00001000 | 0x33333333 |
In this case, all of that contiguous memory space would be in the same page and thus have a single protection flag set, but each address on the right is in a different page so that would warrant one check per address.
But with #209 we are now checking the pointer value itself i.e. 0x12341234
and are not dereferencing that in turn and don't need to check for its protection flags, so I think your approach is definitely the right way. I think we could get away with not doing the check altogether as in practice the command queue is going to be in the first few slots and I don't see microsoft people putting the pointer at the very bottom of the struct anytime soon, but the price to pay is small enough that I'd rather have the safeguard than not.
Have I gotten some of this wrong?
That being said, LGTM, and thank you for your contribution! I'll merge this in and finalize the other PR as soon as I get the chance.
For starters, the PR itself and the code style looks absolutely excellent
Thanks! :)
So I wonder if my original intention was to check the protection of that page where the command queue is, actually.
Yeah, since the struct members can be null or not a pointer at all, calling VirtualQuery
to see if it's a valid pointer makes sense.
Have I gotten some of this wrong?
What you wrote fits (and cleared some stuff up for me).
I think we could get away with not doing the check altogether
I think so too and event if the pointer is at the bottom of the struct, the whole struct should be within readable memory. This should be perfectly safe as long as they don't remove the pointer and we start reading outside the struct.
The caveat you pointed out got me thinking that I might have misunderstood the code and broke something but after looking over both the old and new snippets again I'm sure it's still doing the same thing.
I have no DirectX background (which is why I'm using this library :P), so I thought the swap_chain_ptr
was a pointer to a vtable and that you're looking through the vtable to find a pointer to the command_queue
object (which makes no sense in hindsight because a vtable only contains pointers to functions and not to other objects but I didn't think of that earlier). Now I understand that we're actually scanning struct members for a pointer.
I found this post and the offset from the answer is different from the one I get when testing (for me the command queue is at +0x168
) so doing it this way is definitely better than hardcoding an offset.
I attached a debugger to a DirectX 12 example and set a hardware breakpoint to see what reads/writes to swap_chain+0x168
(i.e., sets or reads the command queue pointer) but it didn't get hit :/
The idea was to see which functions access the struct member, open one in a disassembler, and generate a signature for the assembly that accesses the struct member with the hope that different versions of the DirectX 12 library will match the signature and only the offset will be different.
I think so too and event if the pointer is at the bottom of the struct, the whole struct should be within readable memory. This should be perfectly safe as long as they don't remove the pointer and we start reading outside the struct.
Yeah, though I don't feel like risking it -- for example, on Wine the command queue is the first qword immediately after the vtable (e.g. offset +1) so there's no saying where different implementations might end up putting that.
The caveat you pointed out got me thinking that I might have misunderstood the code and broke something but after looking over both the old and new snippets again I'm sure it's still doing the same thing.
Yep, that was definitely me mixing things up with the attempt before.
I found this post and the offset from the answer is different from the one I get when testing (for me the command queue is at
+0x168
) so doing it this way is definitely better than hardcoding an offset.
I discarded doing outright that because I got burned by it already a few years ago, when a new d3d11.dll was released and they shuffled the pointers around. I actually wanted to go the DirectComposition way myself which would've allowed me to not maintain 4 separate renderers but when I was already done I went ahead, tested it on Linux and discovered that that's not implemented by Wine, so had to roll everything back and start over. :(
I attached a debugger to a DirectX 12 example and set a hardware breakpoint to see what reads/writes to
swap_chain+0x168
(i.e., sets or reads the command queue pointer) but it didn't get hit :/ The idea was to see which functions access the struct member, open one in a disassembler, and generate a signature for the assembly that accesses the struct member with the hope that different versions of the DirectX 12 library will match the signature and only the offset will be different.
I think that would be even trickier. I expect that a change in struct layout could send the optimizer down through different paths and result in different looking assembly at least for some functions, so that's kinda wobbly to rely upon.
Memory protection flags are set per page so it's enough to check them at page boundaries. (See here)
This simplifies the code for #209 (relevant diff) to this
I've added a helper function that checks the memory protection at each page boundary in a memory region and returns a slice containing all consecutive readable elements starting from the start of the region.
I've stared at my implementation for a bit and I think the logic is correct.