ocaml / flexdll

a dlopen-like API for Windows
Other
97 stars 30 forks source link

Fix page access rights bug #57

Closed db4 closed 5 years ago

db4 commented 5 years ago

My original report in email to Alain Frisch, Nov 25 2013:

Hi Alain,

I stumbled across an access violation bug caused by the way FlexDLL
performs relocations. When constructing dynamic relocation tables it
packs several consecutive relocation addresses into one "zone". When
performing actual relocations it allows write access via
VirtualProtect(zone_begin, zone_size, PAGE_EXECUTE_WRITECOPY) (BTW,
why not PAGE_EXECUTE_READWRITE?), then patches addresses and then
restore the old access rights via VirtualProtect(zone_begin,
zone_size, old_access). Unfortunately this is not entirely correct. I
see the situation where one zone covers 2 memory pages with different
access rights: PAGE_READONLY and PAGE_EXECUTE_READ. On restore the
second page will be set PAGE_READONLY and any attempt to execute code
from it will lead to an access violation error (as CPU supports NX and
DEP is enabled in the OS).

Relocation table that causes troubles:

Dynamic relocation table found at 634B13A0
 Non-writable relocation in zone 63483D57 -> 63483E83
 Non-writable relocation in zone 63483BB8 -> 63483CD5
 Non-writable relocation in zone 63480BE5 -> 634811B8
 634811B8 (kind:0100) (now:00C363E9)  caml_local_roots
 634811B4 (kind:0100) (now:00005E58)  caml_local_roots
 63480BE9 (kind:0100) (now:00000000)  caml_atom_table
 63480BE5 (kind:0100) (now:00000000)  caml_atom_table
 63483CD5 (kind:0102) (now:013D2900)  caml_copy_string
 63483CB8 (kind:0101) (now:0032D3F8)  w32_unicode_pack_to_string
 63483CA5 (kind:0102) (now:013FC224)  caml_atom_table
 63483C8C (kind:0101) (now:9DF62F20)  caml_initialize
 63483C4E (kind:0101) (now:9DF62BFE)  caml_alloc_shr
 63483BE9 (kind:0101) (now:0032D526)  w32_uuid_ml2c
 63483BB8 (kind:0101) (now:0032D4CB)  w32_uuid_c2ml
 63483E83 (kind:0102) (now:013FB4D0)  caml_local_roots
 63483E68 (kind:0102) (now:013FB4D0)  caml_local_roots
 63483D81 (kind:0102) (now:013FB4D0)  caml_local_roots
 63483D75 (kind:0102) (now:013FB4D0)  caml_local_roots
 63483D57 (kind:0102) (now:013FB4D0)  caml_local_roots

63480000 page was PAGE_READONLY
63481000 page was PAGE_EXECUTE_READ

Another problem is that one page can be covered by several zones. As
you restore access rights in the same order you modify them, they will
be incorrect in the end. I was able to fix this just replacing

  /* Restore permissions. Should do it also in case of failure... */
  for (wr = tbl->nonwr; wr->last != 0; wr++)
    allow_write(wr->first,wr->last + 4,wr->old,&wr->old);

with

  /* Restore permissions in the reverse order. Should do it also in
case of failure... */
  wr = tbl->nonwr;
  while ((wr+1)->last != 0)
    wr++;
  for (; wr >= tbl->nonwr; wr--)
    allow_write(wr->first,wr->last + 4,wr->old,&wr->old);

but I still have no idea how to fix the first problem. Now I just use

    allow_write(wr->first,wr->last + 4,(wr->old ==
PAGE_READONLY)?PAGE_EXECUTE_READ:wr->old,&wr->old);

as a workaround in hope that someone will fix it properly.

Alain's response:

1. First version:  one changes the right accesses for each relocation and immediately restore it after the patch is done.  This even simplifies the work of flexlink.exe, since it no longer needs to compute these "zones".

2. Second version: *if* it turns out to be too expensive to change the rights on memory pages for each relocation, one can implement the simplest strategy of checking if the next relocation falls in the same page as the previous one, and in this case, avoid the two useless calls to VirtualProtect.  One should then arrange so that flexlink.exe emits relocations in a sorted order (it might already be the case). 

I think one should evaluate the performance impact of changing access rights on each relocation. There can be many relocations, and I'm concerned that this could introduce a significant slowdown when loading a DLL. Have you tried to assess the performance impact?

Now I can prove that the performance impact of the second (slow) variant is negligible. I see the following figures for test project then FLEXDLL_DEBUG is enabled:

allow_write() calls: 4, total time: 0.000029 sec

So one VIrtualProtect() call takes about 7 microseconds on my hardware. For a real DLL with even hundreds of relocations it would be several milliseconds, not more.

alainfrisch commented 5 years ago

Thanks!

On large projects, there could be several hundreds of thousands of relocations, so the overhead might not be so small. I'll try to evaluate that on my side, but if you have real-size projects to test, feel free to do some timings on your side as well!

alainfrisch commented 5 years ago

I just checked, we have more than 67000 relocations on a given .cmxs file. Assuming the 7 microsecond timing is accurate, this would mean adding about half a second load time.

alainfrisch commented 5 years ago

I've confirmed that adding two calls to allow_write for each relocations significantly increase the time to load the .cmxs file I mentioned above (from 0.20s to 0.65s). One should look for the second version.

alainfrisch commented 5 years ago

I'll give it a try!

alainfrisch commented 5 years ago

See #64. @db4 does this look good to you?