Pointer equality testing requires object support both & as well as ==

GoogleCodeExporter commented 9 years ago

The fix for pointer addresses equality testing (masking the high bits, since 
both AMD64 and IA64 ignore those bits one way or another), is raising a few 
bugs where comparisons between objects that only ever supported __eq__ are now 
having __and__ applied to them.  The two cases encountered so far are:

Pointer == Pointer

and

Pointer == CType

The CType one is because accessing a CType directly returns the offset, but it 
isn't properly proxied, so any attempts at arithmetic on a CType will fail with 
an exception.  The patch for this is r1729.

The Pointer one still requires investigate.  It's unclear what semantics we 
want for a pointer.  Generally we've always dereferenced it unless we 
explicitly request .v() of the pointer, but the discrepancy between equality 
testing (the offset) and all other arithmetic functions (the dereferenced 
object) needs thinking about...

Original issue reported on code.google.com by mike.auty@gmail.com on 25 May 2012 at 10:20

GoogleCodeExporter commented 9 years ago

This is only my opinion but I dont think we should be proxying integers as 
offsets for Pointers and CTypes. This is very confusing as its contrary to the 
usual C semantics. Please consider pointer arithmetic for example:

char *a = 0

a + 5 will point to offset 5 but

uint32_t *a=0
a + 5 will point at offset 20!

same goes for structs. If you are going to overload __add__ methods for 
pointers, then do it properly and implement the correct pointer semantics.

Equality is also not simple. For example this is undefined according to the C 
standard:

char *a=0;
uint32_t *b=0;

a==b -> Undefined!

And same goes for CTypes:
struct foo a=0;
string bar b=0;
struct baz c=0;

a==b -> Undefined;
a==c compares memory content i think.

IMHO We should stick as closely to C semantics as possible.

Original comment by scude...@gmail.com on 25 May 2012 at 11:06

GoogleCodeExporter commented 9 years ago

Yep, that sounds reasonable, so the confusing thing is that Pointers are 
NativeTypes, and actually we only dereference them in __getattr__ which 
shouldn't be applied to functions like __add__, etc.   So I need to figure out 
why (long & Pointer) didn't just work.  Maybe we haven't defined __rand__?

As to Pointer == CType, we currently don't do anything clever, the CType 
doesn't proxy anything and we end up with an exception that doesn't tell anyone 
what went wrong.  Since people are already trying to use this (for example if 
cur.Next == cur), we should look to provide an appropriate exception if we 
can...

Original comment by mike.auty@gmail.com on 26 May 2012 at 7:31

GoogleCodeExporter commented 9 years ago

Ok, looking into it further, I've tried recreating the situation John found in 
issue 148 and can't reproduce it.  Pointers are NativeTypes, which means they 
can handle __and__ and __rand__, so I can't figure out how long & Pointer could 
ever produce the exception that was reported (I wasn't able to cause the 
exception creating my own pointers and trying different operations on them).  
As such, I'm not intending to overload the __add__ methods for pointers, so no 
change to the existing semantics required there.

As for CTypes, we still need to figure out what to do about handling them.  
Equality is currently inherited from BaseObject, so either the offset (since 
that's what .v() returns for CTypes) is equal to other, or the class, offset 
and vm match.  So should we change equality, treat a ctype more as an offset 
until its attributes are accessed, or leave it as is and try to make exceptions 
easier to understand?  Given the way volatility/BaseObject's currently wired, 
I'd tend to go with treating it more fully as an offset until you access its 
members, but I'd really like to hear everybody else's thoughts on this...

Original comment by mike.auty@gmail.com on 26 May 2012 at 9:27

GoogleCodeExporter commented 9 years ago

Right, so in summary we've currently got two instances of people using cygwin 
or windows, on r1794 or r1791 (neither of which had signifcant object model 
changes made afterwards) that seem to hit upon a Pointer that for some reason 
can't handle being anded with a long (which I think means it doesn't contain a 
__rand__ function).

MHL, you seem to have direct contact with someone who can reproduce this, can 
you get details of the plugin they're running, whether you can get access to 
their image for further analysis and whether you can get them in pdb (probably 
with -d -d -d) to gather further information about the pointer comparison 
that's being carried out and what's causing it to go wrong?

Possibilities include, I suppose, something that says it's a Pointer object but 
for some reason isn't, or a Pointer that's had __rand__ overridden somehow, or 
one that didn't properly inherit from NativeType.  None of those really explain 
what's going on very appropriately...  5:\

Original comment by mike.auty@gmail.com on 29 May 2012 at 8:15

GoogleCodeExporter commented 9 years ago

So this turns out to be a very subtle/sticky issue indeed.  Here's how it goes:

0xffffffffffff & pointer

first off tries:

long.__and__(pointer)

which fails (unsurprisingly).  It then attempts the following:

pointer.__rand__(long)

pointer.__rand__ has been setup to forward to self.v().__rand__, so we end up 
with:

int.__rand__(long)

Which raises NotImplemented (since that would never get called since 
long.__and__(int) would work fine, similarly int.__and__(long) would jump to 
long.__rand__(int) which again would work fine).  This doesn't exhibit itself 
on x64 systems because 0xffffffffffff isn't a long, it's just an int.

So that gives us a few options.  We can mask the whole problem by casting 
pointer to int explicitly (in which case we do hit long.__and__(int) which 
works fine, but allows us to compare pointers to strings and find them 
equivalent).  We could also flip the ordering (so pointer.__and__(long) becomes 
int.__and__(long) which fails and calls long.__rand__(int) which succeeds).  
That only fixes it for this particular function, not in general.  Finally, we 
could ensure that proxied types are always converted to longs if they're ints, 
so that we never end up in our reverse test with something that may not 
succeed.  It's a little ugly, and I'd genuinely like opinions as to whether we 
should just take the one-off simple fix of swapping the comparators in this 
instance, or the longer term improvement of always returning longs when 
proxying a number?

The last fix is the one I've attached as a patch, since it's a little more 
involved.  Please let me know how it goes/what you think, and we can go from 
there.  Note, this probably won't change the pointer == Ctype issue, so either 
way, this won't solve this bug on its own...

Original comment by mike.auty@gmail.com on 29 May 2012 at 9:46

Attachments:

issue-265.patch

GoogleCodeExporter commented 9 years ago

I think this patch is very subtle and I can guarantee that in 6 months someone 
will look at this code and say - hey why do we enforce this to be a long() its 
the same as int() anyway (or gets promoted to it).

I think for readability sake we should have the original line as

0xfffff & int(other)

This tells the reader that we dont care what other is, we are going to treat it 
as an integer for this operation. Its basically enforcing an interface here.

The point that Ikelos makes re the promotion of a string to an int is valid  
(although IMHO not a big deal), which is why maybe the solution is:

0xfffff & other.__int__()

which solves the two issues - its clear what we are going here, and we dont 
have a miscomparison to a string.

Original comment by scude...@gmail.com on 29 May 2012 at 10:12

GoogleCodeExporter commented 9 years ago

It also only fixes this issue for this particular comparison, and I'm not sure 
that documenting to developers that every time they happen to have a numeric 
value and then an operation that they need to call a.__int__() (but not int(a)) 
because otherwise they may hit a bug which their system may never exhibit 
because their maxint is much larger than their test values.  It will be much 
easier to explain in a comment to "see issue 265" above the long casts.  
Developers may never even see it, they may just consume the output values 
without a further thought.

What I'd like to think over is whether there really isn't any further thought 
required.  I know that outputting directly will give an "L" immediately after 
the number, I need to check whether our use of "{}".format() saves us from 
seeing that.  Can anyone think of any other negative implications of all 
proxied values becoming longs?

This problem goes away in python 3 because ints no longer have bounds on them, 
so this feels like we're headed in the right direction trying to emulate that...

Original comment by mike.auty@gmail.com on 30 May 2012 at 7:00

GoogleCodeExporter commented 9 years ago

Ok, so per the 2.1 release meeting, we're going to close this out as working 
for pointer equality, and pointer == CType equality is going to remain 
undefined as to whether it works or not, at least for the forseeable future.  
Workarounds include directly verifying the pointer's value against the CType's 
obj_offset, or dereferencing the pointer directly before attempting an equality 
test...

Original comment by mike.auty@gmail.com on 6 Jun 2012 at 8:28

Changed state: Fixed

ksanchezcld / volatility

Pointer equality testing requires object support both & as well as == #265