siesta-project / aiida_siesta_plugin

Source code for the AiiDA-Siesta package (plugin and workflows). See wiki
Other
6 stars 11 forks source link

FDFDict always return True when checking if a string belongs to it #47

Closed bosonie closed 4 years ago

bosonie commented 4 years ago

The following case:

from aiida_siesta.calculations.tkdict import FDFDict
di = {"First":"E","Second":"S"}
fdfdi =  FDFDict(di)

Now:

"whatever" in di

Return False

"whatever" in fdfdi.keys()

Return False

 "whatever" in fdfdi.values()

Return False. But:

  "whatever" in fdfdi

Return always True, no matter the value of whatever. Can @vdikan help me to understand why? Even if we don't understand the reason, we should implement at least an error, just to avoid to return a True.

vdikan commented 4 years ago

Short answer: you can fix this behavior by redefining __contains__ method for TKDict, that apparently is responsible in python for the in operator. These lines will do:

    def __contains__(self, key):
#        return ( self[key] or False ) - this would be nice, but fails for "False" member test, need to check for None value exactly:
        if (self[key] is not None):
            return True
        else:
            return False

(put among other methods in aiida_siesta.calculations.tkdict.TKDict)

Then it behaves like:

>>> di = {"First":"E","Second":"S","third":False}
>>> fdfdi =  FDFDict(di)
>>> "whatever" in di
False
>>> "what" in fdfdi
False
>>> "ever" in fdfdi
False
>>> # but
... 
>>> "first" in fdfdi
True
>>> "FiRsT" in fdfdi
True
>>>
>>> fdfdi["third"] 
False
>>> "ThIrD" in fdfdi
True
vdikan commented 4 years ago

Long answer: this has to do with how things are defined in parents of TKDict.

We subclass TKDict from MutableMapping in collections.abc.

help(MutableMapping) gives a bunch of info, among it mentioning inherited methods:

 |  ----------------------------------------------------------------------
 |  Methods inherited from _collections_abc.Mapping:
 |  
 |  __contains__(self, key)
 |  
 |  __eq__(self, other)
 |      Return self==value.
 |  
 |  __getitem__(self, key)

So it's actually Mapping that we hunt for. Let's see it's definition with inspect.getsource. But before we can do it, we need to realize that 'collections.abc' that is the value of Mapping.__module__ is fake, the real endpoint is '_collections_abc'. Python, I love you! Luckily one can monkey-patch all the class internals in python; the trick discussed here will let us proceed:

>>> from collections.abc import Mapping
>>> Mapping.__module__
'collections.abc'
>>> Mapping.__module__ = '_collections_abc'
>>> from inspect import getsource
>>> print(getsource(Mapping))
class Mapping(Collection):

    __slots__ = ()

    """A Mapping is a generic container for associating key/value
    pairs.

    This class provides concrete generic implementations of all
    methods except for __getitem__, __iter__, and __len__.

    """

    @abstractmethod
    def __getitem__(self, key):
        raise KeyError

    def get(self, key, default=None):
        'D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.'
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        else:
            return True

    def keys(self):
        "D.keys() -> a set-like object providing a view on D's keys"
        return KeysView(self)

    def items(self):
        "D.items() -> a set-like object providing a view on D's items"
        return ItemsView(self)

    def values(self):
        "D.values() -> an object providing a view on D's values"
        return ValuesView(self)

    def __eq__(self, other):
        if not isinstance(other, Mapping):
            return NotImplemented
        return dict(self.items()) == dict(other.items())

    __reversed__ = None

And here it is. __contains__ defined here returns False only if self[key] throws a KeyError exception. In all other cases it returns True - and that is what you see. Because of the current definition of __getitem__ in TKDict:

    def __getitem__(self, key):
        """ Translate the key, unpack value-tuple and return the value if exists or None. """
        trans_key = self.translate_key(key)
        try:
            value, last_key = self._storage[trans_key]  # pylint: disable=unused-variable
            #self._storage.__setitem__(trans_key, (value, key))
            return value
        except KeyError:
            return None

It intercepts KeyError earlier and returns None instead. Again, for inherited __contains__ that means always True.

vdikan commented 4 years ago

Mind that I updated the solution for the "False" member test.

bosonie commented 4 years ago

Perfect, super clear! Thanks. Just one more opinion I want from you. Did you have any reason to return None in __getitem__(self, key) instead of a KeyError like every python dictionary does? Just wondering if we should change that instead of redefining __contains__(self, key).

vdikan commented 4 years ago

If I remember correctly, the reason was exactly to get a None object from this custom collection and not throw an exception. Because exceptions' unwinded stack trace would break the program flow and for large projects like AiiDA this might cause global troubles.

I do not know what is better now: to 1) throw KeyError on wrong refer, check the key presence with in before refer/catch KeyError or 2) get None on wrong refer, check if what you get is not None after refer. For me they are same things (though in general python tends to try/except more than to if/else for some obscure performance (?) reasons, haha).

Choose what you like more. Personally I do not like script-breaking non-interactive error messages after just trying to get a value out of a dict.

pfebrer commented 4 years ago

Hey Vladimir, just giving my opinion here.

I think if someone is not sure that the key is in the dictionary they will do:

fdf_dict.get('my_key', None) # Where None is the fallback value

this is what you do in a normal python dict and most extensions of it. I just realized that it is a bit annoying that you can not rely on this in the FDFdict (unless your fallback is None). If you have a variable that is a dict but you are not sure if it's an FDFdict you need to either check the type or convert this:

val = dict.get('my_key', 4)

into this:

val = dict.get('my_key', None)
if val is None:
    val = 4

at each point of your code.

At the end, don't you think that returning None unexpectedly will most of the times end up in an exception further down the code that is harder to understand?

vdikan commented 4 years ago

Sure, you are right from the method dispatch point of view, here the uniform behavior of collections is likely to be assumed. On the debugging side, I don't think fallbacks make things easier. But users might want to have them yes.

pfebrer commented 4 years ago

On the debugging side, I don't think fallbacks make things easier. But users might want to have them yes.

No, I meant that it is easier to debug:

a = dict['key_that_does_not_exist'] 
# Raises KeyError 'key_that_does_not_exist' and this line is on top of the stack

than

a = dict['key_that_does_not_exist'] # Returns None

... do some more things.

a.run_method() 
# raises AttributeError: 'NoneType' object has no attribute 'run_method'
# and this line (which may be very far from the true source of the error) 
# is on top of the stack.

I agree that if you provide a fallback the problem is the same, but if you explicitly provide a fallback you probably make sure that the value can be handled, otherwise there's no point in providing a fallback :)

bosonie commented 4 years ago

Fixed by #54