pombreda / feedparser

Automatically exported from code.google.com/p/feedparser
Other
0 stars 0 forks source link

FeedParserDict behavior should not be controlled by `assert` #204

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Call getattr(FeedParserDict, "__deepcopy__", None)
2. It breaks in __getattr__ on the assertion.

What is the expected output? What do you see instead?

According to python doc, __getattr__ should raise AttributeError if the 
attribute does not exist. 

Use the feedparser to parse http://www.stocktwits.com/t/MSFT.rss and it 
breaks on the said problem.

Original issue reported on code.google.com by zyzhu2...@gmail.com on 20 Feb 2010 at 6:31

GoogleCodeExporter commented 9 years ago
An easy way to reproduce this is: 

feedparser.parse("http://www.stocktwits.com/t/MSFT.rss")

Original comment by zyzhu2...@gmail.com on 21 Feb 2010 at 6:55

GoogleCodeExporter commented 9 years ago
I can't reproduce this problem with the HEAD version of the sourcecode on 
Ubuntu with 
Python 2.5. Are you using version 4.1?

Original comment by adewale on 22 Feb 2010 at 1:48

GoogleCodeExporter commented 9 years ago
It occurred to me in both 4.1 and 4.2-pre-307-svn. Actually it stops on the 
assertion below in FeedParserDict.  So if you run it just in the shell it does 
not 
show up. 

class FeedParserDict(UserDict):
...
 def __getattr__(self, key):
        try:
            return self.__dict__[key]
        except KeyError:
            pass
        try:
            assert not key.startswith('_')
            return self.__getitem__(key)
        except:
            raise AttributeError, "object has no attribute '%s'" % key

Original comment by zyzhu2...@gmail.com on 24 Feb 2010 at 5:51

GoogleCodeExporter commented 9 years ago
Can you show me some example code that triggers the bug?

Original comment by adewale on 29 Apr 2010 at 4:53

GoogleCodeExporter commented 9 years ago
Please close this bug as invalid.

This is not a feedparser bug. Reporter appears to be expecting AttributeError 
to be raised for a non-existent attribute. This would normally happen, except 
that by passing None as a default return value to getattr(), the AttributeError 
is caught and the default value is returned (in this case, None).

Example:

f = feedparser.parse(url)
getattr(a, '_surprise', 'not found') # returns 'not found'
getattr(a, '_surprise')              # raises AttributeError as expected
a._surprise                          # raises AttributeError as expected

Original comment by kurtmckee on 5 Dec 2010 at 11:43

GoogleCodeExporter commented 9 years ago

Original comment by adewale on 12 Dec 2010 at 11:33

GoogleCodeExporter commented 9 years ago
This is definitely a feedparser bug. I was expecting AttributeZError to be 
raised for a non-existent attribute. I was saying the assertion in __getattr__ 
is incorrect.

Look at the following code. What happens if key="__deepcopy__" is passed to it? 
It brekas on the assertion, which it should not and go to the exception 
handler. 

The correct behavior is for __getattr__ to raise the exception and for 
getattr() to return the default value.

class FeedParserDict(UserDict):
...
 def __getattr__(self, key):
        try:
            return self.__dict__[key]
        except KeyError:
            pass
        try:
            assert not key.startswith('_')
            return self.__getitem__(key)
        except:
            raise AttributeError, "object has no attribute '%s'" % key

Original comment by zyzhu2...@gmail.com on 18 Dec 2010 at 7:16

GoogleCodeExporter commented 9 years ago
@zyzhu2000: FeedParserDict functions as expected.

import feedparser
a = feedparser.FeedParserDict()
'__deepcopy__' in a # returns False
getattr(a, '__deepcopy__') # raises AttributeError

You may not be understanding what the `return self.__dict__[key]` code is 
doing: this is here first to actually ensure that a real attribute of the dict 
(such as __len__ or __doc__) is retrieved properly. KeyErrors are caught only 
so that the dictionary's keys can be searched as well.

You may not understand why the assertion is there at all: it's there because 
FeedParserDict isn't in the business of storing keys as universally as a 
regular dict object. It's designed specifically for storing and retrieving keys 
that are predefined and used by feedparser, or exist in a special form (that 
is, $namespace_$element).

If you're using FeedParserDict in other software besides feedparser you may 
need to override the __getattr__ method in FeedParserDict in order to get the 
functionality you're wanting, such as if you're trying to store a key in the 
dictionary that's called '__deepcopy__' and then access it as if it's an 
attribute:

import feedparser
a = feedparser.FeedParserDict()
a['__deepcopy__'] = 1
a.__deepcopy__ # raises AttributeError

Code like above will *never* occur in feedparser.

Regardless, as stated above, AttributeError is raised as expected for a 
non-existent attribute, and is in fact functioning as you're claiming it 
should. You might try downloading the latest version of feedparser from svn 
trunk:

https://feedparser.googlecode.com/svn/trunk/feedparser/feedparser.py

You should also make sure that you're importing the correct feedparser file by 
running code such as:

print feedparser.__file__

as it's possible that your PYTHONPATH isn't configured correctly. I don't know 
if such an option exists, but perhaps you're running Python in an environment 
that causes Python to halt execution on failed assertions? I don't know if 
Jython or IronPython works in that manner, although I also don't know if 
feedparser works in those environments yet, but you've never stated what 
version of Python you're running, so I'm just taking a stab in the dark! On 
every version of Python on my system (Python 2.4 through 3.1), an `assert` that 
evaluates as False raises an AssertionError, which is caught and converted to 
an AttributeError.

I hope that's helpful to you.

Original comment by kurtmckee on 18 Dec 2010 at 7:49

GoogleCodeExporter commented 9 years ago
I figured out what is really going on. The code is at least borderline 
problematic. If you check the implementation of __getattr__, you will see that 
if AssertError is triggered, because it is an exception, it will be caught by 
the except statement and then converted to an AttributeError statement. This is 
why I only see the assertion failure in the WingIDE, which will stop the 
debugger on AssertError no matter it is handled or not. 

If not anything, the way it is written makes the assert statement useless. 

If the assertion is meant to raise an AttributeError exception in the end, why 
not just write:

if key.startswith('_'): print "Assertion should be triggerred": 
    raise AttributeError ...

If the assertion is meant to warn the programmer if the violation of an 
invariant that should never happen in the way FeedParser uses FeedParserDict(), 
then it certain has lost its purpose most of the time (unless used in a 
debugger like Wing).

Your test case is perfect to demonstrate the problem. Change the __get_attr__ 
to the following temporarily and run getattr(a, '__deepcopy__') and see how 
AssertError gets translated to AttributeError without warning. 

    def __getattr__(self, key):
        try:
            return self.__dict__[key]
        except KeyError:
            pass
        try:
        if key.startswith('_'): print "Assertion should be triggerred but eaten by the 'except' stmt"
        assert not key.startswith('_')

            return self.__getitem__(key)
        except:
            raise AttributeError, "object has no attribute '%s'" % key

Original comment by zyzhu2...@gmail.com on 19 Dec 2010 at 4:40

GoogleCodeExporter commented 9 years ago
BTW: I am using cpython 2.6 and 2.7.

Assert is meant to test an invariant of an algorithm. It is supposed to catch 
things that are mathematically impossible. It cannot be used to manipulate the 
outcome of the code. Otherwise what happens if the code is run with the option 
-O, in which case the assert statment is compiled away?

In this particular case, it is not really a problem because 
self.__getitem__(key) will trigger a different exception even if the assert 
statement is not there, but this is quite confusing. 

Original comment by zyzhu2...@gmail.com on 19 Dec 2010 at 5:27

GoogleCodeExporter commented 9 years ago
@zyzhu2000: Well said. Feedparser is no longer supporting versions of Python 
older than 2.4, so when I go through and revise the ancient UserDict vs. dict 
code, I'll be sure to remove the assert-related code. Thanks for replying back 
with such a considered response!

Original comment by kurtmckee on 19 Dec 2010 at 5:55

GoogleCodeExporter commented 9 years ago
FWIW, I've just hit this bug with the combination of Python 2.7 and WingIDE. 
What's weird is that I've been running my code from within Wing for over a year 
without this problem. I agree that assert() not right here.

Original comment by charlie....@clark-consulting.eu on 2 Nov 2011 at 8:31

GoogleCodeExporter commented 9 years ago
Fixed in r540 (way back in April, actually!).

I'm glad you came back to remind me of this issue; it was closed as Invalid 
before I was given the ability to modify issues in the tracker, so it never got 
reopened. I'm working to make a new release soon, but in the meantime you can 
download the current code from svn trunk:

https://feedparser.googlecode.com/svn/trunk/feedparser/feedparser.py

Original comment by kurtmckee on 3 Nov 2011 at 4:09