libo26 / feedparser

Automatically exported from code.google.com/p/feedparser
Other
0 stars 0 forks source link

nested nodes clobber actual values #298

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
1. Create an entry with an id node (for example), and another node (A) which 
itself has an id node.
2. Try to access the entry's id

You would expect that the value returned would be the value of the node that is 
a child of the entry node.  In fact, it's reliant on source order.  If the 
descendant id node comes before the child id node in the source, its value will 
be returned.  Of course, this isn't unique to ids.  The parser simply overrides 
all of the entry's properties when there are multiple nodes with the same 
name…no matter what their depth.

The correct behavior would be that the depth of the node would be taken into 
account.  I guess that nested nodes should still be added as properties to the 
entry (so as not to break backwards compatibility—not sure if there's an 
actual use case), but those with lower depth should be given precedence.

Attached is a patch which does just that.  The parsers keep track of which 
values were parsed at which depths and (when the document has been completely 
parsed) set the appropriate values on the FeedParserDict.

Issues 270 and 292 are side-effects of this bug.

Original issue reported on code.google.com by matthewt...@gmail.com on 19 Jul 2011 at 5:38

Attachments:

GoogleCodeExporter commented 9 years ago
I had an opportunity to pull your changes, but doing so caused a lot of test 
failures and errors. Did you have an opportunity to run the tests after 
creating the patch? If they all pass, would you provide more details about your 
Python version and setup? Would you also add some test cases that demonstrate 
poor behavior before your patch is applied, and better behavior afterward?

Original comment by kurtmckee on 2 Sep 2011 at 6:22

GoogleCodeExporter commented 9 years ago
Yeah, you're right. I see what's going on. My patch defers the resolution of 
property values until the entire document is read, and the `_end_*` functions 
require the values after the node is read. It'll try to update my github pull 
request this weekend so that it passes the tests and add another test case that 
illustrates the current failure (nested nodes with the same name as entry-level 
nodes). (I've deleted the patch attachment on this issue so as not to cause 
confusion—the github pull request is the more recent.)

Original comment by matthewt...@gmail.com on 2 Sep 2011 at 3:03

GoogleCodeExporter commented 9 years ago
Alright, sorry about that. Sheer laziness on my part. Tests and patch submitted 
on github.

Original comment by matthewt...@gmail.com on 3 Sep 2011 at 4:33

GoogleCodeExporter commented 9 years ago
I just pulled from your repo and tried running the unit tests again, and 
defaultdict isn't available in 2.4 (my script tests 2.7 without BeautifulSoup 
first as a sanity check, then jumps back to 2.4 and increments versions through 
3.2). :(

Can you replace defaultdict?

Original comment by kurtmckee on 15 Sep 2011 at 4:05

GoogleCodeExporter commented 9 years ago
Ah sorry. I thought I read in some old discussion that feedparser was already 
2.5+. Fixed.

Original comment by matthewt...@gmail.com on 15 Sep 2011 at 4:45

GoogleCodeExporter commented 9 years ago
Nice! All the tests pass. I have to get to work, but later I'm going to take a 
hard look at whether there will ever be a chance that hashing a dict will come 
back to bite us (my initial glance at the diff looks like it won't be an 
issue). Aside from that concern, the patch looks solid! Thanks for your work!

Original comment by kurtmckee on 15 Sep 2011 at 5:14

GoogleCodeExporter commented 9 years ago
Fixed in r595. Thanks for all the hard work!

Original comment by kurtmckee on 16 Sep 2011 at 6:25

GoogleCodeExporter commented 9 years ago
Issue 291 has been merged into this issue.

Original comment by kurtmckee on 16 Sep 2011 at 6:36

GoogleCodeExporter commented 9 years ago
No problem. Thank you!

Out of curiosity, is there a reason you merged the change manually instead of 
with git?  It's not a big deal, but since the commits never got merged, our 
discussion (comments on them) were lost when I deleted my fork.

Original comment by matthewt...@gmail.com on 16 Sep 2011 at 1:57

GoogleCodeExporter commented 9 years ago
Yep! It ultimately had to get pushed into the project's garbage svn repository, 
and authorship information would be lost (svn assumes the committer and author 
are one and the same). I did use git, but I squashed all of the commits into 
one and updated the NEWS file simultaneously. That also let me modify the 
commit message so I could give you credit for the work.

I intend to ask for more control over the project after the next release. As it 
stands, I can only commit and make changes to the issue tracker. I can't even 
modify the front page! *laughs* I'd love to switch the repo to git, which 
Google Code now supports, but I'll need more project permissions to do that.

Original comment by kurtmckee on 16 Sep 2011 at 2:26

GoogleCodeExporter commented 9 years ago
Wow I didn't realize you didn't have full project permissions!

I don't care so much about the ownership of the commits…I was just a little 
bummed to find out that my descriptions of them (and our discussions) were 
lost.  Oh well.  Thanks again!

Original comment by matthewt...@gmail.com on 20 Sep 2011 at 4:01

GoogleCodeExporter commented 9 years ago
Issue 308 has been merged into this issue.

Original comment by kurtmckee on 30 Nov 2011 at 6:58