jsumners / feedparser

Automatically exported from code.google.com/p/feedparser

How to monkey patch to get key value pairs when the parent element is different but child elements are namespaced the same #430

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

I have (WordPress) namespaced markup that looks like:

<wp:postmeta>
  <wp:meta_key>_edit_last</wp:meta_key>
  <wp:meta_value><![CDATA[6]]></wp:meta_value>
</wp:postmeta>

I can retrieve this fine with my current monkey patching. However, I also have 
blocks like this:

<wp:commentmeta>
  <wp:meta_key>akismet_result</wp:meta_key>
  <wp:meta_value><![CDATA[false]]></wp:meta_value>
</wp:commentmeta>

The second block overwrites my values for wp_meta_key and wp_meta_value because my monkey 
patching doesn't take the parent elements into account (in this case, 
wp_postmeta and wp_commentmeta respectively).

I know how to monkey patch. What I don't know is how to determine the parent 
element and assign the key/value pairs to that specific parent. Please advise.

What is the expected output? What do you see instead?

I see the last entry of my wp_postmeta object (created by my monkey patches) 
replaced with the key and value of the entry of the wp_commentmeta object.

What I want to see is:

wp_postmeta = [{'wp_meta_value': u'6', 'wp_meta_key': u'_edit_last'}]
wp_commentmeta = [{'wp_meta_value': u'false', 'wp_meta_key': u'akismet_result'}]

What I end up seeing is:

wp_postmeta = [{'wp_meta_value': u'false', 'wp_meta_key': u'akismet_result'}]
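
For reference, the desired grouping can be reproduced outside feedparser with the standard library's xml.etree.ElementTree. This is only a sketch of the target structure, not a feedparser patch. The namespace URI, the SAMPLE wrapper document, and the collect_meta helper are assumptions made for illustration; adjust the URI to match the xmlns:wp declaration in the actual export.

```python
import xml.etree.ElementTree as ET

# Assumption: the WXR namespace URI; it must match xmlns:wp in the real export.
WP_NS = "http://wordpress.org/export/1.2/"

# Hypothetical wrapper around the two blocks from the report.
SAMPLE = """<item xmlns:wp="http://wordpress.org/export/1.2/">
  <wp:postmeta>
    <wp:meta_key>_edit_last</wp:meta_key>
    <wp:meta_value><![CDATA[6]]></wp:meta_value>
  </wp:postmeta>
  <wp:comment>
    <wp:commentmeta>
      <wp:meta_key>akismet_result</wp:meta_key>
      <wp:meta_value><![CDATA[false]]></wp:meta_value>
    </wp:commentmeta>
  </wp:comment>
</item>"""


def collect_meta(xml_text, parents=("postmeta", "commentmeta")):
    """Group wp:meta_key / wp:meta_value pairs by their parent element."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for parent in parents:
        rows = []
        # Visit every <wp:PARENT> element, wherever it is nested.
        for node in root.iter("{%s}%s" % (WP_NS, parent)):
            rows.append({
                "wp_meta_key": node.findtext("{%s}meta_key" % WP_NS),
                "wp_meta_value": node.findtext("{%s}meta_value" % WP_NS),
            })
        grouped["wp_" + parent] = rows
    return grouped


result = collect_meta(SAMPLE)
print(result["wp_postmeta"])
print(result["wp_commentmeta"])
```

Because each pair is read relative to its own parent node, the comment meta can no longer overwrite the post meta.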

What version of the product are you using? On what operating system?

5.1.3, OS X Yosemite, Python 2.7.5

Please provide any additional information below.

I can provide additional info as needed. What I really need (and asked for 
previously, in June 2014) is a guide to monkey patching best practices. Right 
now it is just trial and error. The source is not much of a guide, since its 
comments don't explain what each part does. 
Thanks in advance.

Original issue reported on code.google.com by robertln...@gmail.com on 20 Oct 2014 at 11:20

GoogleCodeExporter commented 9 years ago
I have figured this out. After studying the source code, in particular the 
author sections, I noticed that a self.in<varname> flag is set in the _start 
handler of each parent whose children share element names, and cleared in the 
matching _end handler. The child element handlers can then test those flags 
and, with a simple if/elif, add the values to the context in the correct 
place. Here is the code I ended up with. Feedback welcome.

import feedparser

def _start_wp_postmeta(self, attrsD):
    context = self._getContext()
    self.inpostmeta = 1  # flag: currently inside <wp:postmeta>
    context.setdefault('wp_postmeta', [])
    context['wp_postmeta'].append(feedparser.FeedParserDict())

def _end_wp_postmeta(self):
    self.inpostmeta = 0

def _start_wp_commentmeta(self, attrsD):
    context = self._getContext()
    self.incommentmeta = 1
    context.setdefault('wp_commentmeta', [])
    context['wp_commentmeta'].append(feedparser.FeedParserDict())

def _end_wp_commentmeta(self):
    self.incommentmeta = 0

def _start_wp_meta_key(self, attrsD):
    context = self._getContext()
    context.setdefault('wp_meta_key', [])
    self.push('wp_meta_key', 1)  # 1 = expect text content
    context['wp_meta_key'].append(attrsD)

def _end_wp_meta_key(self):
    wp_meta_key = self.pop('wp_meta_key')
    context = self._getContext()
    if self.inpostmeta:
        context['wp_postmeta'][-1]['wp_meta_key'] = wp_meta_key
    elif self.incommentmeta:
        context['wp_commentmeta'][-1]['wp_meta_key'] = wp_meta_key

def _start_wp_meta_value(self, attrsD):
    context = self._getContext()
    context.setdefault('wp_meta_value', [])
    self.push('wp_meta_value', 1)  # 1 = expect text content
    context['wp_meta_value'].append(attrsD)

def _end_wp_meta_value(self):
    wp_meta_value = self.pop('wp_meta_value')
    context = self._getContext()
    if self.inpostmeta:
        context['wp_postmeta'][-1]['wp_meta_value'] = wp_meta_value
    elif self.incommentmeta:
        context['wp_commentmeta'][-1]['wp_meta_value'] = wp_meta_value

feedparser._FeedParserMixin._start_wp_postmeta = _start_wp_postmeta
feedparser._FeedParserMixin._end_wp_postmeta = _end_wp_postmeta
feedparser._FeedParserMixin._start_wp_commentmeta = _start_wp_commentmeta
feedparser._FeedParserMixin._end_wp_commentmeta = _end_wp_commentmeta

feedparser._FeedParserMixin._start_wp_meta_key = _start_wp_meta_key
feedparser._FeedParserMixin._end_wp_meta_key = _end_wp_meta_key
feedparser._FeedParserMixin._start_wp_meta_value = _start_wp_meta_value
feedparser._FeedParserMixin._end_wp_meta_value = _end_wp_meta_value
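
The same flag technique can be written once and reused for any number of parents. The sketch below is a hypothetical refactor, not part of feedparser: make_meta_handlers, the _meta_parent attribute, and StandInParser are names invented here. StandInParser only imitates the _getContext, push, and pop calls used in the code above so that the example runs without feedparser installed.

```python
def make_meta_handlers(parents=("wp_postmeta", "wp_commentmeta"),
                       children=("wp_meta_key", "wp_meta_value"),
                       dict_factory=dict):
    """Return {method_name: function} implementing the parent-flag technique.

    Hypothetical helper for illustration; not a feedparser API.
    """
    handlers = {}

    def add_parent(parent):
        def start(self, attrsD):
            # Remember which parent is open and start a fresh record for it.
            self._meta_parent = parent
            self._getContext().setdefault(parent, []).append(dict_factory())

        def end(self):
            self._meta_parent = None

        handlers["_start_" + parent] = start
        handlers["_end_" + parent] = end

    def add_child(child):
        def start(self, attrsD):
            self.push(child, 1)  # 1 = expect text content

        def end(self):
            value = self.pop(child)
            parent = getattr(self, "_meta_parent", None)
            if parent is not None:  # skip children outside a tracked parent
                self._getContext()[parent][-1][child] = value

        handlers["_start_" + child] = start
        handlers["_end_" + child] = end

    for parent in parents:
        add_parent(parent)
    for child in children:
        add_child(child)
    return handlers


class StandInParser(object):
    """Minimal stand-in for the parser mixin (assumption, for the demo only)."""

    def __init__(self):
        self.context = {}
        self.pending = None

    def _getContext(self):
        return self.context

    def push(self, element, expecting_text):
        pass  # the real parser starts collecting text here

    def pop(self, element):
        return self.pending  # the real parser returns the collected text


for name, func in make_meta_handlers().items():
    setattr(StandInParser, name, func)

# Replay the two blocks from the report as start/end handler calls.
parser = StandInParser()
for parent, key, value in [("wp_postmeta", "_edit_last", "6"),
                           ("wp_commentmeta", "akismet_result", "false")]:
    getattr(parser, "_start_" + parent)({})
    for child, text in (("wp_meta_key", key), ("wp_meta_value", value)):
        getattr(parser, "_start_" + child)({})
        parser.pending = text  # pretend the parser collected this text
        getattr(parser, "_end_" + child)()
    getattr(parser, "_end_" + parent)()

result = parser.context
print(result)
```

With feedparser itself, the same loop would presumably target the class patched above, i.e. setattr(feedparser._FeedParserMixin, name, func) with dict_factory=feedparser.FeedParserDict. That wiring has not been tested against a real export. Using getattr with a default also avoids an AttributeError if a wp:meta_key appears before any tracked parent has been opened.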

Original comment by robertln...@gmail.com on 21 Oct 2014 at 4:58

GoogleCodeExporter commented 9 years ago
Copy and paste went a bit nutso in the last method. Here it is again:

def _end_wp_meta_value(self):
    wp_meta_value = self.pop('wp_meta_value')
    context = self._getContext()
    if self.inpostmeta:
        context['wp_postmeta'][-1]['wp_meta_value'] = wp_meta_value
    elif self.incommentmeta:
        context['wp_commentmeta'][-1]['wp_meta_value'] = wp_meta_value

Original comment by robertln...@gmail.com on 21 Oct 2014 at 5:01