jsumners / feedparser

Automatically exported from code.google.com/p/feedparser

When <summary> follows <content>, summary_detail is not set #412

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This shows up in current HEAD and in 5.1.3.

The two trivial Atom 1.0 feeds attached are the same, except that the order of 
the <content> and <summary> elements is swapped.

When <summary> comes first, I get the result I would expect: feedparser returns 
<summary> in 'summary' and 'summary_detail', and <content> in 'content'.

When <content> comes first, however, feedparser puts <content> into 'summary', 
puts both <content> and <summary> into 'content', and doesn't set 
'summary_detail' at all.
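
For anyone without the attachments, here is a minimal sketch of a reproduction. The two feeds are hypothetical stand-ins for the attached files (the titles, ids, and text are made up), and `describe` is just a helper name chosen here. It assumes feedparser is installed and uses only `feedparser.parse` and the `entries` list; the result will depend on the feedparser version.

```python
# Hypothetical stand-ins for the two attached Atom 1.0 feeds: identical
# except for the order of <summary> and <content> inside the entry.
ATOM_TEMPLATE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <id>urn:example:feed</id>
  <updated>2013-07-31T00:00:00Z</updated>
  <entry>
    <title>Example entry</title>
    <id>urn:example:entry</id>
    <updated>2013-07-31T00:00:00Z</updated>
    {first}
    {second}
  </entry>
</feed>
"""

SUMMARY = '<summary type="text">The summary</summary>'
CONTENT = '<content type="text">The content</content>'

SUMMARY_FIRST = ATOM_TEMPLATE.format(first=SUMMARY, second=CONTENT)
CONTENT_FIRST = ATOM_TEMPLATE.format(first=CONTENT, second=SUMMARY)


def describe(feed_xml):
    """Return the fields of the first entry that this report is about."""
    # Imported lazily so the feed strings above can be reused without
    # feedparser installed.
    import feedparser

    entry = feedparser.parse(feed_xml).entries[0]
    return {
        "summary": entry.get("summary"),
        "has_summary_detail": "summary_detail" in entry,
        "content": [c.get("value") for c in entry.get("content", [])],
    }


if __name__ == "__main__":
    for name, xml in (("summary first", SUMMARY_FIRST),
                      ("content first", CONTENT_FIRST)):
        print(name, describe(xml))
```

Comparing the two printed dictionaries should show whether the element order changes 'summary', 'summary_detail', and 'content' on a given version.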

Looking at the code, this happens because _start_summary checks whether the 
current entry already has a 'summary' field set (which it does, because 
_end_content set it) and, if so, saves the summary into 'content'. I'm not sure 
why that's a useful thing to do, but there are two tests that make sure it 
does it (wellformed/rss/item_summary_and_description.xml and 
wellformed/rss/item_description_and_summary.xml). Is there a historical 
reason for wanting this behaviour with the nonstandard <summary> element in 
RSS? And if so, can it be disabled for Atom?
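
To make the ordering dependence concrete, here is a toy model of the logic as I understand it from the description above. This is not feedparser's code: `parse_entry` and its dictionary layout are made up for illustration, and the only things it models are that content is copied into 'summary' when none is set, and that a <summary> arriving after 'summary' exists is filed under 'content'.

```python
def parse_entry(elements):
    """Toy model of the handler behaviour described in this report.

    `elements` is a list of (tag, text) pairs in document order.
    """
    entry = {}
    for tag, text in elements:
        if tag == "content":
            entry.setdefault("content", []).append(text)
            # Content is also copied into 'summary' if no summary is set yet.
            entry.setdefault("summary", text)
        elif tag == "summary":
            if "summary" in entry:
                # A 'summary' already exists (possibly only the copy made
                # from <content>), so this element is stored as content and
                # 'summary_detail' is never set.
                entry.setdefault("content", []).append(text)
            else:
                entry["summary"] = text
                entry["summary_detail"] = {"value": text}
    return entry


summary_first = parse_entry([("summary", "S"), ("content", "C")])
content_first = parse_entry([("content", "C"), ("summary", "S")])
print(summary_first)
print(content_first)
```

In this model the summary-first order gives 'summary', 'summary_detail', and a single 'content' item, while the content-first order gives 'summary' equal to the content text, two 'content' items, and no 'summary_detail', which matches what I see from the real parser.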

Original issue reported on code.google.com by ats-goog...@offog.org on 31 Jul 2013 at 3:25
