lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Atom summary/content is order-dependent #262

Open lemon24 opened 2 years ago

lemon24 commented 2 years ago

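When an Atom entry has <content> before <summary>, entry.summary comes back with the content value, and the actual summary is appended to entry.content as text/plain. When <summary> comes first, both are correct.

Seems like a feedparser issue: https://github.com/kurtmckee/feedparser/issues/59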

Repro:

import io

import feedparser
import reader

feed_bytes = b"""\
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <entry>
        <id>one</id>
        <summary>summary-one</summary>
        <content type="html">content-one</content>
    </entry>
    <entry>
        <id>two</id>
        <content type="html">content-two</content>
        <summary>summary-two</summary>
    </entry>
</feed>
"""

# reader's internal Atom parser (underscore-prefixed module, not public API)
parser = reader._parser.default_parser().get_parser_by_mime_type('application/atom+xml')

feed, entries = parser('url', io.BytesIO(feed_bytes))
for entry in entries:
    print(entry.id)
    print(' ', 'summary', entry.summary)
    print(' ', 'content')
    for content in entry.content:
        print('   ', content)

print()

# same feed through feedparser directly, for comparison
for entry in feedparser.parse(io.BytesIO(feed_bytes)).entries:
    print(entry.id)
    print(' ', 'summary', entry.summary)
    print(' ', 'content')
    for content in entry.content:
        print('   ', content)

Output (reader first, then feedparser; entry "two" is wrong in both):

one
  summary summary-one
  content
    Content(value='content-one', type='text/html', language=None)
two
  summary content-two
  content
    Content(value='content-two', type='text/html', language=None)
    Content(value='summary-two', type='text/plain', language=None)

one
  summary summary-one
  content
    {'type': 'text/html', 'language': None, 'base': '', 'value': 'content-one'}
two
  summary content-two
  content
    {'type': 'text/html', 'language': None, 'base': '', 'value': 'content-two'}
    {'type': 'text/plain', 'language': None, 'base': '', 'value': 'summary-two'}
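
For reference, a rough sketch of the order-independent result I'd expect, reading the same feed with the standard library's xml.etree.ElementTree and looking elements up by tag name. This is not how reader or feedparser work internally; the summary_and_content helper is a hypothetical name used only for illustration, and it ignores type/language handling entirely.

```python
import xml.etree.ElementTree as ET

# Atom namespace, in ElementTree's "{uri}tag" notation
ATOM = "{http://www.w3.org/2005/Atom}"

FEED_BYTES = b"""\
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <entry>
        <id>one</id>
        <summary>summary-one</summary>
        <content type="html">content-one</content>
    </entry>
    <entry>
        <id>two</id>
        <content type="html">content-two</content>
        <summary>summary-two</summary>
    </entry>
</feed>
"""


def summary_and_content(feed_bytes):
    """Hypothetical helper: map entry id -> (summary text, [content texts]).

    Elements are looked up by tag name, so their order inside <entry>
    cannot affect the result.
    """
    root = ET.fromstring(feed_bytes)
    result = {}
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        summary = entry.findtext(ATOM + "summary")
        content = [c.text for c in entry.findall(ATOM + "content")]
        result[entry_id] = (summary, content)
    return result


for entry_id, (summary, content) in summary_and_content(FEED_BYTES).items():
    print(entry_id)
    print(' ', 'summary', summary)
    print(' ', 'content', content)
```

Here both entries come out with their own summary and a single content item, regardless of which element appears first.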