libo26 / feedparser

Automatically exported from code.google.com/p/feedparser

Error parsing title in http://sports.yahoo.com/nba/rss.xml #291

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The feed title is missing after parsing that feed (I'm using feedparser 5.0.1).

Original issue reported on code.google.com by svil...@gmail.com on 22 Jun 2011 at 5:30

GoogleCodeExporter commented 9 years ago
The same happens for me with the feed at:

http://www.theskepticsguide.org/feed/rss.aspx?feed=SGU

Original comment by th.perl@gmail.com on 8 Aug 2011 at 9:58

GoogleCodeExporter commented 9 years ago
After some more investigation, the problem appears to be related to the 
<image> section of the feed, which has its own <title> tag and comes before 
the document's <title> tag. If I take the feed in comment 1 and move the 
document's <title> tag before the <image> tag, the title does appear:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" 
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <atom:link href="http://www.theskepticsguide.org/feed/rss.aspx?feed=sgu" rel="self" type="application/rss+xml" />
    <title>The Skeptics' Guide to the Universe</title>
    <image>
      <url>http://www.theskepticsguide.org/images/logoSGU.png</url>
      <title>The Skeptics Guide</title>
      <link>http://www.theskepticsguide.org</link>
      <width>144</width>
      <height>144</height>
    </image>
    [...]

However, with the <title> placed after the <image>, feedparser does not 
provide the title:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" 
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <atom:link href="http://www.theskepticsguide.org/feed/rss.aspx?feed=sgu" rel="self" type="application/rss+xml" />
    <image>
      <url>http://www.theskepticsguide.org/images/logoSGU.png</url>
      <title>The Skeptics Guide</title>
      <link>http://www.theskepticsguide.org</link>
      <width>144</width>
      <height>144</height>
    </image>
    <title>The Skeptics' Guide to the Universe</title>
    [...]

Removing the <title> tag inside the <image> is also enough to make this work 
again, so the problem is clearly tied to a <title> appearing inside <image>. 
The following does provide the title:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" 
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <atom:link href="http://www.theskepticsguide.org/feed/rss.aspx?feed=sgu" rel="self" type="application/rss+xml" />
    <image>
      <url>http://www.theskepticsguide.org/images/logoSGU.png</url>
      <link>http://www.theskepticsguide.org</link>
      <width>144</width>
      <height>144</height>
    </image>
    <title>The Skeptics' Guide to the Universe</title>
    [...]
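
For anyone stuck on an affected version, the channel title can be read straight from the XML as a stopgap. This is a minimal sketch using only Python's standard library; it is not feedparser's own fix. The helper name channel_and_image_titles is made up for illustration, and the feed is a trimmed copy of the failing variant above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the failing variant: <image><title> precedes the channel <title>.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <image>
      <url>http://www.theskepticsguide.org/images/logoSGU.png</url>
      <title>The Skeptics Guide</title>
      <link>http://www.theskepticsguide.org</link>
    </image>
    <title>The Skeptics' Guide to the Universe</title>
  </channel>
</rss>
"""


def channel_and_image_titles(xml_text):
    """Hypothetical helper: return (channel title, image title), None if absent."""
    channel = ET.fromstring(xml_text.encode("utf-8")).find("channel")
    # A bare tag name matches direct children only, so the nested
    # <image><title> cannot be mistaken for the channel's own <title>.
    return channel.findtext("title"), channel.findtext("image/title")


print(channel_and_image_titles(FEED))
```

Because the lookup is scoped to direct children of <channel>, the result does not depend on whether <image> comes before or after the channel's <title>.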

Original comment by th.perl@gmail.com on 8 Aug 2011 at 10:04

GoogleCodeExporter commented 9 years ago
Proposed patch against the Git master branch of 
https://github.com/kurtmckee/feedparser attached.

Original comment by th.perl@gmail.com on 8 Aug 2011 at 10:25

GoogleCodeExporter commented 9 years ago
Fixed in 595. I committed a patch for issue 298, and it also fixed this issue.

Original comment by kurtmckee on 16 Sep 2011 at 6:36

GoogleCodeExporter commented 9 years ago
Great thx!

Original comment by svil...@gmail.com on 16 Sep 2011 at 7:47