zendframework / zend-feed

Feed component from Zend Framework
BSD 3-Clause "New" or "Revised" License

No CDATA block in content block of atom feed #82

Open av3 opened 6 years ago

av3 commented 6 years ago

Hello,

I want to provide feeds (via the zend-feed Writer) containing the full content of an article (including some HTML5 markup), and I chose Atom over RSS. But the Atom writer behaves differently and causes some trouble for me.

My code:

    $entry = $feed->createEntry();
    $entry->setContent($news->getText());

Output for RSS:

    <item>
      <content:encoded><![CDATA[<p>My content ...</p>]]></content:encoded>

Output for Atom:

    <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
      <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
        <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml">
          <xhtml:p>My content ...</xhtml:p>
        </xhtml:div>
      </content>
    </entry>

And if I add an image in HTML5 style, <img src="myimage.jpg">, instead of XHTML style, <img src="myimage.jpg" />, I get a warning:

DOMDocument::loadXML(): Opening and ending tag mismatch: img line 1 and p in Entity, line: 1
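The warning comes from parsing the entry content as XML: an HTML5 void element written as <img src="myimage.jpg"> is never closed, so the markup is not well-formed XML. The following is a minimal standalone sketch using only PHP's built-in DOM extension (no zend-feed code), assuming, as the warning suggests, that the content ends up in DOMDocument::loadXML():

```php
<?php
// Sketch: how DOMDocument::loadXML() treats HTML5-style versus
// XHTML-style markup (plain PHP DOM extension, no zend-feed code).

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$html5 = '<p><img src="myimage.jpg"></p>';   // <img> is never closed
$xhtml = '<p><img src="myimage.jpg" /></p>'; // self-closed, well-formed XML

$dom = new DOMDocument();

$html5Loaded = $dom->loadXML($html5); // false: libxml reports a tag mismatch
$html5Errors = libxml_get_errors();   // the errors behind the original warning
libxml_clear_errors();

$xhtmlLoaded = $dom->loadXML($xhtml); // true: the markup is well-formed

foreach ($html5Errors as $error) {
    echo trim($error->message), PHP_EOL;
}
var_dump($html5Loaded, $xhtmlLoaded);
```

This is exactly the gap Tidy closes: it rewrites the first string into something like the second before it is parsed.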

The Atom example in the documentation shows this output:

        <content type="html">
            <![CDATA[I am not writing the article.
                     The example is long enough as is ;).]]>
        </content>

In _setDescription (of both Entry\Rss and Entry\Atom) I found $dom->createCDATASection. But in Atom that method only renders the summary, while in RSS it renders the content.

In Entry\Atom, _setContent renders the content block, which I want to use for the full content rather than just a summary. And there, in _setContent, I found $element->setAttribute('type', 'xhtml').

I doubt that the Atom output shown in the documentation example is even possible with zend-feed, or am I wrong? It would be great if the Atom feed also used a CDATA block instead of XHTML for the content.
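For comparison, here is a minimal sketch of the type="html" output from the documentation example, built with plain PHP DOM rather than zend-feed's renderer. It assumes the Atom namespace http://www.w3.org/2005/Atom (as defined in RFC 4287) and simply wraps the unmodified HTML5 markup in a CDATA section, so no Tidy conversion is needed:

```php
<?php
// Sketch: build an Atom <content type="html"> element whose HTML is stored
// verbatim in a CDATA section (plain PHP DOM, not zend-feed's renderer).

const ATOM_NS = 'http://www.w3.org/2005/Atom';

// HTML5 markup that is NOT well-formed XML (unclosed <img>).
$html = '<p><img src="myimage.jpg"> My content ...</p>';

$dom   = new DOMDocument('1.0', 'utf-8');
$entry = $dom->createElementNS(ATOM_NS, 'entry');
$dom->appendChild($entry);

$content = $dom->createElementNS(ATOM_NS, 'content');
$content->setAttribute('type', 'html');
// A CDATA section is character data, so the HTML is never parsed as XML.
$content->appendChild($dom->createCDATASection($html));
$entry->appendChild($content);

$dom->formatOutput = true;
$xml = $dom->saveXML();
echo $xml;
```

One caveat of this approach: a CDATA section cannot contain the sequence ]]>, so content that might include it would have to be escaped as plain text instead.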

froschdesign commented 6 years ago

@av3 If you install the PHP extension "Tidy", zend-feed will convert your HTML to XHTML.

Example:

    $tidy = new \tidy;
    $tidy->parseString(
        '<p><img src="foo.jpg"></p>',
        [
            'output-xhtml'   => true,
            'show-body-only' => true,
            'quote-nbsp'     => false,
        ]
    );
    $tidy->cleanRepair();

    var_dump((string) $tidy); // <p><img src="foo.jpg" /></p>

https://github.com/zendframework/zend-feed/blob/b3d847afc0830a0ca7841a8ecc409c175dfea49d/src/Writer/Renderer/Entry/Atom.php#L383-L396

av3 commented 6 years ago

Thanks for your reply, @froschdesign. With Tidy it's working, even if it's not very beautiful:

    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:img src="myimage" />
        <xhtml:p>My content</xhtml:p>
      </xhtml:div>
    </content>

But is this really necessary? Wouldn't it be better to use $dom->createCDATASection? Then we wouldn't need tidy to create the content section. Or is there a specific reason why _setDescription (of Rss and Atom) creates a CDATA section and _setContent does not?

But this would also mean that the atom output example of the documentation is wrong, right?

If I don't want the XHTML output, would it be possible to write my own Writer extension that overrides the _setContent method? Are there any examples of how to register custom writers? In the documentation there is just a "TODO" for that chapter.

froschdesign commented 6 years ago

With Tidy it's working, even if it's not very beautiful:

Why isn't it beautiful? The generated code works and is correct.

Then we wouldn't need tidy to create the content section.

The content of atom:content should be suitable for handling as HTML or XHTML, depending on the specified type. Tidy helps us meet the specification here.

But this would also mean that the atom output example of the documentation is wrong, right?

Right!

In the documentation there is just a "TODO" for that chapter.

Oh, this is a mistake. Thanks for the hint!

froschdesign commented 6 years ago

Please have a look at content:encoded: http://www.rssboard.org/rss-profile#namespace-elements-content-encoded

zend-feed also provides an extension for this element: Zend\Feed\Writer\Extension\Content\Renderer\Entry

The usage of the writer extensions are the same like described for the reader: https://docs.zendframework.com/zend-feed/reader/#extending-feed-and-entry-apis

av3 commented 6 years ago

Why isn't it beautiful? The generated code works and is correct.

Yes, by now I know that the XHTML output is correct. It just looks unusual to me, and I thought it would be better to provide the content unmodified (faster and smaller), but that isn't important for a feed.

Please have a look at content:encoded: http://www.rssboard.org/rss-profile#namespace-elements-content-encoded

There it says:

The content MUST be suitable for presentation as HTML and be encoded as character data in the same manner as the description element.

and in description it says:

HTML markup MUST be encoded as character data either by employing the HTML entities &lt; ("<") and &gt; (">") or a CDATA section.

No word about xhtml content, but I know that it's also a valid solution for Atom feeds. But if it says "same manner as the description element", and the _setDescription method of Renderer\Entry\Atom is correct in using createCDATASection, then the same approach should also work for _setContent. I'm just wondering, because it doesn't seem consistent to me.
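The two encodings the RSS profile allows (escaping with &lt; and &gt;, or a CDATA section) carry the same character data once parsed. A small sketch with PHP's DOM extension illustrates this; the <item>/<description> structure is only illustrative and is not output generated by zend-feed:

```php
<?php
// Sketch: the two encodings the RSS profile allows for HTML in a
// description: escaped text, or a CDATA section. Plain PHP DOM only.

$html = '<p>My content <b>here</b></p>';

$dom  = new DOMDocument('1.0', 'utf-8');
$item = $dom->createElement('item');
$dom->appendChild($item);

// Option 1: a text node; the serializer escapes < and > as &lt; and &gt;.
$escaped = $dom->createElement('description');
$escaped->appendChild($dom->createTextNode($html));
$item->appendChild($escaped);

// Option 2: a CDATA section; the HTML is written out verbatim.
$cdata = $dom->createElement('description');
$cdata->appendChild($dom->createCDATASection($html));
$item->appendChild($cdata);

$xml = $dom->saveXML();
echo $xml;

// Re-parse and compare the decoded character data of both elements.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
$nodes = $parsed->getElementsByTagName('description');
$same  = $nodes->item(0)->textContent === $nodes->item(1)->textContent;
```

A reader that decodes the element text gets the identical HTML string either way; only the serialized form differs.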

zend-feed also provides an extension for this element

But this works only for RSS feeds, not for Atom.

The usage of the writer extensions are the same like described for the reader: https://docs.zendframework.com/zend-feed/reader/#extending-feed-and-entry-apis

Thank you, but unfortunately I wasn't successful with this. Writing my own Renderer\Entry with a _setContent called in the constructor would cause a second content:encoded block in my Atom feed if Tidy is enabled.

In addition, I tried to write my own extension to optimize my feed for Feedly. I started with my own Writer\Feed class and added methods for an accentColor and for registering the namespace. But there is no Zend\Feed\Writer\Extension\AbstractFeed, and extending my class from Zend\Feed\Writer\AbstractFeed causes an error:

…/vendor/zendframework/zend-feed/src/Writer/StandaloneExtensionManager.php:40:

Maximum function nesting level of '256' reached, aborting!

Next attempt: not extending AbstractFeed caused another error:

…/vendor/zendframework/zend-feed/src/Writer/AbstractFeed.php:845:

call_user_func_array() expects parameter 1 to be a valid callback, class 'Webfeed\Writer\Feed' does not have a method 'getItunesAuthors'

I don't know why it checks for a method of the iTunes extension in my own extension. But okay, that's a separate problem. Maybe you (or someone else) could provide an "JungleBooks" extension example for the Writer in the documentation.

froschdesign commented 6 years ago

Sorry, the topic was Atom and not RSS. My mistake. 🤦‍♂️

No word about xhtml content, but I know that it's also a valid solution for Atom feeds.

See the specification: https://tools.ietf.org/html/rfc4287#page-14
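RFC 4287 (section 4.1.3) defines "text", "html" and "xhtml" as values of the type attribute on atom:content, and requires xhtml content to be a single XHTML div element. The hypothetical helper below (appendAtomContent is not part of zend-feed) sketches one way to reconcile both positions in this thread: use type="xhtml" when the markup is already well-formed XML, and otherwise fall back to type="html" with the markup stored as escaped text:

```php
<?php
// Hypothetical helper (not part of zend-feed): render atom:content as
// type="xhtml" when the markup is well-formed XML, otherwise fall back to
// type="html" with the markup stored as escaped character data.

const ATOM_NS  = 'http://www.w3.org/2005/Atom';
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

function appendAtomContent(DOMDocument $dom, DOMElement $entry, string $markup): string
{
    $content = $dom->createElementNS(ATOM_NS, 'content');
    $entry->appendChild($content);

    // Wrap the markup in the single XHTML div that RFC 4287 requires.
    $fragment = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $parsed   = $fragment->loadXML('<div xmlns="' . XHTML_NS . '">' . $markup . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($parsed) {
        $content->setAttribute('type', 'xhtml');
        $content->appendChild($dom->importNode($fragment->documentElement, true));
        return 'xhtml';
    }

    // Not well-formed XML (e.g. an unclosed HTML5 <img>): escape as type="html".
    $content->setAttribute('type', 'html');
    $content->appendChild($dom->createTextNode($markup));
    return 'html';
}

$dom  = new DOMDocument('1.0', 'utf-8');
$feed = $dom->createElementNS(ATOM_NS, 'feed');
$dom->appendChild($feed);

// One atom:content per entry, so each call gets its own entry.
$firstEntry = $dom->createElementNS(ATOM_NS, 'entry');
$feed->appendChild($firstEntry);
$first = appendAtomContent($dom, $firstEntry, '<p><img src="a.jpg" /></p>');

$secondEntry = $dom->createElementNS(ATOM_NS, 'entry');
$feed->appendChild($secondEntry);
$second = appendAtomContent($dom, $secondEntry, '<p><img src="a.jpg"></p>');

echo $dom->saveXML();
```

Markup that uses HTML-only named entities such as &nbsp; is not well-formed XML on its own either, so it would also take the type="html" branch.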

Maybe you (or someone else) could provide an "JungleBooks" extension example for the Writer in the documentation.

Maybe tomorrow. I will definitely give feedback.

froschdesign commented 6 years ago

@av3 An example for registering a writer extension can be found at #86

weierophinney commented 4 years ago

This repository has been closed and moved to laminas/laminas-feed; a new issue has been opened at https://github.com/laminas/laminas-feed/issues/7.