haskell-party / feed

Haskell package for handling various feed (RSS) formats.
Other
30 stars 22 forks source link

Atom validation cannot recognize "type" attribute on content #44

Closed koterpillar closed 4 years ago

koterpillar commented 4 years ago

Here's a fairly minimal example:

{-# LANGUAGE OverloadedStrings #-}

import Text.Atom.Feed
import Text.Atom.Feed.Export
import Text.Atom.Feed.Validate

import Data.XML.Types

html = Element { elementName = "html"
               , elementAttributes = []
               , elementNodes = []
               }

article' = nullEntry "http://example.com" (TextString "example") "2000-01-01T00:00:00Z"

article = article' { entryContent = Just $ HTMLContent html
                   , entryAuthors = [nullPerson { personName = "Nobody" }] 
                   }

main = print $ flattenT $ validateEntry $ xmlEntry article

Expected result:

No errors.

Actual result:

[(True,"content with type 'text' cannot have child elements, text only.")]
jkeuhlen commented 4 years ago

Hey @koterpillar thanks for the report!

I think this is the desired behavior, but maybe I'm missing something? From RFC4287:

3.1.1.1.  Text

   Example atom:title with text content:

   ...
   <title type="text">
     Less: &lt;
   </title>
   ...

   If the value is "text", the content of the Text construct MUST NOT
   contain child elements.  Such text is intended to be presented to
   humans in a readable fashion.  Thus, Atom Processors MAY collapse
   white space (including line breaks) and display the text using
   typographic techniques such as justification and proportional fonts.

Could you maybe provide some more details about what you're trying to do, or how you think the library is failing?

In this case, you didn't specify any type fields, and so text was assumed, and then you added children to it, which is not allowed. nullEntry doesn't specify any attributes and you can find the logic for defaulting here:

    types = pAttrs "type" e
    (ty, type_valid) =
      case types of
        [] -> ("text", valid)
koterpillar commented 4 years ago

Please note that the content inside my entry is HTMLContent, not TextContent! xmlEntry puts a type="html" attribute in, however, that attribute is not using the Atom namespace.

jkeuhlen commented 4 years ago

Can you paste in the XML result from xmlEntry? I'm still not seeing an issue. At the top level, you have a TextContent because no type is specified. Then you have an HTML content underneath that, but the type attribute is likely underneath the text as well.

koterpillar commented 4 years ago

I'm trying to have HTML content and I thought putting HTMLContent in there would be enough.

Here's the updated example showing the XML:

#!/usr/bin/env stack
-- stack --resolver lts-14.25 script --package feed --package xml-types --package xml-conduit --package text
{-# LANGUAGE OverloadedStrings #-}

import Text.Atom.Feed
import Text.Atom.Feed.Export
import Text.Atom.Feed.Validate

import Data.XML.Types

import qualified Text.XML as C

import qualified Data.Text as Text
import qualified Data.Text.IO as Text
import qualified Data.Text.Lazy as LazyText

html = Element { elementName = "html"
               , elementAttributes = []
               , elementNodes = []
               }

article' = nullEntry "http://example.com" (TextString "example") "2000-01-01T00:00:00Z"

article = article' { entryContent = Just $ HTMLContent html
                   , entryAuthors = [nullPerson { personName = "Nobody" }]
                   }

xmlArticle = xmlEntry article

(Right xmlDocument) = C.fromXMLDocument $ Document (Prologue [] Nothing []) xmlArticle []

main = do
  Text.putStrLn $ LazyText.toStrict $ C.renderText C.def xmlDocument
  print $ flattenT $ validateEntry xmlArticle

And the output (edited to fit better):

<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.com</id>
  <title type="text">example</title>
  <updated>2000-01-01T00:00:00Z</updated>
  <author><name>Nobody</name></author>
  <content type="html"><html xmlns=""/></content>
</entry>
[(True,"content with type 'text' cannot have child elements, text only.")]

As you can see, the content element has type="html".

jkeuhlen commented 4 years ago

Hmm okay yeah it does seem like something is funny here. I'm not seeing anything obviously wrong at first glance though. I'll have to play with it some more to diagnose the issue. If you have more information/examples, please let me know! I'm not super familiar with this library, but jumped in to help maintain since I'm using it for a project. Ideally, we should have a test that creates an entry, turns it into XML, and then parses that XML back without losing anything.

koterpillar commented 4 years ago

Here's my investigation:

Validation eventually calls pAttrs on the content element.

Note that for some reason pAttrs and pAttr work differently, with pAttr accepting attributes without a namespace and pAttrs ignoring those.

Also, atomAttr (which is used to generate the type attribute) is again different from atomName in its treatment of namespaces.

I'm not that familiar with the XML spec, but it seems that either pAttrs or atomAttr need to be changed to account for the Atom namespace.