Closed by xave 1 year ago
Unless I am mistaken, parseText is designed to consume valid XML data, which the string "Apples & Oranges" is not. See the specification:
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.
It seems to me that what you want is a pre-processing function that escapes & (and other special characters) before feeding the string to parseText. Example implementations (not type-checked, not tested):
specific to &, less efficient:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text, replace)
-- Replace every literal ampersand with its XML entity.
escape :: Text -> Text
escape = replace "&" "&amp;"
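If more than the ampersand needs handling but pulling in a renderer feels heavy, a character-by-character variant is one option. The following is a minimal sketch using only the text package; the name escapeXml is hypothetical, and the choice to escape all five characters with predefined XML entities is an assumption about what "other special characters" should cover.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- | Escape the five characters that have predefined XML entities.
-- Each input character is visited exactly once, so the '&' produced
-- by one replacement is never escaped a second time.
escapeXml :: Text -> Text
escapeXml = T.concatMap escapeChar
  where
    escapeChar :: Char -> Text
    escapeChar '&'  = "&amp;"
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&apos;"
    escapeChar c    = T.singleton c
```

For example, escapeXml "Apples & Oranges" evaluates to "Apples &amp; Oranges", which can then be passed to parseText.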
applicable to all XML special characters, more efficient:
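import Control.Monad.Catch (MonadThrow)
import Control.Monad.Primitive (PrimMonad)
import Data.Conduit (ConduitT, yield, (.|))
import Data.Default (def)
import Data.Text (Text)
import Data.XML.Types (Content(..), Event(..))
import Text.XML.Stream.Render (renderText)
escape :: (PrimMonad m, MonadThrow m) => Text -> ConduitT () Text m ()
escape text = yield (EventContent $ ContentText text) .| renderText def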
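The conduit version produces a stream of Text chunks rather than a single value, so it has to be run and collected before the result can be handed to parseText. A minimal, untested sketch of that step follows; escapeToText is a hypothetical name, and the sketch assumes the conduit, xml-types and xml-conduit packages are available and that the renderer accepts a lone content event outside of any element.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit (foldC, runConduit, yield, (.|))
import Data.Text (Text)
import Data.XML.Types (Content (..), Event (..))
import Text.XML.Stream.Render (def, renderText)

-- | Render a single text-content event and concatenate the chunks
-- that the renderer emits into one strict Text value.
escapeToText :: Text -> IO Text
escapeToText text =
  runConduit $
    yield (EventContent (ContentText text))
      .| renderText def -- escaping is done by the XML renderer
      .| foldC          -- monoidally combine all emitted chunks
```

Running in IO keeps the example short, since IO satisfies both the PrimMonad and MonadThrow constraints required by renderText.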
Does that fulfill your need?
It does. Thank you very much!
The objective is to give parseText a string "Apples & Oranges" without escaping the ampersand. Using psDecodeIllegalCharacters does not work because the ampersand is not an illegal character, so the decoder is never triggered.
A potential workaround is to parse my string "Apples & Oranges" and replace the ampersand with a character reference outside of the legal range, of the form &#[0-9]+; as in the docs for psDecodeIllegalCharacters. The decoder would then receive the &#[0-9]+; value as an Int, and the output would be turned into Just '&'.
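The two halves of that workaround could look roughly like the sketch below. It is untested against xml-conduit itself: the names placeholderCode, toPlaceholder and decodeIllegal are hypothetical, code point 0 is an arbitrary choice of a value outside the legal XML character range, and the decoder's type Int -> Maybe Char is an assumption based on the description of psDecodeIllegalCharacters.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- | Hypothetical placeholder: code point 0 is not a legal XML
-- character, so "&#0;" should reach the illegal-character decoder.
placeholderCode :: Int
placeholderCode = 0

-- | Step 1: swap every literal '&' for the placeholder reference.
toPlaceholder :: Text -> Text
toPlaceholder = T.replace "&" (T.pack ("&#" ++ show placeholderCode ++ ";"))

-- | Step 2: map the placeholder back to '&' and reject anything else.
decodeIllegal :: Int -> Maybe Char
decodeIllegal n
  | n == placeholderCode = Just '&'
  | otherwise            = Nothing
```

Under those assumptions the decoder would be installed with something like def { psDecodeIllegalCharacters = decodeIllegal } and the input passed through toPlaceholder first. Note that this rewrites every ampersand, including those that already start a valid entity such as &amp;.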