I use htmlparser, but it uses parsexml under the hood.
I need to parse this wild HTML: <a href="&" class="CCC">TTT</a>
In other words, the href attribute may contain a URL with a bare ampersand (a literal character inside the URL, not an entity like &amp;).
But after parsing I get: <a href="&">class="CCC">TTT</a>
Notice that class is no longer an attribute; it is now part of the inner text.
What happens:
parsexml.nim sets the state to stateAttr to parse attributes, then reports the error (1, 12) Error: ';' expected because it cannot parse the entity. It then sets the state to stateError and afterwards to stateNormal. The next attribute is therefore not parsed as an attribute, because the state is no longer stateAttr.
Minimal example:
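import pkg/htmlparser, xmltree, streams

# Collects the diagnostics reported by parseHtml.
var errors: seq[string]
let node = newStringStream("""<a href="&" class="CCC">TTT</a>""").parseHtml("", errors)
echo node    # the parsed tree, serialized back to markup
echo errors  # the parser errors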
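To see the state change described above at the tokenizer level, the events can be dumped with parsexml directly. This is only a sketch: it assumes the stdlib std/parsexml is the module htmlparser drives, it uses the default parser options rather than whatever options htmlparser passes, and dumpEvents is a hypothetical helper name, not part of either module.

```nim
import std/[parsexml, streams]

proc dumpEvents(html: string): seq[string] =
  ## Collects one line per parser event so the token sequence can be inspected.
  ## Hypothetical helper for illustration only.
  var x: XmlParser
  open(x, newStringStream(html), "input.html")
  defer: x.close()
  while true:
    x.next()
    case x.kind
    of xmlEof: break                       # end of input reached
    of xmlError: result.add "error: " & x.errorMsg
    of xmlElementOpen: result.add "open: " & x.elementName
    of xmlElementStart: result.add "start: " & x.elementName
    of xmlElementEnd: result.add "end: " & x.elementName
    of xmlAttribute: result.add "attr: " & x.attrKey & "=" & x.attrValue
    of xmlCharData: result.add "text: " & x.charData
    else: result.add $x.kind               # e.g. xmlElementClose, xmlWhitespace

when isMainModule:
  for line in dumpEvents("""<a href="&" class="CCC">TTT</a>"""):
    echo line
```

If the analysis above is right, the output for the failing input should show an error event after href and no attribute event for class, whereas a well-formed input reports both attributes.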
Nim Version
Nim Compiler Version 2.1.1 [Windows: amd64]
Compiled at 2023-11-19
Copyright (c) 2006-2023 by Andreas Rumpf
git hash: cecaf9c56b1240a44a4de837e03694a0c55ec379
active boot switches: -d:release
XML (and HTML) does not seem to allow a raw & in attribute strings (consider &quot;), but maybe we could adjust the failing condition to leave the & verbatim.
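Until the parser tolerates a raw &, one possible workaround on the caller's side is to escape bare ampersands before handing the text to the parser. The following is a minimal sketch, not something htmlparser or parsexml provides: escapeBareAmpersands and looksLikeEntityAt are hypothetical helper names, and the entity check is a simplification (a letter followed by letters or digits, or a numeric reference, terminated by ';').

```nim
import std/strutils

proc looksLikeEntityAt(s: string, i: int): bool =
  ## True if the '&' at index i starts something shaped like an entity
  ## reference: &name; or &#123; or &#x1F; (simplified check, an assumption).
  var j = i + 1
  if j >= s.len: return false
  var allowed: set[char]
  if s[j] == '#':
    inc j
    if j < s.len and s[j] in {'x', 'X'}:
      allowed = HexDigits
      inc j
    else:
      allowed = Digits
  elif s[j] in Letters:
    allowed = Letters + Digits
  else:
    return false
  let start = j
  while j < s.len and s[j] in allowed: inc j
  # At least one name character or digit, followed by the terminating ';'.
  result = j > start and j < s.len and s[j] == ';'

proc escapeBareAmpersands(s: string): string =
  ## Replaces every '&' that does not start an entity reference with "&amp;".
  result = newStringOfCap(s.len)
  for i, c in s:
    if c == '&' and not looksLikeEntityAt(s, i):
      result.add "&amp;"
    else:
      result.add c

when isMainModule:
  echo escapeBareAmpersands("""<a href="&" class="CCC">TTT</a>""")
```

The escaped string can then be passed to parseHtml as in the minimal example. Note that this sketch rewrites every bare & in the input, including any inside script or CDATA content, so it is only suitable where that is acceptable.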