Open GoogleCodeExporter opened 9 years ago
This is one possible solution. We can replace all XHTML entities with the
corresponding numeric character references. In this case XML parsers will
recognize all of them. Right now the XHTML output cannot be parsed as is: many
entities are not defined in the XML header (although browsers recognize them).
Example of entities to replace:
{{{
&quot;
&laquo;
&raquo;
&copy;
...
}}}
For the full list of entities to replace, see the
org.wikimodel.wem.util.WikiEntityUtil class
(http://wikimodel.googlecode.com/svn/trunk/org.wikimodel.wem/src/main/java/org/wikimodel/wem/util/WikiEntityUtil.java).
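To make the idea concrete, here is a minimal sketch of such a replacement step. The class name `EntityToNumeric` and its three-entry table are hypothetical and only for illustration; the real list would come from WikiEntityUtil, whose API is not shown here. Names that are not in the table, including the five entities XML predefines, are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WikiModel: rewrites named XHTML entities
// as numeric character references so that a plain XML parser accepts them.
class EntityToNumeric {

    // Illustrative subset only; the full list lives in WikiEntityUtil.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "laquo", 171,
            "raquo", 187,
            "copy", 169);

    private static final Pattern NAMED_ENTITY =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String replaceNamedEntities(String xhtml) {
        Matcher m = NAMED_ENTITY.matcher(xhtml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = CODE_POINTS.get(m.group(1));
            // Names missing from the table (amp, lt, gt, quot, apos, ...) stay as they are.
            String replacement = codePoint == null ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                replaceNamedEntities("&laquo;WikiModel&raquo; &copy; 2008 &amp; more"));
    }
}
```

Running `main` prints `&#171;WikiModel&#187; &#169; 2008 &amp; more`, which an XML parser can read without any DTD.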
Original comment by mikhail....@gmail.com
on 10 Jan 2008 at 3:37
If you think this is a valid solution, it sounds good to me.
Original comment by dannylev...@gmail.com
on 10 Jan 2008 at 4:31
Actually, I just realized what you are saying here. Is there some way to
pre-define the common ones used in HTML, like:
{{{
&lt;
&gt;
&amp;
&quot;
&apos;
}}}
It would be nice if the parser would handle these common character entities.
Original comment by dannylev...@gmail.com
on 11 Jan 2008 at 1:07
I just tried, and all of the standard ones above are already handled except for
.
I have searched for information on SAX parsers, and the only way I have found
to add entities is to define them in a DTD. This means that the input XML must
have a DTD defined. Even then, I'm not sure whether DTD validation must be
turned on for the SAX parser to recognize the character entities.
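On the validation question, a small experiment may help. Under the XML specification, a non-validating parser still reads entity declarations in the internal DTD subset, so validation should not be needed in that case. The sketch below assumes the JDK's built-in SAX parser; the class name `DtdEntityDemo` and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects the text content of a document
// using a SAX parser with validation switched off.
class DtdEntityDemo {

    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false); // already the default; set only to make the point explicit
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            // An undeclared entity ends up here as a fatal parse error.
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The entity is declared in the internal DTD subset of the document itself.
        String xml = "<!DOCTYPE p [ <!ENTITY copy \"&#169;\"> ]>"
                + "<p>&copy; 2008</p>";
        System.out.println(textOf(xml));
    }
}
```

With the declaration in place, the reference is expanded to the copyright sign; if the DOCTYPE is removed, the same call fails because the entity is undeclared.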
Original comment by dannylev...@gmail.com
on 11 Jan 2008 at 5:29
Danny, I think I fixed this some time ago by adding the XHTML DTDs to the XHTML
parser. Could you try again and let me know if everything is working for you?
Thanks
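For readers wondering what "adding the DTDs to the parser" can look like with SAX, one common approach is an `EntityResolver` that serves a local copy of the DTD instead of fetching it from w3.org. The sketch below is not WikiModel's actual code: `LocalDtdDemo` and the one-line `STUB_DTD` are stand-ins, and it assumes the JDK's built-in parser, which loads external DTDs by default even when validation is off (other parsers may behave differently).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: resolves the XHTML DOCTYPE to a local DTD.
class LocalDtdDemo {

    // Stand-in for a bundled copy of the XHTML DTD; it declares a single entity.
    private static final String STUB_DTD = "<!ENTITY copy \"&#169;\">";

    static String textOf(String xhtml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver((publicId, systemId) -> {
                if (publicId != null && publicId.startsWith("-//W3C//DTD XHTML")) {
                    InputSource dtd = new InputSource(new StringReader(STUB_DTD));
                    dtd.setSystemId(systemId);
                    return dtd;
                }
                return null; // anything else falls back to the parser's default resolution
            });
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html>&copy; 2008</html>";
        System.out.println(textOf(xhtml));
    }
}
```

Because the resolver answers for the XHTML public identifier, no network access is needed and input that carries the standard DOCTYPE can use the entities declared in the local DTD.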
Original comment by vmas...@gmail.com
on 26 Oct 2008 at 1:52
Original issue reported on code.google.com by
dannylev...@gmail.com
on 8 Jan 2008 at 4:41