weihe08 / wikimodel

Automatically exported from code.google.com/p/wikimodel

WikiEntityUtil - Apostrophe rendered as ’ instead of ' #12

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
WikiEntityUtil translates the "'" (apostrophe) character to the entity
"&rsquo;" (’). This entity is not recognized by the SAX parser when it is fed
the HTML generated by WikiModel. I believe that WikiEntityUtil should be
changed to map this character to "'", which the SAX parser handles
correctly.

Original issue reported on code.google.com by dannylev...@gmail.com on 8 Jan 2008 at 4:41
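The failure can be reproduced with the JDK's built-in SAX parser. This is a minimal sketch, assuming that the entity WikiEntityUtil emits is the named entity `&rsquo;` (the export only shows the rendered ’). The class `ApostropheEntityCheck` is hypothetical. The thread does not say which apostrophe entity was proposed as the replacement, so the sketch checks both `&apos;` and `&#39;`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ApostropheEntityCheck {

    /** Returns true if the default (non-validating) JAXP SAX parser accepts the XML string. */
    static boolean parses(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content; we only care whether parsing throws.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // e.g. an undeclared entity reference is a fatal well-formedness error
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Named HTML entity: not predefined in XML and no DTD declares it.
        System.out.println(parses("<p>Don&rsquo;t</p>")); // false
        // Numeric reference to the same character (U+2019) is always legal XML.
        System.out.println(parses("<p>Don&#8217;t</p>")); // true
        // XML's predefined apostrophe entity and its numeric form both parse.
        System.out.println(parses("<p>Don&apos;t</p>"));  // true
        System.out.println(parses("<p>Don&#39;t</p>"));   // true
    }
}
```

Only the five entities predefined by XML (`lt`, `gt`, `amp`, `quot`, `apos`) and numeric references work without a DTD. Every other named HTML entity triggers the same error.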

GoogleCodeExporter commented 9 years ago
This is one possible solution: we can replace all XHTML entities with the
corresponding numeric character references. That way, every entity will be
recognized by XML parsers. Right now the XHTML output cannot be parsed as is:
many entities are not declared in the XML header (although browsers recognize
them).
Examples of entities to replace:
{{{
&nbsp;
&quot;
&laquo;
&raquo;
&copy;
...
}}}

For the full list of entities to replace, see the
org.wikimodel.wem.util.WikiEntityUtil class.

(http://wikimodel.googlecode.com/svn/trunk/org.wikimodel.wem/src/main/java/org/wikimodel/wem/util/WikiEntityUtil.java)

Original comment by mikhail....@gmail.com on 10 Jan 2008 at 3:37
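The proposal above can be sketched as a post-processing pass that rewrites named entities as numeric character references. This is a minimal sketch, not WikiEntityUtil's actual code. The class `NumericEntities` and its small entity table are hypothetical; the real list lives in `WikiEntityUtil`. Unknown names, including XML's predefined ones such as `&amp;`, are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntities {

    // Hypothetical subset of XHTML entity names mapped to Unicode code points.
    private static final Map<String, Integer> CODES = new HashMap<>();
    static {
        CODES.put("nbsp", 160);   // U+00A0 no-break space
        CODES.put("quot", 34);    // U+0022 quotation mark
        CODES.put("laquo", 171);  // U+00AB left guillemet
        CODES.put("raquo", 187);  // U+00BB right guillemet
        CODES.put("copy", 169);   // U+00A9 copyright sign
        CODES.put("rsquo", 8217); // U+2019 right single quotation mark
    }

    // Matches a named entity reference such as "&laquo;".
    private static final Pattern NAMED = Pattern.compile("&([a-zA-Z][a-zA-Z0-9]*);");

    /** Replaces known named entities with numeric references; leaves other text as is. */
    static String toNumeric(String html) {
        Matcher m = NAMED.matcher(html);
        StringBuffer out = new StringBuffer(); // StringBuffer overload works on Java 8
        while (m.find()) {
            Integer code = CODES.get(m.group(1));
            String replacement = (code != null) ? "&#" + code + ";" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("&laquo;Don&rsquo;t&raquo;&nbsp;&copy; 2008 &amp; co"));
        // prints: &#171;Don&#8217;t&#187;&#160;&#169; 2008 &amp; co
    }
}
```

Doing the substitution at generation time, inside the entity table itself, would avoid a second pass. The regex version only shows the mapping.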

GoogleCodeExporter commented 9 years ago
If you think this is a valid solution, it sounds good to me.

Original comment by dannylev...@gmail.com on 10 Jan 2008 at 4:31

GoogleCodeExporter commented 9 years ago
Actually, I just realized what you are saying here. Is there some way to
pre-define the common ones used in HTML, like:

{{{
&nbsp;
&lt;
&gt;
&amp;
&quot;
&apos;
}}}

It would be nice if the parser handled these common character entities.

Original comment by dannylev...@gmail.com on 11 Jan 2008 at 1:07

GoogleCodeExporter commented 9 years ago
I just tried, and all of the standard entities above are already handled
except for &nbsp;. I have searched for information on SAX parsers, and the only
way I have found to add entities is to define them in a DTD. This means that
the input XML must have a DTD defined. Even then, I'm not sure whether DTD
validation must be turned on for the SAX parser to recognize the character
entities.

Original comment by dannylev...@gmail.com on 11 Jan 2008 at 5:29
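The validation question can be tested directly. The XML 1.0 specification requires even non-validating processors to use the entity declarations in a document's internal DTD subset, so validation does not need to be turned on. This is a minimal sketch using the JDK's default SAX parser. The `DOCTYPE` prefix and the class name `InternalSubsetEntities` are hypothetical and are not something WikiModel emits.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InternalSubsetEntities {

    // Hypothetical prefix declaring &nbsp; in the internal DTD subset.
    // The DOCTYPE name matches the document's root element ("html").
    static final String DOCTYPE = "<!DOCTYPE html [ <!ENTITY nbsp \"&#160;\"> ]>";

    /** Parses the XML without validation and returns its character data, entities expanded. */
    static String text(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false); // already the default; stated for emphasis
            SAXParser parser = factory.newSAXParser();
            StringBuilder sb = new StringBuilder();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Text may arrive in several chunks, so accumulate it.
                    sb.append(ch, start, length);
                }
            });
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String result = text(DOCTYPE + "<html><p>a&nbsp;b</p></html>");
        System.out.println(result.equals("a\u00A0b")); // true
    }
}
```

The drawback is that every generated document has to carry these declarations. Referencing an external DTD is the alternative when the whole XHTML entity set is needed.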

GoogleCodeExporter commented 9 years ago
Danny, I think I fixed this some time ago by adding the XHTML DTDs to the
XHTML parser. Could you try again and let me know if everything is working for
you?

Thanks

Original comment by vmas...@gmail.com on 26 Oct 2008 at 1:52
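The thread does not show how the XHTML DTDs were added to WikiModel's parser. One common JAXP technique is an `EntityResolver` that serves the DTD from a local copy instead of fetching it from w3.org. This is a minimal sketch of that technique only. The class `LocalXhtmlDtd` and the three-entity `STUB_DTD` are hypothetical stand-ins for the real W3C DTD files. The sketch also relies on the JDK's default non-validating parser loading the external DTD subset, which it does unless that behavior is disabled.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalXhtmlDtd {

    static final String XHTML_STRICT_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";

    // Hypothetical stand-in for the XHTML 1.0 DTD: only a few entity declarations.
    // A real setup would bundle the W3C DTD and its .ent files as resources.
    static final String STUB_DTD =
        "<!ENTITY nbsp \"&#160;\">\n"
        + "<!ENTITY rsquo \"&#8217;\">\n"
        + "<!ENTITY copy \"&#169;\">\n";

    /** Parses the XML, resolving the XHTML DTD locally, and returns its character data. */
    static String text(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            StringBuilder sb = new StringBuilder();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Serve the DTD from memory instead of fetching it over the network.
                    if (XHTML_STRICT_PUBLIC_ID.equals(publicId)) {
                        return new InputSource(new StringReader(STUB_DTD));
                    }
                    return null; // fall back to the parser's default resolution
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    sb.append(ch, start, length);
                }
            });
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE html PUBLIC \"" + XHTML_STRICT_PUBLIC_ID + "\""
            + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html><p>Don&rsquo;t&nbsp;&copy;</p></html>";
        System.out.println(text(doc).equals("Don\u2019t\u00A0\u00A9")); // true
    }
}
```

Resolving the DTD locally keeps parsing offline and fast. It also means the generated XHTML must include the standard DOCTYPE for the entities to be found.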