senjuhashirama / pugixml

Automatically exported from code.google.com/p/pugixml

xml_buffered_writer::text_output_escaped doesn't encode the &apos; entity #182

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The method xml_buffered_writer::text_output_escaped handles only 4 of the 
standard entities: amp, lt, gt, quot. The &apos; entity is not handled.

Version: 1.2

The proposed changes for version 1.2 are attached.

Original issue reported on code.google.com by Pot...@gmail.com on 4 Oct 2012 at 8:22

Attachments:

GoogleCodeExporter commented 9 years ago
This is correct.
Is there a reason why text_output_escaped should escape ' to &apos;?

Original comment by arseny.k...@gmail.com on 4 Oct 2012 at 3:18

GoogleCodeExporter commented 9 years ago
The reasons, as I see them:
1. The XML 1.0 standard (Fifth Edition) defines a set of five predefined 
entities: amp, lt, gt, apos, and quot. See, for example: 
http://www.w3.org/TR/2008/REC-xml-20081126/
2. Some XML editors treat an unescaped apostrophe as an error. However, I can't 
remember exactly which editor does so. :-(
3. The standard allows attribute values to be surrounded by apostrophes instead 
of quotation marks. If such a value contains an apostrophe, it becomes ambiguous.
4. The strconv_escape method decodes the sequence, so text_output_escaped 
should perform the reverse conversion.

All of the above is only IMHO.

Original comment by Pot...@gmail.com on 5 Oct 2012 at 8:25

GoogleCodeExporter commented 9 years ago
While &apos; is certainly an allowed entity, pugixml tries to preserve text that 
can be preserved as is, without encoding it. For example, while it is possible 
to encode non-ASCII characters as escape sequences, pugixml chooses not to do 
so, so that localized text is left as is.

The standard does allow attribute values to be surrounded with apostrophes; 
however, in that case you have to encode apostrophes but can choose to leave 
quotation marks as is. pugixml does not yet have an option to surround attribute 
values with apostrophes when writing.
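
To illustrate the rule described above, here is a minimal sketch of delimiter-dependent escaping. The function `escape_attribute` and its `delimiter` parameter are hypothetical names introduced for illustration; this is not pugixml's implementation, only an assumption about how such a rule could look.

```cpp
#include <string>

// Hypothetical helper (not part of pugixml): escapes an attribute value
// for output between the given delimiter, which is either '"' or '\''.
std::string escape_attribute(const std::string& value, char delimiter)
{
    std::string out;

    for (char c : value)
    {
        switch (c)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;

        case '"':
            // Only needs escaping when the value is surrounded by quotation marks.
            if (delimiter == '"') out += "&quot;"; else out += c;
            break;

        case '\'':
            // Only needs escaping when the value is surrounded by apostrophes.
            if (delimiter == '\'') out += "&apos;"; else out += c;
            break;

        default:
            out += c;
        }
    }

    return out;
}
```

With `"` as the delimiter, the value `it's "ok"` becomes `it's &quot;ok&quot;`; with `'` as the delimiter it becomes `it&apos;s "ok"`.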

In short, outputting unescaped apostrophes is perfectly compliant with the XML 
standard; any tool that does not recognize this violates the standard. If you 
have an example of a tool or a library that does not work with whatever pugixml 
outputs, please tell me and I'll reopen the issue; otherwise, I'd prefer to 
leave this as it is.
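
As a rough illustration of why the asymmetry is harmless, the sketch below decodes the five predefined entities. The function `decode_entities` is a hypothetical stand-in for a conforming reader, not pugixml's strconv_escape. It yields the same text whether the apostrophe was written literally or as &apos;.

```cpp
#include <cstddef>
#include <string>

// Hypothetical decoder for the five predefined XML entities
// (an illustration only, not pugixml's strconv_escape).
std::string decode_entities(const std::string& text)
{
    static const struct { const char* name; char ch; } table[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };

    std::string out;
    std::size_t i = 0;

    while (i < text.size())
    {
        bool matched = false;

        if (text[i] == '&')
        {
            for (const auto& e : table)
            {
                const std::string name = e.name;

                // Compare the entity name against the text at position i.
                if (text.compare(i, name.size(), name) == 0)
                {
                    out += e.ch;
                    i += name.size();
                    matched = true;
                    break;
                }
            }
        }

        // Any other character, including a literal apostrophe, is copied as is.
        if (!matched) out += text[i++];
    }

    return out;
}
```

Both `it's &quot;ok&quot;` and `it&apos;s &quot;ok&quot;` decode to `it's "ok"`, so a writer that leaves apostrophes unescaped still round-trips through such a reader.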

Note that this might change when/if pugixml starts supporting 
apostrophe-surrounded attribute values during printing.

Original comment by arseny.k...@gmail.com on 11 Oct 2012 at 4:43