senjuhashirama / pugixml

Automatically exported from code.google.com/p/pugixml

Encoding #214

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

In my XML string I am using some Swedish characters. The XML loads fine, but 
when I try to get an attribute value that contains a Swedish character, I do 
not get the expected output. 
To load the XML string I use:
pugi::xml_parse_result pResult = m_XmlDocument.load_buffer(xmlData.c_str(), 
    xmlData.size() * sizeof(wchar_t), pugi::parse_default | pugi::parse_escapes, 
    pugi::encoding_wchar);

The attribute value in the XML string is "FÖKJ33", but I get a different 
character in place of the Swedish one. Is there any solution? 

Kshitija.

Original issue reported on code.google.com by kshitija...@gmail.com on 13 Sep 2013 at 12:57

GoogleCodeExporter commented 9 years ago
Please provide more details:

1. Are you defining PUGIXML_WCHAR_MODE or not?
2. Please specify the exact attribute value output (an integer code for each 
byte or wchar_t of the string) that pugixml returns

Original comment by arseny.k...@gmail.com on 17 Sep 2013 at 3:41
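One way to produce the per-byte integer codes requested above is a small dump helper. This is only a sketch: dump_bytes is a hypothetical name, not part of pugixml, and it assumes the library is built without PUGIXML_WCHAR_MODE so that attribute values arrive as const char*.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of pugixml): formats every byte of a
// null-terminated string as a decimal integer, separated by spaces.
std::string dump_bytes(const char* text)
{
    assert(text != nullptr);

    std::ostringstream out;
    for (const char* p = text; *p != '\0'; ++p)
    {
        if (p != text)
            out << ' ';

        // Go through unsigned char so bytes above 127 are not shown as negative numbers.
        out << static_cast<unsigned int>(static_cast<unsigned char>(*p));
    }
    return out.str();
}
```

With pugixml in its default (non-wchar) mode, it could be called as dump_bytes(doc.child("root").attribute("value").value()) and the result pasted into the report.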

GoogleCodeExporter commented 9 years ago
I have now defined PUGIXML_WCHAR_MODE and my problem is solved. Previously it 
was not defined.
Thanks.
Kshitija

Original comment by kshitija...@gmail.com on 17 Sep 2013 at 5:45

GoogleCodeExporter commented 9 years ago
Note that if PUGIXML_WCHAR_MODE is *not* defined, pugixml works in UTF-8. Your 
string, "FÖKJ33", corresponds to the following array of code points:
[70, 214, 75, 74, 51, 51]

When encoded to UTF-8, it corresponds to the following bytes:
[70, 195, 150, 75, 74, 51, 51]

This is what pugixml returns. You must then treat the string as UTF-8, or use 
pugi::as_wide() to convert from UTF-8 to wchar_t as necessary.

Alternatively, if PUGIXML_WCHAR_MODE *is* defined, internal storage uses 
wchar_t, so you can just work with wide strings.

Original comment by arseny.k...@gmail.com on 17 Sep 2013 at 5:57
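The byte values quoted in the comment above can be reproduced with a small stand-alone encoder. This is a minimal sketch of standard UTF-8 encoding, independent of pugixml; encode_utf8 is a hypothetical name chosen for illustration and is not how the library implements its conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative encoder (not pugixml code): converts Unicode code points
// to their UTF-8 byte sequence.
std::vector<unsigned char> encode_utf8(const std::vector<std::uint32_t>& codepoints)
{
    std::vector<unsigned char> bytes;
    for (std::uint32_t cp : codepoints)
    {
        // Only valid scalar values: at most U+10FFFF and not a surrogate.
        assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

        if (cp < 0x80)
        {
            // 1 byte: 0xxxxxxx
            bytes.push_back(static_cast<unsigned char>(cp));
        }
        else if (cp < 0x800)
        {
            // 2 bytes: 110xxxxx 10xxxxxx (this is the case for 'Ö', U+00D6)
            bytes.push_back(static_cast<unsigned char>(0xC0 | (cp >> 6)));
            bytes.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            bytes.push_back(static_cast<unsigned char>(0xE0 | (cp >> 12)));
            bytes.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
            bytes.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            bytes.push_back(static_cast<unsigned char>(0xF0 | (cp >> 18)));
            bytes.push_back(static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F)));
            bytes.push_back(static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)));
            bytes.push_back(static_cast<unsigned char>(0x80 | (cp & 0x3F)));
        }
    }
    return bytes;
}
```

Passing the code points [70, 214, 75, 74, 51, 51] yields [70, 195, 150, 75, 74, 51, 51]: only 214 (0xD6) falls in the two-byte range, giving 0xC3 and 0x96.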

GoogleCodeExporter commented 9 years ago

Original comment by arseny.k...@gmail.com on 17 Sep 2013 at 5:58

GoogleCodeExporter commented 9 years ago
Can you give an example how to use pugi::as_wide()?

Original comment by kshitija...@gmail.com on 17 Sep 2013 at 6:41

GoogleCodeExporter commented 9 years ago
Suppose you compile pugixml without PUGIXML_WCHAR_MODE (otherwise as_wide is 
not really necessary).

Then:

std::wstring data = L"<root value='FÖKJ33' />";

pugi::xml_document doc;
doc.load_buffer(data.c_str(), data.size() * sizeof(wchar_t), 
    pugi::parse_default, pugi::encoding_wchar);

// value() returns UTF-8 here; as_wide() converts it to a wide string
std::wstring value = 
    pugi::as_wide(doc.child("root").attribute("value").value());

Here's an equivalent example with PUGIXML_WCHAR_MODE:

std::wstring data = L"<root value='FÖKJ33' />";

pugi::xml_document doc;
doc.load_buffer(data.c_str(), data.size() * sizeof(wchar_t), 
    pugi::parse_default, pugi::encoding_wchar);

// names and values are wchar_t strings in this mode, so no conversion is needed
std::wstring value = doc.child(L"root").attribute(L"value").value();

Original comment by arseny.k...@gmail.com on 17 Sep 2013 at 6:53

GoogleCodeExporter commented 9 years ago
Thanks. It also works using pugi::as_wide().

Original comment by kshitija...@gmail.com on 17 Sep 2013 at 6:57

GoogleCodeExporter commented 9 years ago

Original comment by arseny.k...@gmail.com on 17 Sep 2013 at 2:37