tfussell / xlnt

Cross-platform user-friendly xlsx library for C++11+

weirdness with unicode characters #696

Open affect opened 1 year ago

affect commented 1 year ago

I have an xlsx file (attached) with Greek text in some cells. (I reduced the file to a single cell.)

I want to extract the cell value and write it to a text file (actually an XML file, but that is another issue).

However, all of that text comes out completely garbled. My initial attempt used a plain std::ofstream. Then I tried a std::wofstream, which also didn't work. I noticed in the DevStudio debugger that even when reading cell values into std::string variables, the content of the variable was not correct.

Using

`std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter1;`

and then

`out = converter1.from_bytes(in);`

the variable `out` does contain the proper Greek text. But when I then write it to a file, the std::wofstream stops at the first Greek character.

Can anyone point me in the right direction on how to extract the values and write them to a Unicode text file?

greek.xlsx

Beaky2000 commented 1 year ago

I made a quick test that reads your file and verifies the contents to be correct, so it looks like your problem of writing a UTF-8 encoded string to a file is not specific to xlnt.

This is the (Visual Studio) test I wrote (which passes):

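TEST_METHOD(Greek)
{
    // TEST_DIR is a project-defined macro; it has to expand to a wide string literal for this to compile.
    constexpr wchar_t const excel_file[] = TEST_DIR "testData\\Greek.xlsx";
    xlnt::workbook excel_wb;
    try {
        excel_wb.load(excel_file);
    }
    catch (...)
    {
        // Any exception while loading fails the test.
        Assert::Fail();
    }
    Assert::AreEqual(1, (int)excel_wb.sheet_count());
    xlnt::worksheet const ws{ excel_wb.sheet_by_index(0) };

    auto A1 = xlnt::cell_reference{ 1, 1 };
    Assert::IsTrue(ws.has_cell(A1));
    xlnt::cell A1_cell{ ws.cell(A1) };
    // The cell text is compared as UTF-8 bytes; the cast is needed where u8 literals are char8_t (C++20).
    Assert::AreEqual(reinterpret_cast<char const *>(u8"εἰς"), A1_cell.to_string().c_str());
}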
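
Since the test shows the cell value already arrives as a UTF-8 encoded std::string, one way to get it into a file is to write those bytes unchanged through a narrow stream opened in binary mode. The following is only a minimal sketch under that assumption: `write_utf8_file` and `read_file_bytes` are hypothetical helper names, not part of xlnt, and the string passed in stands for whatever `to_string()` returned.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not an xlnt function): writes the bytes of an
// already UTF-8 encoded string to a file without any conversion.
bool write_utf8_file(const std::string& path, const std::string& utf8_text)
{
    // Binary mode keeps the stream from translating any bytes.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
    {
        return false;
    }
    out.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));
    return static_cast<bool>(out.flush());
}

// Hypothetical helper: reads a file back as raw bytes, which is handy
// for checking that the text survived the round trip.
std::string read_file_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With this, something like `write_utf8_file("out.txt", A1_cell.to_string())` would store the Greek text as UTF-8 with no wide-character conversion involved. A debugger or console that does not interpret the bytes as UTF-8 may still display them garbled even though the file content is correct, so it is worth opening the result in an editor set to UTF-8.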