tfussell / xlnt

:bar_chart: Cross-platform user-friendly xlsx library for C++11+

file name with "accents" : xl/sharedStrings.xml: error: invalid UTF-8 #145

Closed sukoi26 closed 7 years ago

sukoi26 commented 7 years ago

So I'm attaching a picture of the result on Win7 x64: testname

A picture of the directory with the result: COPY_ (image)

The source of the program:

sukoi26 commented 7 years ago

No luck attaching a zip, so here are .txt files exported from the Code::Blocks editor: main - Cpp.txt

The Code::Blocks project configuration: testname - depend.txt, testname - cbp.txt

tfussell commented 7 years ago

The XML in XLSX files should always be UTF-8 so I'm surprised to see this when reading an existing file. Does néàêm.xlsx contain a string with characters in the range 128-255 (i.e. Latin-1 encoding)? Did you create it with Excel? Can you send me that file or another one with the problem so I can investigate?
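The question about characters in the range 128-255 can be checked directly on the bytes. A Latin-1 "é" is the single byte 0xE9, which is not valid UTF-8 on its own; in UTF-8 the same character is the two bytes 0xC3 0xA9. Below is a minimal sketch of a validator that could be used to find which string triggers the "invalid UTF-8" error. `is_valid_utf8` is a hypothetical helper written for illustration, not an xlnt function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of xlnt): returns true if `s` is well-formed UTF-8,
// rejecting stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned int cp = 0;
        if (c < 0x80) { ++i; continue; }                 // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                               // invalid lead byte

        if (i + len > s.size()) return false;            // sequence cut off at the end
        for (std::size_t k = 1; k < len; ++k)
        {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }

        // Reject overlong encodings, surrogates and out-of-range values
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
        i += len;
    }
    return true;
}
```

For example, `is_valid_utf8("n\xE9")` (Latin-1 "né") is false, while `is_valid_utf8("n\xC3\xA9")` (UTF-8 "né") is true.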

sukoi26 commented 7 years ago

noname.xlsx was created with Excel, and néàêm.xlsx is a renamed copy of that file. There's nothing special in the sheet (see image). Attached: noname.xlsx

tfussell commented 7 years ago

Thanks for the example file. Unfortunately, I'm not able to reproduce this exception. I renamed the file to néàêm.xlsx, loaded it into a workbook, and saved it to another file without any problems. Could you run your program with a debugger and find out which shared string is causing this problem?

sukoi26 commented 7 years ago

The problem is in writing the file name to a cell:

    // Save the file name in cell (1, 1), i.e. A1
    std::string ws2filename = filein.substr(filein.find_last_of("/\\") + 1);
    ws1.cell("A1").value(ws2filename);
tfussell commented 7 years ago

I understand now. You have two options for fixing this. You can use wmain instead of main, a special Windows-only entry point that passes arguments as UTF-16-encoded wide strings (wchar_t *). It's more correct, but requires platform-specific code. If you can ensure that your arguments will always be in the Latin-1 code page (as for French), the second approach, converting Latin-1 to UTF-8 yourself, may be easier to maintain.

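#include <codecvt>
#include <locale>
#include <string>
#include <xlnt/xlnt.hpp>

// Convert a UTF-16 std::wstring (wchar_t is 16 bits on Windows) to UTF-8.
// codecvt_utf8_utf16 also handles surrogate pairs, which codecvt_utf8<wchar_t> does not.
// Note: std::wstring_convert is deprecated since C++17 but still available.
std::string utf16_to_utf8(const std::wstring &utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
    return converter.to_bytes(utf16);
}

// wmain is Windows-only; with MinGW-w64 it requires the -municode flag
int wmain(int argc, wchar_t *argv[])
{
    if (argc < 2) return 1; // expects the input file as the first argument

    xlnt::workbook wb;
    auto filein = std::wstring(argv[1]);
    wb.load(filein);
    auto ws2filename = utf16_to_utf8(filein.substr(filein.find_last_of(L"/\\") + 1));
    auto ws1 = wb.active_sheet();
    ws1.cell("A1").value(ws2filename); // I could add a wstring overload here so you don't have to convert ws2filename to utf8 first
    wb.save("test.xlsx");
}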

or

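#include <string>
#include <xlnt/xlnt.hpp>

// Convert a Latin-1 (ISO-8859-1) string to UTF-8.
// Bytes below 0x80 are ASCII and copied unchanged; 0x80-0xFF become two UTF-8 bytes.
std::string latin1_to_utf8(const std::string &latin1)
{
    std::string utf8;

    // Iterate as unsigned char so the test works whether plain char is signed or unsigned
    for (unsigned char character : latin1)
    {
        if (character < 0x80)
        {
            utf8.push_back(static_cast<char>(character));
        }
        else
        {
            utf8.push_back(static_cast<char>(0xc0 | (character >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (character & 0x3f)));
        }
    }

    return utf8;
}

int main(int argc, char *argv[])
{
    if (argc < 2) return 1; // expects the input file as the first argument

    xlnt::workbook wb;
    auto filein = latin1_to_utf8(argv[1]);
    wb.load(filein);
    auto ws2filename = filein.substr(filein.find_last_of("/\\") + 1);
    auto ws1 = wb.active_sheet();
    ws1.cell("A1").value(ws2filename); // write the file name, not the full path
    wb.save("test.xlsx");
}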
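To see what the Latin-1 conversion does to each byte, take "é" = 0xE9 = 11101001. The lead byte is 0xC0 | (0xE9 >> 6) = 0xC0 | 0x03 = 0xC3 and the continuation byte is 0x80 | (0xE9 & 0x3F) = 0x80 | 0x29 = 0xA9, so "é" becomes C3 A9. The sketch below is a variant of latin1_to_utf8 that loops over unsigned char, plus a hypothetical `to_hex` helper (not part of xlnt) for inspecting the result.

```cpp
#include <cstdio>
#include <string>

// Latin-1 -> UTF-8, looping over unsigned char so `c < 0x80` is meaningful on every platform
std::string latin1_to_utf8(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char c : latin1)
    {
        if (c < 0x80)
        {
            utf8.push_back(static_cast<char>(c));                  // ASCII: copied as-is
        }
        else
        {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte carries the top 2 bits
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte carries the low 6 bits
        }
    }
    return utf8;
}

// Hypothetical helper: render bytes as space-separated uppercase hex, e.g. "6E C3 A9"
std::string to_hex(const std::string &bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char c : bytes)
    {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

Applied to the Latin-1 bytes of "néàêm" (6E E9 E0 EA 6D), this yields 6E C3 A9 C3 A0 C3 AA 6D: each accented letter grows from one byte to two, and the ASCII letters are unchanged.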
sukoi26 commented 7 years ago

I tried the second approach to minimize platform-specific changes, but it doesn't work. I get:

d:\dvp\test>bin\Debug\testname -f néàêm.xlsx
terminate called after throwing an instance of 'xlnt::exception'
  what(): xlnt::exception : file not found néàêm.xlsx

In this case xlnt cannot load the file, even though my own test on the file name passes. I used chcp to try several code pages (850, 1250, 65001), without success.

tfussell commented 7 years ago

Are you sure the file exists there? Try specifying the full path to the file. It works for me with code pages 437 and 65001, both in Command Prompt and in MinGW bash. For debugging, you can manually create a std::fstream from main's argv (converting it to a std::wstring with std::codecvt_utf8_utf16), check std::fstream::good() to see whether the file actually opened, and then pass the stream to xlnt::workbook::load(std::istream &). I don't think this is a problem with xlnt.
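That debugging suggestion separates two questions: can the file be opened at all, and can xlnt parse it. A minimal sketch of the first check follows. It simplifies under stated assumptions: it keeps the path as a narrow std::string instead of converting argv to std::wstring, and `can_open_for_reading` is a hypothetical helper, not an xlnt function. On Windows a narrow path is generally interpreted in the active ANSI code page, so a name that has already been converted to UTF-8 may fail this check even though the file exists.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical debugging helper: open `path` exactly as the program would before
// handing it to a loader. If this fails, the problem lies in the path or its
// encoding, not in the xlsx content.
bool can_open_for_reading(const std::string &path)
{
    std::ifstream stream(path, std::ios::binary);
    if (!stream.good())
    {
        std::cerr << "cannot open: " << path << '\n';
        return false;
    }
    // An open stream like this one is what xlnt::workbook::load(std::istream &) accepts.
    return true;
}
```

Calling it with the same string that would be passed to workbook::load shows whether a "file not found" comes from the path encoding rather than from the workbook itself.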

sukoi26 commented 7 years ago

I changed my code:

    // Save the file name in cell (1, 1), i.e. A1
    auto fileinl = latin1_to_utf8(filein);
    std::string ws2filename = fileinl.substr(fileinl.find_last_of("/\\") + 1);
    ws1.cell("A1").value(ws2filename);

and now it works.
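The fix that worked here converts only the text destined for the cell and leaves the path used for loading untouched. Below is a minimal sketch of that step as a single hypothetical function, `cell_text_from_path` (not part of xlnt). It assumes the command-line bytes are Latin-1; Windows-1252 agrees with Latin-1 for é, à and ê, but other code pages would need a different mapping.

```cpp
#include <string>

// Hypothetical helper mirroring the fix above: strip the directory part of the path
// and convert only the remaining file name from Latin-1 to UTF-8 for the cell.
// The original path is not modified, so it can still be used to open the file.
std::string cell_text_from_path(const std::string &path)
{
    // If there is no separator, find_last_of returns npos and npos + 1 wraps to 0,
    // so the whole string is kept.
    std::string name = path.substr(path.find_last_of("/\\") + 1);

    std::string utf8;
    for (unsigned char c : name)
    {
        if (c < 0x80)
        {
            utf8.push_back(static_cast<char>(c));
        }
        else
        {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

Keeping the two strings separate matters: the cell value must be valid UTF-8 for the XML, while the file system expects the path in whatever encoding the command line delivered.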

tfussell commented 7 years ago

Good to hear. Calling this one closed.