gulrak / filesystem

An implementation of C++17 std::filesystem for C++11 /C++14/C++17/C++20 on Windows, macOS, Linux and FreeBSD.
MIT License

path::string() Behavior on Win64 Inconsistent with std::filesystem #181

Closed Sallee1 closed 5 months ago

Sallee1 commented 5 months ago

Describe the bug

On Win64, when a path contains non-ASCII characters, std::filesystem::path::string() returns a string in the native encoding (GBK on my system). In contrast, ghc::filesystem::path::string() returns a UTF-8 string.

To Reproduce

#include <string>
#include <iostream>
#include <filesystem>
#include <ghc/filesystem.hpp>

namespace stdfs = std::filesystem;
namespace ghcfs = ghc::filesystem;

int main()
{
    // Prepare paths
    std::wstring relative_path = L"./这/是/相对路径";   // this/is/relative
    std::wstring root_path = L"C:/这/是/根路径";        // C:/this/is/root        

    // Concatenate paths
    auto std_path = stdfs::path(root_path) / relative_path;
    auto ghc_path = ghcfs::path(root_path) / relative_path;

    // Convert to absolute paths
    auto std_absolute_path = stdfs::absolute(std_path);
    auto ghc_absolute_path = ghcfs::absolute(ghc_path);

    // Convert to strings
    // Native encoding (GBK), interpreted as "C:/这/是/根路径/这/是/相对路径" in GBK encoding
    std::string std_absolute_str = std_absolute_path.string();
    // UTF-8 encoding, interpreted as "C:\杩橽鏄痋鏍硅矾寰刓杩橽鏄痋鐩稿璺緞" in GBK encoding
    std::string ghc_absolute_str = ghc_absolute_path.string(); 
}

Expected behavior

ghc::filesystem::path::string() should behave consistently with std::filesystem::path::string() and return a string in the native encoding.

Additional context

Is this issue a bug or an intentional design choice?

As this library is used in business code, please provide a temporary solution to address this issue.

Sallee1 commented 5 months ago

path::string() and path::u8string() share exactly the same implementation, which I think is probably a mistake.

gulrak commented 5 months ago

I'm sorry that it doesn't work the way you expected.

It is documented under "Differences in API" in the README.md, though, and an additional "Important" paragraph in the platforms section of the same file states it as well. Maybe it's not explicit enough, but the std::string interpretation applies to both directions of the API: every std::string going in or out is UTF-8 and will be handled as such.

It is an explicit design choice following the "UTF-8 Everywhere" philosophy. I know this does not fit everyone's needs, but it was one way to limit the complexity for a single-developer project. There are currently no plans to change this implementation aspect of ghc::filesystem.

PS: Sadly, I don't have an easy temporary solution for this problem, as it seems the surrounding code expects a more locale-adapting solution. This library is only more or less a drop-in replacement if UTF-8 (or, on Windows, alternatively std::wstring) based Unicode is used.
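For readers in the same situation, one possible stopgap is to go through the std::wstring interface mentioned above and convert to or from the native narrow encoding at the boundary. The following is only a sketch under stated assumptions, not part of ghc::filesystem: the helper names to_native_narrow and from_native_narrow are hypothetical, and it assumes that "native encoding" means the active ANSI code page (CP_ACP) on Windows. The non-Windows branch exists only so the sketch compiles elsewhere; it uses the current C locale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not a ghc::filesystem API): wide string -> native narrow encoding.
// Assumption: on Windows, "native" means the active ANSI code page (e.g. GBK).
std::string to_native_narrow(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
#ifdef _WIN32
    const int len = static_cast<int>(wide.size());
    // First call computes the required byte count, second call converts.
    // Characters the code page cannot represent are replaced by a default character.
    const int size = WideCharToMultiByte(CP_ACP, 0, wide.data(), len, nullptr, 0, nullptr, nullptr);
    if (size <= 0) return std::string();
    std::string result(static_cast<std::size_t>(size), '\0');
    WideCharToMultiByte(CP_ACP, 0, wide.data(), len, &result[0], size, nullptr, nullptr);
    return result;
#else
    // Fallback: convert using the current C locale; returns empty on failure.
    const std::size_t size = std::wcstombs(nullptr, wide.c_str(), 0);
    if (size == static_cast<std::size_t>(-1)) return std::string();
    std::string result(size, '\0');
    std::wcstombs(&result[0], wide.c_str(), size);
    return result;
#endif
}

// Hypothetical helper for the opposite direction: native narrow encoding -> wide string.
std::wstring from_native_narrow(const std::string& narrow)
{
    if (narrow.empty()) return std::wstring();
#ifdef _WIN32
    const int len = static_cast<int>(narrow.size());
    const int size = MultiByteToWideChar(CP_ACP, 0, narrow.data(), len, nullptr, 0);
    if (size <= 0) return std::wstring();
    std::wstring result(static_cast<std::size_t>(size), L'\0');
    MultiByteToWideChar(CP_ACP, 0, narrow.data(), len, &result[0], size);
    return result;
#else
    // Fallback: convert using the current C locale; returns empty on failure.
    const std::size_t size = std::mbstowcs(nullptr, narrow.c_str(), 0);
    if (size == static_cast<std::size_t>(-1)) return std::wstring();
    std::wstring result(size, L'\0');
    std::mbstowcs(&result[0], narrow.c_str(), size);
    return result;
#endif
}
```

With the reproduction code above, this would be used as to_native_narrow(ghc_absolute_path.wstring()) in place of ghc_absolute_path.string(), and ghcfs::path(from_native_narrow(s)) when a natively encoded std::string has to go into the library. Whether the result matches std::filesystem byte for byte in every configuration is not something this sketch guarantees.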