nlohmann / json

JSON for Modern C++
MIT License

to_json(std::filesystem::path) can create invalid UTF-8 chars on windows #4271

Open MHebes opened 5 months ago

MHebes commented 5 months ago


This conversion function:

template<typename BasicJsonType>
inline void to_json(BasicJsonType& j, const std_fs::path& p)
{
    j = p.string();
}

It uses p.string(), which does not return a UTF-8-encoded string on Windows (at least in some cases). Calling dump() on the resulting JSON then throws an "invalid UTF-8 byte" exception.

Reproduction steps

Convert a std::filesystem::path that contains a Unicode "Right Single Quotation Mark" character (U+2019) to a json value, either implicitly or with to_json.

Inspect the new json (string_t)'s bytes, either by dump()ing, or converting to BSON.
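For the inspection step, a small hex dump makes the difference between the two encodings visible. This is only a sketch: `to_hex` is a hypothetical helper, not part of nlohmann/json, and it works on any std::string.

```cpp
#include <string>

// Hypothetical helper (not part of nlohmann/json): render each byte of s
// as two lowercase hex digits, separated by single spaces.
inline std::string to_hex(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    return out;
}
```

It could be applied to the stored string, for example `to_hex(j.get<std::string>())`, to see whether the bytes are `92` or `e2 80 99`.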

Expected vs. actual results

Expected: "Strings are stored in UTF-8 encoding." per the documentation.

Actual: The string gets converted by std::filesystem::path::string(), which appears to convert it to Windows-1252 encoding. Its bytes end up as \x92 rather than \xe2\x80\x99.
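To show where the expected bytes \xe2\x80\x99 come from, here is a minimal sketch of UTF-8 encoding for a single code point. `encode_utf8` is a hypothetical name used only for illustration, and the function assumes its input is a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid scalar value (<= 0x10FFFF and not a surrogate).
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {            // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

U+2019 falls in the three-byte range, which is why a UTF-8 string holds three bytes for it while Windows-1252 holds the single byte 0x92.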

Minimal code example

#include <filesystem>
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
  try {
    wchar_t wide_unicode_right_quote[2] = {0x2019, 0};  // came from a directory_iterator in reality
    nlohmann::json apost = std::filesystem::path(wide_unicode_right_quote);
    std::cout << apost << std::endl;
    return 0;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    return 1;
  }
}

The workaround I'm using is to call WideCharToMultiByte on .native() to get the string in UTF-8 before passing it to nlohmann:

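#include <string>
#include <string_view>
#include <windows.h>
inline std::string Narrow(std::wstring_view wstr) {
  if (wstr.empty()) return {};
  const int wlen = static_cast<int>(wstr.size());
  // Call once to size the buffer, then again to convert UTF-16 to UTF-8.
  int len = ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), wlen, nullptr, 0, nullptr, nullptr);
  std::string out(len, '\0');
  ::WideCharToMultiByte(CP_UTF8, 0, wstr.data(), wlen, &out[0], len, nullptr, nullptr);
  return out;
}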

int main() {
  try {
    wchar_t wide_unicode_right_quote[2] = {0x2019, 0};  // came from a directory_iterator in reality
    nlohmann::json apost = Narrow(std::filesystem::path(wide_unicode_right_quote).native());
    std::cout << apost << std::endl;
    return 0;
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    return 1;
  }
}

Error messages

[json.exception.type_error.316] invalid UTF-8 byte at index 0: 0x92
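The error is reported at index 0 because 0x92 is a continuation byte (10xxxxxx), which can never start a UTF-8 sequence. The following is a simplified structural check written for illustration; it is not the validator nlohmann/json uses internally, and `first_invalid_utf8` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the first offending byte, or
// std::string::npos if s looks like well-formed UTF-8. For a truncated
// sequence, the index of its lead byte is returned.
// Simplified: checks lead/continuation structure and rejects the leads
// 0xC0, 0xC1 and anything above 0xF4, but does not catch every overlong
// or surrogate encoding.
inline std::size_t first_invalid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80) extra = 0;
        else if (b >= 0xC2 && b <= 0xDF) extra = 1;
        else if (b >= 0xE0 && b <= 0xEF) extra = 2;
        else if (b >= 0xF0 && b <= 0xF4) extra = 3;
        else return i;  // stray continuation byte (0x80..0xBF) or invalid lead
        for (std::size_t k = 1; k <= extra; ++k) {
            if (i + k >= s.size()) return i;  // sequence is cut off
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return i + k;  // not a continuation byte
        }
        i += extra + 1;
    }
    return std::string::npos;
}
```

Running such a check on a path string before assigning it to a json value would surface the problem at conversion time rather than at dump() time.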

Compiler and operating system

MSVC 2022 Professional, C++ 20

Library version

develop - a259ecc


MHebes commented 4 months ago

I can also work around this problem by adding a manifest XML that sets my app's code page to CP_UTF8 on supported versions of Windows.

In CMake I wrapped this in a function:

# target_add_manifest(<target> <manifest file>)
# You probably want to use ${MANIFEST_FILE_UTF8} defined below this function
# Adds a manifest file (
# to an EXE
function(target_add_manifest TARGET_NAME MANIFEST_FILE)
  if(NOT TARGET_NAME)
    message(FATAL_ERROR "You must provide a target")
  endif()
  if(NOT MANIFEST_FILE)
    message(FATAL_ERROR "You must provide a manifest file")
  endif()
  add_custom_command(TARGET ${TARGET_NAME} POST_BUILD
    COMMAND "mt.exe" -manifest \"${MANIFEST_FILE}\" \"-updateresource:$<TARGET_FILE:${TARGET_NAME}>\"
  )
endfunction()

which is used like this (probably want to wrap in a platform check):

add_executable(myapp main.cpp)
target_add_manifest(myapp "${CMAKE_CURRENT_SOURCE_DIR}/cmake/utf8.manifest")

with utf8.manifest being:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <application><windowsSettings>
      <activeCodePage xmlns="">UTF-8</activeCodePage>
  </windowsSettings></application>
</assembly>

This solves the problem if the app is running on Windows 10 version 1903 or later. It's still a bug, but I wanted to share this workaround because it's useful for many libraries that have the same issue.
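Because the manifest only takes effect on supported Windows versions, it can help to check at startup whether the process is actually running with a UTF-8 code page. The sketch below uses the Win32 GetACP() call; `narrow_strings_are_utf8` is a hypothetical helper, and returning true on non-Windows platforms is an assumption made only for this example.

```cpp
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical helper: reports whether the process's active ANSI code page
// is UTF-8, i.e. whether the manifest setting took effect.
// Assumption: on non-Windows platforms this sketch simply returns true.
inline bool narrow_strings_are_utf8() {
#if defined(_WIN32)
    return ::GetACP() == CP_UTF8;  // CP_UTF8 is code page 65001
#else
    return true;
#endif
}
```

An app could fall back to an explicit conversion from .native() when this returns false.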

MHebes commented 4 months ago

Proposed diff to do the conversion to UTF-8 when targeting Windows:

diff --git a/include/nlohmann/detail/conversions/to_json.hpp b/include/nlohmann/detail/conversions/to_json.hpp
index 562089c3..a8b74688 100644
--- a/include/nlohmann/detail/conversions/to_json.hpp
+++ b/include/nlohmann/detail/conversions/to_json.hpp
@@ -413,10 +413,20 @@ inline void to_json(BasicJsonType& j, const T& t)

+#if defined(_WIN32)
+#include <windows.h>
+#endif
 template<typename BasicJsonType>
 inline void to_json(BasicJsonType& j, const std_fs::path& p)
 {
+#if defined(_WIN32)
+    int len = ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], p.native().size(), nullptr, 0, nullptr, nullptr);
+    std::string as_utf8(len, 0);
+    ::WideCharToMultiByte(CP_UTF8, 0, &p.native()[0], p.native().size(), &as_utf8[0], len, nullptr, nullptr);
+    j = std::move(as_utf8);
+#else
     j = p.string();
+#endif
 }

zel1b08a commented 1 month ago

A path can be represented in several ways (native, generic_string, string, u8string, etc.), so I think the client should decide how to store it before putting it into a json object.

Perhaps the library should simply use j = p.u8string(); to match the docs.
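One caveat with that suggestion: in C++20, u8string() returns std::u8string rather than std::string, so a direct assignment may not compile against a std::string-based json type. A minimal sketch of a client-side helper that works in both C++17 and C++20 follows; `path_to_utf8` is a hypothetical name, not an existing library function.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: return the path as UTF-8 bytes in a std::string.
// u8string() yields std::string in C++17 and std::u8string in C++20;
// copying through iterators handles both element types.
inline std::string path_to_utf8(const std::filesystem::path& p) {
    const auto utf8 = p.u8string();
    return std::string(utf8.begin(), utf8.end());
}
```

With such a helper the client can write `nlohmann::json j = path_to_utf8(p);` and keep the choice of representation explicit.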