microsoft / cppwinrt

C++/WinRT
MIT License

Bug: C++WinRT does not correctly handle Chinese characters using `Windows.Foundation.Uri` #1424

Closed by HO-COOH 2 months ago

HO-COOH commented 2 months ago

Version

2.0.240405.15

Summary

I have a URI containing a Chinese character and a query string that needs to be parsed with Windows.Foundation.Uri. With C++WinRT, parsing always returns an empty result, but the equivalent C# code returns the expected result.

I have tried changing the file encoding to both UTF-8 and UTF-8 with BOM; neither worked.

Reproducible example

With C++WinRT, create a C++WinRT console application:

    winrt::Windows::Foundation::Uri uri{ LR"(myapp://open?file="C:/我.txt")" };
    winrt::Windows::Foundation::WwwFormUrlDecoder parsed{ uri.Query() };
    for (auto entry : parsed)
    {
        std::wcout << entry.Name().data() << L'\t' << entry.Value().data() << L'\n';
    }

With C#, create a UWP project:

    var uri = new Uri("myapp://open?file=\"C:/我.txt\"");
    var parsed = new WwwFormUrlDecoder(uri.Query);
    foreach (var entry in parsed)
    {
        Debug.WriteLine(entry);
    }

Expected behavior

C# result is expected: image

Actual behavior

C++WinRT result: image

Additional comments

Repro here

sylveon commented 2 months ago

Are you sure the bug is not with the console output? Check the debugger maybe?

HO-COOH commented 2 months ago

> Are you sure the bug is not with the console output? Check the debugger maybe?

The breakpoint inside the for loop is never hit. That should be clear enough.

kennykerr commented 2 months ago

For API questions I suggest: https://docs.microsoft.com/en-us/answers/topics/windows-api.html

HO-COOH commented 2 months ago

@kennykerr I don't know, but both examples use the same Windows.Foundation.WwwFormUrlDecoder class, which makes me wonder. I will ask there too.

DefaultRyan commented 2 months ago

Both examples use the same Windows.Foundation.WwwFormUrlDecoder class, but System.Uri is not the same as Windows.Foundation.Uri. The doc page for Windows.Foundation.Uri calls out some of these potential differences in its Remarks (https://learn.microsoft.com/en-us/uwp/api/windows.foundation.uri?view=winrt-26100#remarks), including potentially relevant statements about percent-encoding non-ASCII characters.

That's as far as I got, but I'm confident that this is where you're seeing behavior differences.
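
To make "percent-encoding non-ASCII characters" concrete, here is a minimal sketch that uses only the standard library. It assumes the query value is already UTF-8 encoded. `percent_encode` is a hypothetical helper written for illustration, not part of C++/WinRT, and nothing in this thread confirms that escaping the value resolves the behavior reported above. Windows.Foundation.Uri also has a static `EscapeComponent` method that may serve the same purpose.

```cpp
#include <string>

// Hypothetical helper: percent-encode every byte that is not an
// RFC 3986 "unreserved" character (A-Z, a-z, 0-9, '-', '.', '_', '~').
// Assumption: the input string is already UTF-8 encoded.
std::string percent_encode(const std::string& utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8)
    {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            // Emit the byte as %XX using two uppercase hex digits.
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

For the value in the repro, `percent_encode("\"C:/\xE6\x88\x91.txt\"")` (where `\xE6\x88\x91` is the UTF-8 encoding of 我) yields `%22C%3A%2F%E6%88%91.txt%22`, which could then be appended to `myapp://open?file=` before constructing the Uri.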