Open IanLilleyT opened 2 years ago
This is not necessarily a bug, because the second parameter is the capacity of the buffer rather than the true length. The behaviour you want is probably better served by the following:
// Worst case is 3 UTF-8 bytes per UTF-16 code unit, so 2*len(wstr) can be too small.
buf := make([]byte, 3*len(wstr))
n := utf16.decode_to_utf8(buf, wstr)
str := string(buf[:n])
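For reference, here is a self-contained sketch of the same approach, assuming `wstr` is a `[]u16` whose length is the true length of the string. The sample data and the `main` wrapper are illustrative only and are not taken from the Odin source tree. The buffer is sized at three bytes per UTF-16 code unit, which is the worst case for UTF-8 output.

```odin
package main

import "core:fmt"
import "core:unicode/utf16"

main :: proc() {
	// Illustrative input: "a", an interior null, then "b".
	wstr := []u16{0x0061, 0x0000, 0x0062}

	// Worst case: one UTF-16 code unit becomes three UTF-8 bytes.
	buf := make([]byte, 3*len(wstr))
	defer delete(buf)

	// decode_to_utf8 returns the number of bytes written into buf.
	n := utf16.decode_to_utf8(buf, wstr)
	str := string(buf[:n])

	// Print the byte length so it is visible whether the interior
	// null was kept or the string was cut short at it.
	fmt.println(len(str))
}
```

Because the slice length is passed explicitly, nothing here needs to scan for a terminator.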
I see. One could argue it's a bug that wstring_to_utf8 returns a different utf8 string than expected for a valid wstring that has interior null characters.
I guess I'll rephrase it as a feature request 😄
Make wstring_to_utf8 take a slice instead of a pointer + backing buffer length
wstring_to_utf8 is only called from two places: utf16_to_utf8 and add_user_profile. utf16_to_utf8 is effectively passing a slice already. I haven't looked too deeply into add_user_profile, but it could count the characters up until null by itself instead of having wstring_to_utf8 do it.
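A rough sketch of what "count the characters up until null by itself" could look like on the caller's side. `wstring_length` is a hypothetical helper name, and the buffer contents are made up for illustration; neither is an existing procedure or value in `core:sys/windows`.

```odin
package main

import "core:fmt"
import "core:unicode/utf16"

// Hypothetical helper: counts the code units before the first null,
// looking at no more than max_len units.
wstring_length :: proc(s: [^]u16, max_len: int) -> int {
	for i in 0..<max_len {
		if s[i] == 0 {
			return i
		}
	}
	return max_len
}

main :: proc() {
	// Illustrative buffer: "cwd", a terminator, then stale data,
	// as an API that fills a fixed-size buffer might leave behind.
	backing := [8]u16{0x0063, 0x0077, 0x0064, 0, 0x0078, 0, 0, 0}
	ptr := raw_data(backing[:]) // [^]u16, like a wstring

	// The caller decides where the string ends...
	n := wstring_length(ptr, len(backing))
	units := ptr[:n] // []u16 with the true length

	// ...so the conversion itself never has to search for a null.
	buf := make([]byte, 3*len(units))
	defer delete(buf)
	m := utf16.decode_to_utf8(buf, units)
	fmt.println(string(buf[:m]))
}
```

With this split, a caller that wants to preserve interior nulls passes the full slice, and a caller that holds a null-terminated buffer trims it first.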
The reason why this and https://github.com/odin-lang/Odin/pull/1951 are useful to me is I'm using the log tracking allocator and want to get the alloc and free numbers to match up exactly where possible.
Oh wait, wstring is a multipointer, so the function would have to be called something else... I don't know how much confusion would be caused by keeping the name and taking a []WCHAR instead of a wstring
Or keep the function name the same but force N to be the true length
works-as-intended?
Odin's wstring_to_utf8 searches the string for the first occurrence of 0 and chops off everything after it, which prevents certain kinds of utf8 strings from being formed. In utf8, null aka 0x00 is allowed. There's also some waste from doing another loop over the entire string.
https://github.com/odin-lang/Odin/blob/5a9422b6bcda8ed7fe3f0e91db916764662397e5/core/sys/windows/util.odin#L87-L92
From WideCharToMultiByte's perspective there's no special behavior when it sees a null character as long as you pass an explicit size. WideCharToMultiByte takes an LPCWSTR. LPCWSTR (maybe null-terminated) and LPCTSTR (definitely null-terminated) are both wstring, so wstring ought to have the semantics of the more permissive of the two, i.e. allowing interior null characters.
So far I've run into at least one problem with removing the null check in the code snippet above: the Windows get_current_directory allocates a buffer that includes space for a null terminator (mimicking the behavior of GetCurrentDirectoryW), so the slice returned by wstring_to_utf8 is going to end with a null. If you do something like filepath.join({os.get_current_directory(), "my_file.txt"}) it will give back a path with a null between the cwd and the file name. The fix is to do a second allocation in get_current_directory that excludes the null terminator, or do one allocation and return a slice with the null chopped off. I'm not sure how many other places will run into this problem.
Test code: