wxWidgets / wxWidgets

Cross-Platform C++ GUI Library
https://www.wxwidgets.org/

wxDir returns composed unicode in OSX #11730

Closed wxtrac closed 13 years ago

wxtrac commented 14 years ago

Issue migrated from trac ticket # 11730

component: wxOSX | priority: normal | resolution: fixed

2010-02-15 22:26:57: pfriis (Preben Friis) created the issue


After spending a day tracking down why some files containing "ø" and "å" apparently changed name in the file system, I found the cause to be a missing decomposition.

Attached patch fixes this.

The issue was in iPhone but I guess the same goes for Mac.

wxtrac commented 14 years ago

2010-02-15 22:27:19: pfriis (Preben Friis) uploaded file decompose.diff (1.3 KiB)

wxtrac commented 14 years ago

2010-02-16 06:02:25: jatupper (Jeff Tupper) commented


It isn't obvious to me why the comment "// Decompose the string" is present, since CFStringNormalize with kCFStringNormalizationFormC produces Unicode that can include both decomposed and precomposed characters. (Forms D and KD are decomposed; forms C and KC precompose only where precomposed characters are available.)

wxtrac commented 14 years ago

2010-02-16 08:06:40: @csomor changed status from new to infoneeded_new

2010-02-16 08:06:40: @csomor commented

Could you please provide a test case that shows your problem? The convention (discussed on wx-dev quite a long time ago) is that Unicode is always composed throughout wx, and decomposition/composition takes place only before and after native file calls, so wxDir should actually return a composed string.

wxtrac commented 14 years ago

2010-02-16 09:40:04: pfriis (Preben Friis) uploaded file minimal.diff (1.5 KiB)

wxtrac commented 14 years ago

2010-02-16 09:50:58: pfriis (Preben Friis) changed status from infoneeded_new to new

2010-02-16 09:50:58: pfriis (Preben Friis) commented

I added a patch with a test case showing the issue. I ran it on Snow Leopard to make sure that this is not only an iPhone problem.

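// "TestÅ.txt" with a precomposed Å (U+00C5), written as explicit UTF-8 bytes.
const char szFileName[] = "Test\xc3\x85.txt";
wxString sFileName = wxString::FromUTF8(szFileName);

// Before: Å = 0xc3 0x85 (precomposed U+00C5)

// Create an empty file with the composed name.
wxFFile file(sFileName, "wb+");
file.Close();

// Read the name back from the directory listing.
wxDir dir(".");
dir.GetFirst(&sFileName, "Test*.txt", wxDIR_FILES);

// After: Å = 0x41 0xcc 0x8a ('A' followed by combining ring above, U+030A)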

My terminology might be off, but it seems like the native format of wxWidgets is to keep the strings in the "shortest form", and that is not what is returned from wxDir.

The comment should probably be "// Decompose the string"

The original patch also removes the comment line:

"// WARNING: Are we sure that CFString's conversion will cause decomposition?"

... because the answer to this is seemingly: "No, converting to CF string will not decompose it".

wxtrac commented 14 years ago

2010-02-16 10:35:51: pfriis (Preben Friis) commented


Argh. The comment should be "// Decompose and compose the string" which is what Normalization-C does.

wxtrac commented 14 years ago

2010-02-16 15:46:09: @csomor changed status from new to accepted

2010-02-16 15:46:09: @csomor commented

Thanks, I'll compare the implementations again to make sure we always get C-normalized strings back.

wxtrac commented 13 years ago

2010-11-05 15:39:14: @vadz commented


I think the patch is correct (it might not be ideal because maybe we can avoid doing this for the strings not coming from the kernel but it's better to be inefficient than wrong) so I'll apply it soon if there are no objections.

wxtrac commented 13 years ago

2010-11-05 22:40:10: @vadz changed status from accepted to closed

2010-11-05 22:40:10: @vadz set resolution to fixed

2010-11-05 22:40:10: @vadz commented

(In [66033]) Ensure that strings returned by wxMBConv_cf are in NFC form.

Normalize all Unicode strings used internally even though the Darwin kernel gives them to us in decomposed (NFD) form.

Closes #11730.