decompose.diff (1.3 KiB)
It isn't obvious to me why the comment "// Decompose the string" is present, as the result of CFStringNormalize with kCFStringNormalizationFormC is Unicode that may include both decomposed and precomposed characters. (Forms D and KD are decomposed; forms C and KC precompose only where precomposed characters are available.)
Could you please provide a test case that shows your problem? The convention (discussed on wx-dev quite a long time ago) is that Unicode is always composed throughout wx, and decomposition/composition takes place only before and after native file calls, so wxDir should actually return a composed string.
minimal.diff (1.5 KiB)
I added a patch with a test case showing the issue. I ran it on Snow Leopard to make sure that this is not only an iPhone problem.
char szFileName[] = {'T', 'e', 's', 't', 0xc3, 0x85, '.', 't', 'x', 't', 0}; // TestÅ.txt
wxString sFileName = szFileName;
// Here Å is precomposed (U+00C5): UTF-8 bytes 0xc3 0x85
wxFFile file(sFileName, "wb+");
file.Close();
wxDir dir(".");
dir.GetFirst(&sFileName, "Test*.txt", wxDIR_FILES);
// Now Å is decomposed ('A' + U+030A): UTF-8 bytes 0x41 0xcc 0x8a
My terminology might be off, but it seems like the native format of wxWidgets is to keep the strings in the "shortest form", and that is not what is returned from wxDir.
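To make the byte-level difference in the test case concrete, here is a minimal sketch in plain C++ (no wxWidgets or CoreFoundation involved). ComposedName, DecomposedName and ComposeRingAbove are hypothetical names made up for this illustration; the last one only recomposes "A"/"a" followed by U+030A and is nowhere near a real NFC implementation, which is what CFStringNormalize provides.

```cpp
#include <string>

// UTF-8 bytes for the two spellings of "TestÅ.txt" seen in the test case.
// Composed: U+00C5 (0xc3 0x85). Decomposed: 'A' + U+030A (0x41 0xcc 0x8a).
inline std::string ComposedName()   { return "Test\xc3\x85.txt"; }
inline std::string DecomposedName() { return "TestA\xcc\x8a.txt"; }

// Hypothetical helper, for illustration only: recomposes just the pairs
// "A"/"a" + U+030A COMBINING RING ABOVE. Real NFC covers all of Unicode.
std::string ComposeRingAbove(std::string s)
{
    static const char* const pairs[][2] = {
        { "A\xcc\x8a", "\xc3\x85" },  // Å
        { "a\xcc\x8a", "\xc3\xa5" },  // å
    };
    for (const auto& p : pairs) {
        const std::string from = p[0], to = p[1];
        std::string::size_type pos = 0;
        // Replace every occurrence of the decomposed pair with its
        // precomposed equivalent, continuing after each replacement.
        while ((pos = s.find(from, pos)) != std::string::npos) {
            s.replace(pos, from.size(), to);
            pos += to.size();
        }
    }
    return s;
}
```

The two names render identically but compare unequal byte for byte, which is why a file created under the composed name appears to "change name" when it is read back from the directory in decomposed form.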
The comment should probably be "// Decompose the string"
The original patch also removes the comment line:
"// WARNING: Are we sure that CFString's conversion will cause decomposition?"
... because the answer to this is seemingly: "No, converting to CF string will not decompose it".
Argh. The comment should be "// Decompose and compose the string" which is what Normalization-C does.
Thanks, I'll compare the implementations again to make sure we are always getting C-normalized strings back.
I think the patch is correct (it might not be ideal, because maybe we can avoid doing this for strings not coming from the kernel, but it's better to be inefficient than wrong), so I'll apply it soon if there are no objections.
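As an aside on the efficiency concern, one cheap shortcut would be to skip normalization for pure ASCII strings, since they are identical in every normalization form. The sketch below is an assumption about what such a check could look like, not something the patch does; IsPlainAscii is a hypothetical name.

```cpp
#include <string>

// Returns true if every byte is 7-bit ASCII. Such strings are unchanged by
// NFC and NFD, so a normalization pass could safely be skipped for them.
bool IsPlainAscii(const std::string& utf8)
{
    for (unsigned char c : utf8) {
        if (c >= 0x80)
            return false;  // part of a multi-byte UTF-8 sequence
    }
    return true;
}
```

A caller would normalize only when IsPlainAscii returns false. Whether the saving is worth the extra pass over the string is a separate question.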
(In [66033]) Ensure that strings returned by wxMBConv_cf are in NFC form.
Normalize all Unicode strings used internally even though the Darwin kernel gives them to us in decomposed (NFD) form.
Closes #11730.
Issue migrated from trac ticket # 11730
component: wxOSX | priority: normal | resolution: fixed
2010-02-15 22:26:57: pfriis (Preben Friis) created the issue
After spending a day tracking down why some files containing "ø" and "å" apparently changed name in the file system, I found the cause to be a missing decomposition.
The attached patch fixes this.
The issue was seen on iPhone, but I guess the same goes for Mac.