guwidoe / VBA-StringTools

Useful methods for cross platform interaction with Unicode strings in VBA.
MIT License

Re-encode files in UTF-8 inside Git index #16

Closed DecimalTurn closed 1 year ago

DecimalTurn commented 1 year ago

Now that the change has been made in the .gitattributes file, Git will apply it to the lines with non-ASCII characters anytime someone clones the repo, so we might as well do it now. Again, this won't affect the encoding inside the working directory, just the Git index.

Note that I've also taken the opportunity to fix mentions of "DeocdeUTF8" in the comments.

guwidoe commented 1 year ago

Thanks! I overlooked those comments when I changed the function name. By the way, I'm currently working on a big enhancement for the transcoding API wrappers, I think I'll be ready to push it sometime today. I'm curious what you find most useful about this repo and if you have ideas for enhancements/additions!

DecimalTurn commented 1 year ago

@guwidoe Awesome! I'll certainly check it out.

I've mostly used DecodeUTF8 so far. I'm using it to solve one of my biggest pet peeves with Excel: opening a UTF-8 CSV (with no BOM) in the correct encoding. Basically, once the file has been incorrectly opened using the local Windows-1252 encoding, I recalculate the original UTF-8 bytes and convert them to a VBA string, so I can fix the non-ASCII characters. It might not be the fastest method, but I like that it doesn't involve re-opening the file.
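For readers who want to see the round trip spelled out, here is a minimal sketch of the idea in Python rather than VBA. It is not the repo's DecodeUTF8 implementation; `fix_mojibake` is a hypothetical helper name. It also assumes that the five byte values Windows-1252 leaves undefined were mapped to the C1 control characters with the same value when the file was misread.

```python
# Byte values that Windows-1252 leaves undefined. Assumption for this sketch:
# the misread mapped each of them to the control character with the same value.
_CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def fix_mojibake(text: str) -> str:
    """Undo a Windows-1252 misread of UTF-8 data (hypothetical helper)."""
    raw = bytearray()
    for ch in text:
        try:
            # Recover the original byte behind each misdecoded character.
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) in _CP1252_UNDEFINED:
                raw.append(ord(ch))
            else:
                # The character cannot have come from a Windows-1252 misread.
                raise
    # Interpret the recovered bytes with the encoding the file really used.
    return raw.decode("utf-8")


print(fix_mojibake("cafÃ©"))  # café
```

Text that was never misread, or that contains characters outside Windows-1252, raises an error instead of being silently altered.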

Regarding potential enhancements, I'll have a think about it. I haven't looked carefully at all the code yet, but I recently discovered that AscW is not fully supported on Mac (see 2nd note here).

Visual Basic for the Macintosh does not support Unicode strings. Therefore, AscW (n) cannot return all Unicode characters for n values in the range of 128–65,535, as it does in the Windows environment. Instead, AscW (n) attempts a "best guess" for Unicode values n greater than 127. Therefore, you should not use AscW in the Macintosh environment.

Is that limitation already taken into account inside the AscU function? I don't have a Mac available to test this, so I wasn't sure if you already account for the weird behavior on the Mac in this case.

guwidoe commented 1 year ago

Interesting! And thanks for the information regarding AscW on Mac.

I am indeed relying on the AscW function to work correctly on Mac, and while I haven't tested it explicitly, I'm almost 100% sure that the documentation you linked is wrong in this case. VBA uses Unicode internally on Mac too (UTF-16LE). All the AscW function has to do is read the first two bytes of the input string and interpret them as a number; there is literally no logic involved. I have already found multiple occurrences where the documentation was inaccurate regarding Mac, so I wouldn't worry about it too much.
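To illustrate the "read the first two bytes" description, here is a rough Python sketch. It is not VBA's actual implementation, and `first_code_unit` is a hypothetical name. The `signed` flag mirrors the fact that AscW returns a 16-bit Integer, so values above 32,767 come back negative.

```python
import struct


def first_code_unit(s: str, signed: bool = False) -> int:
    """Return the first UTF-16 code unit of s (hypothetical helper)."""
    if not s:
        raise ValueError("empty string has no first code unit")
    data = s.encode("utf-16-le")
    # "<H" reads two little-endian bytes as unsigned, "<h" as signed 16-bit.
    fmt = "<h" if signed else "<H"
    (unit,) = struct.unpack_from(fmt, data)
    return unit


print(first_code_unit("A"))                    # 65
print(first_code_unit("\u20ac"))               # 8364 (the euro sign, U+20AC)
print(first_code_unit("\uffff", signed=True))  # -1
```

For a character outside the Basic Multilingual Plane, the result is only the high surrogate, which is presumably why a wrapper such as AscU would be needed on top of AscW.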

DecimalTurn commented 1 year ago

Yeah, inaccuracies in the docs wouldn't surprise me in that case. I know someone with a Mac; if I can test that eventually, I might go and edit the docs myself.