idleberg / vscode-applescript

Language syntax and snippets for AppleScript
https://marketplace.visualstudio.com/items?itemName=idleberg.applescript
MIT License

Change default encoding to MacRoman #13

Closed: szhu closed this issue 6 years ago

szhu commented 6 years ago

Script Editor saves .applescript text scripts in the MacRoman encoding. It's similar enough to some other encodings that it can't be auto-detected from the content itself. Example:

  1. In VSCode settings, turn on files.autoGuessEncoding.
  2. Save the script x ≥ 1 as MacRoman (you can use either Script Editor or VSCode), then close the file.
  3. Open the file in VSCode. It guesses that the content is Windows 1252-encoded and displays it as x ³ 1.
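
The mismatch in step 3 comes down to a single byte. A minimal Python sketch (Python is used here only for illustration and is not part of the extension) shows that MacRoman stores ≥ as one byte, and that the same byte reads as ³ under Windows 1252:

```python
# "x ≥ 1" as it would be written to disk, assuming MacRoman output.
source = "x ≥ 1"
raw = source.encode("mac_roman")

# The same bytes decode without error under both encodings,
# which is why the content alone cannot settle which one was meant.
as_macroman = raw.decode("mac_roman")
as_cp1252 = raw.decode("cp1252")

print(raw)          # the bytes on disk
print(as_macroman)  # what Script Editor shows
print(as_cp1252)    # what a Windows 1252 guess shows
```

Because both decodes succeed, a guesser has to rely on statistics rather than validity, and a three-character script gives it very little to work with.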

Adding this in my VSCode settings informs VSCode of the proper encoding for AppleScript files:

    "[applescript]": {
        "files.encoding": "macroman",
    },

It would be nice if this package did this automatically!

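Questions:

  • Would it be possible to add a "Save AppleScript files as MacRoman" setting?
  • Would it be a good idea for this setting to be enabled by default?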

idleberg commented 6 years ago

Questions:

  • Would it be possible to add a "Save AppleScript files as MacRoman" setting?
  • Would it be a good idea for this setting to be enabled by default?

If MacRoman is (still) the default encoding, I'd rather go with the second option. However, I'm a bit hesitant, since I wonder how this treats non-Latin characters.

From the Mac OS Roman page on Wikipedia:

With the release of Mac OS X, Mac OS Roman and all other "scripts" (as the Mac OS called them) were replaced by UTF-8 as the standard character encoding for the Macintosh operating system

In general it's possible to define a default encoding for a language and I've added this to package.json for testing purposes. I'll play around some and see whether it makes sense to keep this setting. Your thoughts on this are welcome!

idleberg commented 6 years ago

I've decided to keep the default encoding setting, since it's easy to change in the settings. MacRoman is now the default encoding in v0.14.2!

szhu commented 6 years ago

Thanks for researching and addressing this so quickly!

szhu commented 6 years ago

I also wanted to take some time to talk about this:

From the Mac OS Roman page on Wikipedia:

With the release of Mac OS X, Mac OS Roman and all other "scripts" (as the Mac OS called them) were replaced by UTF-8 as the standard character encoding for the Macintosh operating system

(First, a side note: the quoted "scripts" above means "encodings", not "programming languages", so it's not talking about AppleScript specifically.)

AppleScript is fairly anachronistic compared to the rest of macOS. The language syntax and the use of the .scpt save format (which is similar to Python's and Java's compiled .pyc/.pyo/.class formats) as the default source code format seem fairly out of place in today's ecosystem of programming languages. Here are some other ways AppleScript hasn't really been updated since Mac OS 9:

Given all of this, I'm only mildly surprised that Apple didn't update AppleScript's default text encoding, either.

nicolinuxfr commented 6 years ago

In general, I like that VSCode and Script Editor use the same encoding; it will make working with the files much simpler day to day.

But the new version of the extension tries to open UTF-8 files as MacRoman without changing the encoding, so I have issues with accented characters:

[Screenshot: 2018-10-01 at 08:13:16]

If I reopen the file as UTF-8, it works fine. I don't know whether VSCode could properly convert the files?
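
For readers who hit the same symptom, here is a small Python sketch of what happens to an accented character when UTF-8 bytes are read as MacRoman, and why reopening with the right encoding repairs it. The sample character is an arbitrary choice, not taken from the screenshot:

```python
text = "é"

# UTF-8 stores "é" as two bytes.
utf8_bytes = text.encode("utf-8")

# MacRoman assigns a character to every byte value, so the misread
# never raises an error; it just produces two wrong characters.
misread = utf8_bytes.decode("mac_roman")

# Nothing is lost as long as the file is not saved in that state:
# re-encoding as MacRoman gives back the original bytes.
repaired = misread.encode("mac_roman").decode("utf-8")

print(utf8_bytes, misread, repaired)
```

The repair only works while the bytes on disk are untouched, which matches the observation that reopening the file as UTF-8 works fine.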

nicolinuxfr commented 6 years ago

One more issue: you can't use emojis with MacRoman encoding. Well, I guess you can by using the Unicode equivalent, but not the emoji itself.

And I just tried, if you use an emoji inside the Script Editor and save as text, it uses UTF-16 encoding.
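
The emoji limitation is easy to reproduce outside either editor. The following Python sketch uses a hypothetical helper, `can_encode`, and an arbitrary sample script line; it shows only that MacRoman has no slot for the emoji while UTF-16 does, and makes no claim about how Script Editor chooses its encoding:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


line = 'display dialog "🙂"'

print(can_encode(line, "mac_roman"))  # MacRoman is a 256-character set
print(can_encode(line, "utf-16"))     # UTF-16 covers all of Unicode
```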

idleberg commented 6 years ago

I did some tests myself with a simple AppleScript file, which contains non-ASCII characters:

display dialog "äöüßéè€"

With the files.autoGuessEncoding setting active, Code will open the file as ISO 8859-2. I then did some further tests to determine the encoding:

# MacRoman
$ file -I macroman.applescript
macroman.applescript: text/plain; charset=unknown-8bit

$ xattr -l macroman.applescript
com.apple.FinderInfo:
00000000  54 45 58 54 54 6F 79 53 00 00 00 00 00 00 00 00  |TEXTToyS........|
00000010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
00000020
com.apple.TextEncoding: macintosh;0
com.apple.lastuseddate#PS:
00000000  F0 CB B1 5B 00 00 00 00 20 94 64 3A 00 00 00 00  |...[.... .d:....|
00000010
com.apple.metadata:_kMDItemUserTags:
00000000  62 70 6C 69 73 74 30 30 A0 08 00 00 00 00 00 00  |bplist00........|
00000010  01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00 00 09                    |..........|
0000002a
com.apple.metadata:kMDLabel_nzfct3nddxl2ablrfgw6suoak4:
00000000  F2 E0 5F 6D 26 33 ED 13 52 56 42 EE EC 33 B8 94  |.._m&3..RVB..3..|
00000010  F1 98 3E AA 33 79 03 F2 99 74 4C D2 65 DF 75 DD  |..>.3y...tL.e.u.|
00000020  0B 13 F6 EA 11 50 09 76 ED E4 0D 2F 5B 7D F7 58  |.....P.v.../[}.X|
00000030  A7 FF D7 05 2F 34 E5 43 E9 41 32 5B EB A3 03 61  |..../4.C.A2[...a|
00000040  2D 82 95 14 BB 08 C9 2B 05 6A 5B 70 C8 A7 F8 84  |-......+.j[p....|
00000050  8E BE 43 B8 AD 9B 16 B6 BA                       |..C......|
00000059

# UTF-8
$ file -I utf8.applescript
utf8.applescript: text/plain; charset=utf-8

$ xattr -l utf8.applescript
com.apple.metadata:_kMDItemUserTags:
00000000  62 70 6C 69 73 74 30 30 A0 08 00 00 00 00 00 00  |bplist00........|
00000010  01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00 00 09                    |..........|
0000002a

This did not really help, so I've created a second AppleScript file with only ASCII characters for comparison.

display dialog "abc"

This file is indeed encoded differently:

$ file -I ascii.applescript
ascii.applescript: text/plain; charset=us-ascii

$ xattr -l ascii.applescript
com.apple.FinderInfo:
00000000  54 45 58 54 54 6F 79 53 00 00 00 00 00 00 00 00  |TEXTToyS........|
00000010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
00000020
com.apple.TextEncoding: us-ascii;1536
com.apple.lastuseddate#PS:
00000000  F2 CD B1 5B 00 00 00 00 FC 0B 4B 24 00 00 00 00  |...[......K$....|
00000010
com.apple.metadata:_kMDItemUserTags:
00000000  62 70 6C 69 73 74 30 30 A0 08 00 00 00 00 00 00  |bplist00........|
00000010  01 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00 00 09                    |..........|
0000002a
com.apple.metadata:kMDLabel_nzfct3nddxl2ablrfgw6suoak4:
00000000  F2 46 E1 33 A8 F7 49 0F 64 AC 59 96 5B 72 A4 7D  |.F.3..I.d.Y.[r.}|
00000010  62 5B F9 1F 76 EE E5 EE AF 56 AC 58 2D 52 33 A7  |b[..v....V.X-R3.|
00000020  00 E0 5D E4 6F B6 08 9B 37 9A D6 04 3B E5 7B 80  |..].o...7...;.{.|
00000030  0E C4 28 6B C2 E3 8D C1 3E 67 E9 FD 7B 1A 37 44  |..(k....>g..{.7D|
00000040  16 4C 37 82 4F C9 BE 9D 07 24 C9 CB 54 CF 21 B3  |.L7.O....$..T.!.|
00000050  D7 70 5E 4A 7D 48 3D 53 05                       |.p^J}H=S.|
00000059

Code will open this file as MacRoman.
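
The `com.apple.TextEncoding` values in the listings above have the shape `name;number`. The number looks like a CFStringEncoding constant (0 for MacRoman, 1536 = 0x0600 for ASCII), but that reading is an assumption and is not confirmed anywhere in this thread. A hedged Python sketch for splitting such a value, with `parse_text_encoding` as a hypothetical helper name:

```python
from typing import Optional, Tuple


def parse_text_encoding(value: str) -> Tuple[str, Optional[int]]:
    """Split an xattr value such as 'macintosh;0' into (name, number).

    The number is returned as None when the part after ';' is empty.
    """
    name, _, number = value.partition(";")
    number = number.strip()
    return name.strip(), int(number) if number else None


print(parse_text_encoding("macintosh;0"))
print(parse_text_encoding("us-ascii;1536"))
```

Note that the UTF-8 file in the listing carries no such attribute at all, so a tool relying on it would still need a fallback.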

As far as I know, the Code extension API is too limited to change the encoding depending on the contents of a script. The way to restore the old behavior is described in the README.
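
Outside the editor, a rough content-based check is still possible. The sketch below is a simplified approximation of the three labels seen in the `file -I` output above; it is not how `file` works internally, and `classify` is a hypothetical helper name:

```python
def classify(data: bytes) -> str:
    """Label bytes as 'us-ascii', 'utf-8', or 'unknown-8bit'."""
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Any legacy 8-bit encoding lands here; MacRoman cannot be
        # told apart from Windows 1252 or ISO 8859-x at this point.
        return "unknown-8bit"


script = 'display dialog "äöüßéè€"'
print(classify('display dialog "abc"'.encode("ascii")))
print(classify(script.encode("utf-8")))
print(classify(script.encode("mac_roman")))
```

The last branch is the crux of this issue: validity checks can rule UTF-8 in or out, but they cannot positively identify MacRoman.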

szhu commented 6 years ago

And I just tried, if you use an emoji inside the Script Editor and save as text, it uses UTF-16 encoding.

@nicolinuxfr I wonder if it's possible to have this extension try opening files as UTF-16, using MacRoman if that fails, and try saving files as MacRoman, using UTF-16 if that fails. Then that would mirror the Script Editor behavior you describe above.
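
The fallback idea can be sketched in Python as follows. This is an illustration, not extension code: `decode_script` and `encode_script` are hypothetical names, and the read path assumes that UTF-16 files start with a byte order mark, which this thread does not confirm for Script Editor output.

```python
import codecs
from typing import Tuple

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def decode_script(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for the bytes of a text script.

    Assumption: a UTF-16 file begins with a BOM. Without one the
    bytes are treated as MacRoman.
    """
    if data.startswith(UTF16_BOMS):
        return data.decode("utf-16"), "utf-16"
    return data.decode("mac_roman"), "mac_roman"


def encode_script(text: str) -> Tuple[bytes, str]:
    """Prefer MacRoman; fall back to UTF-16 (with BOM) if needed."""
    try:
        return text.encode("mac_roman"), "mac_roman"
    except UnicodeEncodeError:
        return text.encode("utf-16"), "utf-16"


for sample in ('display dialog "äöü"', 'display dialog "🙂"'):
    data, used = encode_script(sample)
    text, detected = decode_script(data)
    print(used, detected, text == sample)
```

The read path checks for a BOM instead of literally trying UTF-16 first, because short MacRoman or ASCII files of even length often decode as UTF-16 without raising an error, so a plain "try, then fall back on failure" would rarely fall back.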

idleberg commented 6 years ago

@nicolinuxfr Can you be more specific about the encoding? Is it UTF-16 LE or BE?

nicolinuxfr commented 6 years ago

I think it's UTF-16 LE; this is what BBEdit says:

[Screenshot: 2018-10-02 at 10:01:43]

If it helps, here's a really small script containing an emoji, saved as text with Script Editor: emojiscript.zip