the-carlisle-group / Acre-Desktop

A simple Dyalog APL IDE plugin that introduces "projects" and allows you to keep your source code in Unicode text files.
MIT License

APL Chars in File Names Prevent Zipping #2

Closed PaulMansour closed 6 years ago

PaulMansour commented 6 years ago

It appears that Windows does not like to compress folders that contain file names with APL chars like ∆ (delta). It is pretty common to use delta and delta-underbar, and I don't think it will do if we can't compress a project that uses at least those two chars, and perhaps others. Can we encode them? What is the list of chars that are valid in an APL name, but invalid in a compressed folder?

PhilLast commented 6 years ago
'0123456789ABCDE'
'FGHIJKLMNOPQRST'
'UVWXYZ_abcdefgh'
'ijklmnopqrstuvw'
'xyzÀÁÂÃÄÅÆÇÈÉÊË'
'ÌÍÎÏÐÑÒÓÔÕÖØÙÚÛ'
'ÜÝßàáâãäåæçèéêë'
'ìíîïðñòóôõöøùúû'
'üþ∆⍙ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀ'
'ⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ'

are the 150 name chars below U+2710, i.e. in the first 10000 code-points, that APL allows in names. Windows says that '/\:*?"<>|' may not be in filenames. I guess it'd be up to individual proprietary zipping programs to decide what to add to the list.
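As a quick sanity check on the list above, here is a minimal Python sketch (not part of Acre; `check_name_chars` and the variable names are made up for illustration). It joins the ten rows, counts them, and tests them against the Windows set quoted above.

```python
# The ten 15-character rows quoted above, joined into one string.
NAME_CHARS = "".join([
    "0123456789ABCDE",
    "FGHIJKLMNOPQRST",
    "UVWXYZ_abcdefgh",
    "ijklmnopqrstuvw",
    "xyzÀÁÂÃÄÅÆÇÈÉÊË",
    "ÌÍÎÏÐÑÒÓÔÕÖØÙÚÛ",
    "ÜÝßàáâãäåæçèéêë",
    "ìíîïðñòóôõöøùúû",
    "üþ∆⍙ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀ",
    "ⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ",
])

# Characters Windows rejects in filenames, as quoted in the comment.
WINDOWS_FORBIDDEN = set('/\\:*?"<>|')


def check_name_chars(chars: str) -> dict:
    """Summarise the list: size, highest code point, overlap with the Windows set."""
    return {
        "count": len(chars),
        "distinct": len(set(chars)),
        "max_code_point": max(map(ord, chars)),
        "forbidden_by_windows": sorted(set(chars) & WINDOWS_FORBIDDEN),
    }


summary = check_name_chars(NAME_CHARS)
print(summary)
```

None of the name chars is in the Windows filename set, so whatever refuses ∆ must be a restriction of the zipping program rather than of the file system.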

Acre3 and before encoded '∆⍙' but it turned out to be vulnerable to interpreter changes. I can probably do better now.

I'll maintain a list and an enhancement to the encode/decode capitals stuff. Then as we find others beyond those two we can just add them.

aplteam commented 6 years ago

@PhilLast This can be closed I guess?

PhilLast commented 6 years ago

Not actually done yet.

PhilLast commented 6 years ago

Propose a straight swap of '∆⍙' to '+±' on the way to file and '+±' to '∆⍙' on the way back both under the auspices of the functions that currently add and remove the casecodes.

Need to be chars that are acceptable as filename elements to all and sundry and not as APL name elements.
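A minimal sketch of the proposed swap, in Python rather than APL. The names `encode_name` and `decode_name` are hypothetical and are not the Acre functions that handle the casecodes.

```python
# Hypothetical helpers illustrating the proposed two-way swap.
TO_FILE = str.maketrans("∆⍙", "+±")    # APL name -> filename
FROM_FILE = str.maketrans("+±", "∆⍙")  # filename -> APL name


def encode_name(apl_name: str) -> str:
    """Replace ∆ and ⍙ with + and ± on the way to file."""
    return apl_name.translate(TO_FILE)


def decode_name(file_name: str) -> str:
    """Replace + and ± with ∆ and ⍙ on the way back."""
    return file_name.translate(FROM_FILE)


print(encode_name("∆foo⍙bar"))               # +foo±bar
print(decode_name(encode_name("∆foo⍙bar")))  # ∆foo⍙bar
```

The round trip is only lossless because "+" and "±" can never occur in an APL name, which is the reason for the requirement stated above.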

Other "typeable"* candidates below U+100, of which we might need others if other unacceptable chars are found, are:

'!%¡£¥¦§©ª«¬­®°²³µ¶·¸¹º»¼½¾¿'

from which I've removed all chars currently used by APL in any guise and the few European diacriticals that Dyalog doesn't allow in names.

OK?

aplteam commented 6 years ago

Sounds good to me.

PhilLast commented 6 years ago

Turns out that the underscored alphabet

ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ

(which surprised me by displaying as underscored here, since they are circled according to Unicode and in all fonts except APL385) are all banned from filenames in compressed folders as well. I'd sooner not cater for them unless there's serious pressure to do so, because I understand Dyalog deprecates them.

PaulMansour commented 6 years ago

Do we actually need to change these function names?

PhilLast commented 6 years ago

Shouldn't be any need to change anything. The functions doing the encoding and decoding of capitals now also do the switch if an item whose name includes ∆ or ⍙ is created in the editor or by SetChanged. So it's only the filename that changes. Users will have to decide between being able to compress and the absolute need to use any of ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ.

e9gille commented 6 years ago

Just to clarify, these characters only cause a problem in Windows' built-in zip support.

I wouldn't bother working around it. All modern zip tools should cope fine.

aplteam commented 6 years ago

It was actually not decided whether Phil's workaround (to replace ∆ and ⍙ in folder names) should stay or be removed.

norberturkiewicz commented 6 years ago

I just tested extracting with Windows and it works.

I zipped a file with bad characters using 7zip and unzipped it using Windows Explorer without issue.
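The same kind of check can be scripted. This is a rough sketch using Python's standard `zipfile` module, with made-up file names; it only shows that one non-Windows zip implementation round-trips such names and says nothing about Windows Explorer itself.

```python
import io
import zipfile


def roundtrip_names(names):
    """Write empty members with the given names to an in-memory zip, then list them back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist()


# Hypothetical file names containing delta and delta-underbar.
names = ["∆foo.txt", "⍙bar.txt"]
print(roundtrip_names(names) == names)
```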

PhilLast commented 6 years ago

Happy to remove the encoding. No harm to leave the decoding for a while in case it's actually happened to anyone's code and he/she has files already containing "+" and/or "±" in their names.

PaulMansour commented 6 years ago

I would recommend removing the encoding then.

PhilLast commented 6 years ago

Done that. No name will be changed from APL. Any filename containing "+" will be read as "∆", "±" as "⍙" for the nonce.
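For illustration only, the resulting one-way behaviour might look like this in Python. `name_to_file` and `file_to_name` are hypothetical names, not the actual Acre functions.

```python
# Only the read direction still translates, for files written while the encoding was active.
LEGACY_TO_APL = str.maketrans("+±", "∆⍙")


def name_to_file(apl_name: str) -> str:
    """Writing: the APL name is used unchanged."""
    return apl_name


def file_to_name(file_name: str) -> str:
    """Reading: a legacy + or ± is still read as ∆ or ⍙."""
    return file_name.translate(LEGACY_TO_APL)


print(name_to_file("∆foo"))  # ∆foo
print(file_to_name("+foo"))  # ∆foo
print(file_to_name("∆foo"))  # ∆foo
```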