PaulMansour closed this 6 years ago.
'0123456789ABCDE'
'FGHIJKLMNOPQRST'
'UVWXYZ_abcdefgh'
'ijklmnopqrstuvw'
'xyzÀÁÂÃÄÅÆÇÈÉÊË'
'ÌÍÎÏÐÑÒÓÔÕÖØÙÚÛ'
'ÜÝßàáâãäåæçèéêë'
'ìíîïðñòóôõöøùúû'
'üþ∆⍙ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀ'
'ⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ'
are the 150 characters below U+2710, i.e. among the first 10000 code points, that APL allows in names.
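As a quick cross-check of the matrix above, here is a small Python sketch. Python is used only for illustration, the rows are copied verbatim, and the names `ROWS`, `APL_NAME_CHARS` and `summary` are made up here. It rebuilds the character set and confirms the count and the code-point bound (U+2710 is 10000 in decimal).

```python
# The ten 15-character rows of the matrix above, copied verbatim.
ROWS = [
    '0123456789ABCDE',
    'FGHIJKLMNOPQRST',
    'UVWXYZ_abcdefgh',
    'ijklmnopqrstuvw',
    'xyzÀÁÂÃÄÅÆÇÈÉÊË',
    'ÌÍÎÏÐÑÒÓÔÕÖØÙÚÛ',
    'ÜÝßàáâãäåæçèéêë',
    'ìíîïðñòóôõöøùúû',
    'üþ∆⍙ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀ',
    'ⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ',
]

# Hypothetical name: the flattened set of APL name characters.
APL_NAME_CHARS = ''.join(ROWS)

def summary(chars: str) -> tuple:
    """Return (count, distinct count, highest code point as 'U+XXXX')."""
    return len(chars), len(set(chars)), f'U+{max(map(ord, chars)):04X}'

print(summary(APL_NAME_CHARS))  # (150, 150, 'U+24CF')
```

The highest code point is Ⓩ (U+24CF), comfortably below U+2710.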
Windows says that the characters '/\:*?"<>|' may not appear in filenames.
I guess it'd be up to individual proprietary zipping programs to decide what to add to that list.
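A minimal sketch of that rule, assuming only the nine characters quoted above (Windows also rejects ASCII control characters, which this sketch ignores). The function name `windows_bad_chars` is hypothetical. It shows that none of the characters APL allows in names is on the Windows list, so the trouble with ∆ and friends lies in the zipping, not in the file system.

```python
# Characters Windows refuses in file names, as quoted in the thread.
# Control characters (code points 0-31) are also refused but not checked here.
WINDOWS_FORBIDDEN = set('/\\:*?"<>|')

def windows_bad_chars(filename: str) -> set:
    """Return the characters in filename that Windows does not allow."""
    return {c for c in filename if c in WINDOWS_FORBIDDEN}

print(windows_bad_chars('a:b?.apln'))     # the set {':', '?'}
print(windows_bad_chars('Foo∆⍙Ⓐ.aplf'))  # set(): APL name chars pass
```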
Acre3 and earlier encoded '∆⍙', but that turned out to be vulnerable to interpreter changes.
I can probably do better now.
I'll maintain a list, plus an enhancement to the code that encodes and decodes capitals. Then, as we find characters beyond those two, we can just add them.
@PhilLast This can be closed, I guess?
Not actually done yet.
Propose a straight swap of '∆⍙' to '+±' on the way to file and '+±' to '∆⍙' on the way back, both under the auspices of the functions that currently add and remove the casecodes.
The replacements need to be characters that are acceptable in filenames to all and sundry but not acceptable in APL names.
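The proposed swap can be sketched in Python as a pair of translation tables. This is a hypothetical illustration, not Acre's actual code: the function names are invented, and the real functions also add and remove casecodes, which is omitted here. The round trip is lossless precisely because '+' and '±' can never occur in an APL name.

```python
# Hypothetical sketch of the proposed swap (casecode handling omitted).
TO_FILE = str.maketrans('∆⍙', '+±')    # APL name -> filename
FROM_FILE = str.maketrans('+±', '∆⍙')  # filename -> APL name

def name_to_filename(name: str) -> str:
    """Replace ∆ with + and ⍙ with ± on the way to file."""
    return name.translate(TO_FILE)

def filename_to_name(filename: str) -> str:
    """Replace + with ∆ and ± with ⍙ on the way back."""
    return filename.translate(FROM_FILE)

print(name_to_filename('Foo∆Bar⍙'))  # Foo+Bar±
print(filename_to_name('Foo+Bar±'))  # Foo∆Bar⍙
```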
Other "typeable"* candidates below U+100, which we might need if other unacceptable characters turn up, are:
'!%¡£¥¦§©ª«¬®°²³µ¶·¸¹º»¼½¾¿'
from which I've removed all characters currently used by APL in any guise, as well as the few European diacriticals that Dyalog doesn't allow in names.
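A sketch of how a candidate can be checked against two of the stated constraints: it lies below U+100, it is not one of the Latin-1 name characters from the matrix at the top of this thread, and it is not a character Windows forbids. It does not (and cannot easily) check whether APL uses the character as a primitive; that filtering was done by hand above. The names `LATIN1_NAME_CHARS` and `usable` are hypothetical.

```python
# Name characters below U+100, taken from the 150-character matrix.
LATIN1_NAME_CHARS = set(
    '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz'
    'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüþ'
)
WINDOWS_FORBIDDEN = set('/\\:*?"<>|')
CANDIDATES = '!%¡£¥¦§©ª«¬®°²³µ¶·¸¹º»¼½¾¿'

def usable(c: str) -> bool:
    """True if c is below U+100, not an APL name char, and Windows-safe."""
    return (ord(c) < 0x100
            and c not in LATIN1_NAME_CHARS
            and c not in WINDOWS_FORBIDDEN)

print(all(usable(c) for c in CANDIDATES))  # True
print(usable('+'), usable('±'))            # True True
```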
OK?
Sounds good to me.
Turns out that the underscored alphabet ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ (which, to my surprise, displays as underscored only in APL385; according to Unicode, and in every other font, these are circled letters) is also banned in its entirety from filenames in compressed folders. I'd sooner not cater for them unless there's serious pressure to do so, because I understand Dyalog deprecates them.
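The circled-letter point is easy to confirm with Python's `unicodedata` module. The sketch below also collects the characters reported in this thread as refused by Windows' compressed folders (∆, ⍙ and the 26 circled letters) into a set and flags them in a filename. That set reflects only what was found here, not an exhaustive list, and `REPORTED_TROUBLE` and `compressed_folder_trouble` are hypothetical names.

```python
import unicodedata

# U+24B6..U+24CF: the 26 letters that APL385 shows as underscored.
CIRCLED = ''.join(chr(cp) for cp in range(0x24B6, 0x24CF + 1))

# Characters reported in this thread as refused by Windows compressed
# folders. Not an exhaustive list.
REPORTED_TROUBLE = set('∆⍙') | set(CIRCLED)

def describe(c: str) -> str:
    """Code point plus official Unicode name."""
    return f'U+{ord(c):04X} {unicodedata.name(c)}'

def compressed_folder_trouble(filename: str) -> list:
    """Describe each character in filename known to upset Windows zipping."""
    return [describe(c) for c in filename if c in REPORTED_TROUBLE]

print(describe('Ⓐ'))  # U+24B6 CIRCLED LATIN CAPITAL LETTER A
```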
Do we actually need to change these function names?
There shouldn't be any need to change anything. The functions that encode and decode capitals now also do the switch when an item whose name includes ∆ or ⍙ is created in the editor or by SetChanged. So only the filename changes. Users will have to decide between being able to compress and an absolute need to use any of ⒶⒷⒸⒹⒺⒻⒼⒽⒾⒿⓀⓁⓂⓃⓄⓅⓆⓇⓈⓉⓊⓋⓌⓍⓎⓏ.
Just to clarify, these characters only cause a problem with Windows' built-in zip support (read more here).
I wouldn't bother working around it. All modern zip tools should cope fine.
It hasn't actually been decided whether Phil's workaround (replacing ∆ and ⍙ in folder names) should stay or be removed.
I just tested extracting with Windows and it works.
I zipped a file with bad characters using 7zip and unzipped it using Windows Explorer without issue.
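The zip format itself has no problem with these names: general-purpose flag bit 11 (0x800) marks an entry's filename as UTF-8. The Python sketch below writes and re-reads an in-memory archive with the standard `zipfile` module, which sets that flag automatically for non-ASCII names. It illustrates the format only; it does not reproduce how Windows Explorer's compressed folders behave, and `zip_roundtrip` is a hypothetical helper.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: filename is UTF-8

def zip_roundtrip(names):
    """Zip dummy files with the given names in memory, then read them back.

    Returns (names as read back, whether each entry has the UTF-8 flag).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        for name in names:
            zf.writestr(name, 'content')
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return zf.namelist(), [bool(i.flag_bits & UTF8_FLAG) for i in zf.infolist()]

print(zip_roundtrip(['Foo∆.aplf', 'BarⒶ.aplf', 'Plain.aplf']))
```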
Happy to remove the encoding. There's no harm in leaving the decoding in for a while, in case the encoding has already been applied to anyone's code and they have files whose names contain "+" and/or "±".
I would recommend removing the encoding then.
Done. No name coming from APL will be changed. For the nonce, any "+" in a filename will be read as "∆" and any "±" as "⍙".
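The resulting one-way arrangement can be sketched as follows: writing leaves the name alone, while reading still maps the legacy substitutes back. This is a hypothetical illustration with invented function names, and casecodes are again left out. Decoding every "+" is safe because "+" and "±" cannot occur in an APL name, so any such character in a filename must come from the old encoding.

```python
# Decoding kept for legacy files; encoding removed.
LEGACY_DECODE = str.maketrans('+±', '∆⍙')

def write_filename(name: str) -> str:
    """Names from APL are now written unchanged."""
    return name

def read_name(filename: str) -> str:
    """Map legacy '+' back to '∆' and '±' back to '⍙'."""
    return filename.translate(LEGACY_DECODE)

print(write_filename('Foo∆'))  # Foo∆
print(read_name('Foo+Bar±'))   # Foo∆Bar⍙
```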
It appears that Windows does not like to compress folders containing filenames with APL characters such as ∆ (delta). It is pretty common to use delta and delta-underbar, and I don't think it will do if we can't compress a project containing at least those two characters, and perhaps others. Can we encode them? What is the list of characters that are valid in an APL name but invalid in a compressed folder?