iains / gcc-darwin-arm64

GCC master branch for Darwin with experimental support for Arm64. Currently GCC-15.0.0 [September 2024]
GNU General Public License v2.0
268 stars 33 forks source link

File names case-sensitive (Ada-related) #101

Closed simonjwright closed 7 months ago

simonjwright commented 1 year ago

At gcc/ada/adaint.c:611 we have

      /* By default, we suppose filesystems aren't case sensitive on
         Windows and Darwin (but they are on arm-darwin).  */
#if defined (WINNT) || defined (__DJGPP__) \
  || (defined (__APPLE__) && !(defined (__arm__) || defined (__arm64__)))
      file_names_case_sensitive_cache = 0;
#else
      file_names_case_sensitive_cache = 1;
#endif

The change (b54d1d39d30c of 2015-10-20) was made because "file names are case sensitive on aarch64-ios".

But they aren’t on macOS (unless a user has made the unwise decision to change the default for their disk).

I can’t see any defines which would allow macOS to be detected vs iOS (or *OS, I suppose), except perhaps __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ which doesn’t look reliable.

_The environment variable GNAT_FILE_NAME_CASE_SENSITIVE, see line 603, is for use in a built compiler, not while building the compiler._

iains commented 1 year ago

Typically, my build and sources disks are case-sensitive and I set GNAT_FILE_NAME_CASE_SENSITIVE in my .profile and I've not seen issues in building the compiler on either x86_64 or aarch64 macOS (once in a while I have built on a non CS disk again without issue).

Actually, since Catalina or so I do not see (the previously unwise) issue with making the OS disk CS either - since the OS is stored on a non-CS-read-only partition so that it is only the User's dynamic additions that are on a CS writable partition (although this is all 'under the hood' from the perspective of the end user).

There are defines that allow detection of macOS c.f. iOS / watchOS / tvOS .. PLATFORM_xx or something similar (from memory). grep should find them in the SDK.

simonjwright commented 1 year ago

I had no trouble building the compiler and tools on aarch64-apple-darwin, but then none of the sources use upper-case (or international characters) in filenames so I wouldn’t expect any.

I found this because one of my users, against advice, or perhaps in ignorance of it, used mixed case for file names, and the compiler (or possibly the build tool, gprbuild) couldn’t find the file it needed.

I thought the problem was that -- if say /Applications is on a CS filesystem -- an applications’ code may assume non-CS and fail when it encounters CS?

I couldn’t see the SDK defines you mention, so my thoughts on how to progress this are

iains commented 1 year ago

I had no trouble building the compiler and tools on aarch64-apple-darwin, but then none of the sources use upper-case (or international characters) in filenames so I wouldn’t expect any.

FWIW ( not a justification more an observation ) I started to build all my OSS projects on CS systems because some did assume that they were on CS - not specifically GCC. I have around 120 OSS projects on my sources and build some/all of them with GCC at times.

edit : although the sources and builds are on CS - until Catalina, I did keep my OS disk as case-preserving / non-CS.

I found this because one of my users, against advice, or perhaps in ignorance of it, used mixed case for file names, and the compiler (or possibly the build tool, gprbuild) couldn’t find the file it needed.

Ah.. of course, we cannot speak for tools outside of the builds that we do, but a failure to handle CS/non-CS is a bug, I'd say (CS is usually handled as the default for OSS projects, at least).

I thought the problem was that -- if say /Applications is on a CS filesystem -- an applications’ code may assume non-CS and fail when it encounters CS?

There were notable huge vendors who released applications that were not safe on CS OS X systems, indeed - but that was 15 years ago .. I'd hope they have the idea by now.

However, I think you are technically correct - post-delivery installations into / will be directed to the CS writeable partition.

FWIW, at this time, I have not seen problems (I've been using CS system since hmm... Catalina, I think) but then I do not install many 3rd-party applications.

I couldn’t see the SDK defines you mention, nevertheless, the information is present - see for example MacOSX12.sdk/usr/include//AvailabilityInternal.h

#define __API_AVAILABLE_PLATFORM_macos(x) macos,introduced=x
#define __API_AVAILABLE_PLATFORM_macosx(x) macosx,introduced=x
#define __API_AVAILABLE_PLATFORM_ios(x) ios,introduced=x
#define __API_AVAILABLE_PLATFORM_watchos(x) watchos,introduced=x

So figuring out how these get their information should work?

NOTE : GCC at this stage does NOT support iOS/tvOS (and has not got the 32b port to support watchOS) so that you could currently assume macOS.

so my thoughts on how to progress this

  • issue warnings in release notes to my users,
  • (as a last resort) patch out the problematic code in adaint.c.

unless this appears as a prevalent issue, I'd suggest the former .. for most people thing "just work".

iains commented 1 year ago

looking at clang's output, I'd say you can also use:

__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__
__ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__
__ENVIRONMENT_TV_OS_VERSION_MIN_REQUIRED__

etc.

(edit : GCC will only emit ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED thus far)

fxcoudert commented 1 year ago

I would report this to the Ada maintainers: whether the system is case-sensitive is not a property of the OS at all, it will depend on the filesystem. That code makes no sense.

iains commented 1 year ago

I guess the only 'tactical' difference is, that with iOS, the end user has no influence over the FS type (so if you know the target is iOS you can decide), but with macOS this no longer holds.

As of now, GCC (as supported here, and upstream) does not support iOS - therefore we have to assume that the user could change the FS kind (I certainly do). The Ada components honour GNAT_FILE_NAME_CASE_SENSITIVE, so AFAICT that is usable. Can we close this issue, since it appears to be either something that should be raised upstream (it would manifest on x86_64 I presume)?

cooljeanius commented 10 months ago

Related to upstream bug 81114 I think? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114

iains commented 10 months ago

perhaps, but the phenomenon we are discussing here is not with older OS revisions - but actually with the "latest and greatest" .. my speculation is that HFS+ does some translation between the call and the disk - but APFS does not (and so if the supplied string is incompatible, it just punts instead of converting).

I'd say that this is a job for the runtime librar(ies) to sort out.

simonjwright commented 10 months ago

I would report this to the Ada maintainers: whether the system is case-sensitive is not a property of the OS at all, it will depend on the filesystem. That code makes no sense.

I think all the Ada maintainers are AdaCore people; AdaCore have said they don’t support macOS.

I’ve written a small demonstrator (attached) that could be included in adaint.c, and invoked if GNAT_FILE_NAME_CASE_SENSITIVE isn’t set. Tested on Catalina (the earliest version I have), Monterey (x86_64), Sonoma (aarch64), against ExFAT, HFS, HFS CS, APFS, APFS CS (the HFS and both CS tests were against disk images).

Issues I see:

check_case_sensitivity.zip

iains commented 10 months ago

I would report this to the Ada maintainers: whether the system is case-sensitive is not a property of the OS at all, it will depend on the filesystem. That code makes no sense.

I think all the Ada maintainers are AdaCore people; AdaCore have said they don’t support macOS.

Well, they no longer actively develop for macOS, but I think that they will review macOS / Darwin patches for Ada (i.e. I do not think that I can approve them as Darwin maintainer - although perhaps I should clarify that with them).

I’ve written a small demonstrator (attached) that could be included in adaint.c, and invoked if GNAT_FILE_NAME_CASE_SENSITIVE isn’t set. Tested on Catalina (the earliest version I have), Monterey (x86_64), Sonoma (aarch64), against ExFAT, HFS, HFS CS, APFS, APFS CS (the HFS and both CS tests were against disk images).

Issues I see:

  • getattrlist() first appeared in Yosemite (10.10, darwin14). Could check for OS release and default to case-insensitive if < darwin14.

We can use min version in adainit.c - it's already done.

  • adaint.c has a lot of OS sensitivity, would make it tricky to review and very difficult to test

Indeed it is the "kitchen sink" for interface stuff.

check_case_sensitivity.zip

I will try to take a look at this sometime after Stage1 is finished (19th Nov) - I think the bug is a regression and therefore can be fixed in Stage 3. Until then, I am snowed under with Stage1 stuff :)

iains commented 10 months ago

As an aside: "// getattrlist() first appeared in Yosemite, 10.10 (darwin14). There // are notes in the man page under Compatbility:"

This does not seem correct, I see getattlist() in /usr/include/sys/attr.h back to Darwin8 (MacOSX 10.4) [ not that this matters too much to the actual issue, since APFS is not available until 10.13 or so]

simonjwright commented 10 months ago

As an aside: "// getattrlist() first appeared in Yosemite, 10.10 (darwin14). There // are notes in the man page under Compatbility:"

This does not seem correct, I see getattlist() in /usr/include/sys/attr.h back to Darwin8 (MacOSX 10.4) [ not that this matters too much to the actual issue, since APFS is not available until 10.13 or so]

I saw it in a Compatibility man page section somewhere on the web, sorry. Looking at current unistd.h, it seems to be saying that 10.6 is the start point. There were some changes in attr.h between Catalina and Sonoma, the one that affected me was that ATTRIBUTE_SET_INIT wasn’t in the earlier version.

I don’t believe that APFS affects the issue ("File names case sensitive"), this arises on HFS just as much as on APFS. The difference with APFS is that it will fail any attempt to create a file with an upper latin-1 character in its name, e.g. lower-case-a-acute, 0xe1; whereas HFS converts the character to the three-character string %E1.

If you try to run the 2.6 ACATS, as included in GCC (e.g. make -C gcc check-acats or check-ada) on an APFS volume, test C250002 will fail because it involves files with names containing latin-1 characters; whereas on an HFS volume, the test succeeds because the conversions noted above work for write and read.

If you run the current (4.1) ACATS on a case-insensitive volume, C250002 will fail because it involves files with names containing UTF-8 characters that GCC tries to convert to lower-case by treating each octet as if it were latin-1 (if it looks like an upper-case latin-1 character, set bit 5!). On a CS volume, GCC leaves well alone and all works.

Why does GNAT do this? because with Foo; translates to a dependence on file foo.ads (as does with foo;). This is all quite deep in the compiler, and I’m not at all sure it’s a library issue. All the same, recognising case-sensitivity properly can only be a good thing.

iains commented 10 months ago

Making the process automatic (rather than relying on GNAT_FILE_NAME_CASE_SENSITIVE) seems like a good objective.

However, it seems that GNAT_FILE_NAME_CASE_SENSITIVE is not sufficient on APFS (C250002 fails on a case-sensitive APFS volume even when that env is set), where it does work reliably on HFS+ (as you say some conversion is done).

I'll try to give this some better attention after stage 1 is done.

simonjwright commented 10 months ago

However, it seems that GNAT_FILE_NAME_CASE_SENSITIVE is not sufficient on APFS (C250002 fails on a case-sensitive APFS volume even when that env is set), where it does work reliably on HFS+ (as you say some conversion is done).

The attached is a C program that demonstrates this behaviour. On an HFS disk,

$ latin
attempting to open 'l?tin.txt':
ok
$ ls
l%E1tin.txt

On an HFS CS disk,

$ latin
attempting to open 'l?tin.txt':
ok
$ ls
l%E1tin.txt

On an APFS disk,

$ latin
attempting to open 'l?tin.txt':
call failed: Illegal byte sequence

On an APFS CS disk,

$ latin
attempting to open 'l?tin.txt':
call failed: Illegal byte sequence

I’ll raise a Bugzilla PR on the ACATS testsuite, as I should have done already: likewise on this issue.

latin.zip

cooljeanius commented 10 months ago

I’ll raise a Bugzilla PR on the ACATS testsuite, as I should have done already: likewise on this issue.

I'm assuming that that's this? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111909

simonjwright commented 10 months ago

I'm assuming that that's this? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111909

Yes (this issue).

simonjwright commented 10 months ago

See this patch. It’ll take a while for AdaCore to comment one way or another; of course it’s very Darwin-specific. I don’t have a Linux box to test it on.

iains commented 10 months ago

See this patch. It’ll take a while for AdaCore to comment one way or another; of course it’s very Darwin-specific. I don’t have a Linux box to test it on.

I also want to check it on earlier Darwin - just in case (and then will comment on list)...

iains commented 10 months ago

FAOD, with this patch, I should no longer need to set GNAT_FILE_NAME_CASE_SENSITIVE (when using a CS filesystem)? If all goes smoothly, can also test on Linux.

iains commented 10 months ago

OK.. so initial testing on Darwin19 (OK) Darwin17 (not OK) and Darwin9 (not OK)

According to the manual page for getattrlist and the test for a case-sensitive volume should exist on Darwin9 and 17, so there is some debugging needed.

edit: this might relate to file systems mounted onto paths in the root dir (which seems to report correctly on later OS versions but with 'invalid parameter' on earlier). Not sure when I'll be able to debug further.

cooljeanius commented 9 months ago

See this patch. It’ll take a while for AdaCore to comment one way or another; of course it’s very Darwin-specific. I don’t have a Linux box to test it on.

Following the thread up across the month-boundary that exists in the mailing list archives: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635104.html

simonjwright commented 9 months ago

On 22 Oct 2023, at 02:22, Eric Gallager @.***> wrote:

I’ll raise a Bugzilla PR on the ACATS testsuite, as I should have done already: likewise on this issue.

I'm assuming that that's this? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111909

Yes, that’s this issue.

The one on ACATS (in GCC) is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112529 - apparently it affects Windows as well.

simonjwright commented 9 months ago

By the time we’d got to the last message in the patch thread, https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637004.html, I’d rather lost interest (in fact, I wrote a reply which I seem not to have posted saying that I thought it was really up to Darwin people as to whether Arno’s proposal - disregarding the possibility of iWatch, for instance - was OK).

If we go ahead with this minimal patch, macOS computers will be consistent, which is a good thing. It might make it more difficult to get the original patch in (fixed up for the older OS issues that Iain found, of course) at a later date.

On further thought - Iain, have you actually encountered Ada case-related issues? I’ve not had any reports of aarch64-apple-darwin users having trouble in this area, in spite of the compiler thinking the FS is case-sensitive. This means that ACATS 4.1’s C250002 actually works on aarch64! (PR81114 refers).

iains commented 9 months ago

By the time we’d got to the last message in the patch thread, https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637004.html, I’d rather lost interest (in fact, I wrote a reply which I seem not to have posted saying that I thought it was really up to Darwin people as to whether Arno’s proposal - disregarding the possibility of iWatch, for instance - was OK).

We are a long long way from having time to extend the Arm64 port to cover 32b (not arguing against the idea, but being realistic in resourcing)

If we go ahead with this minimal patch, macOS computers will be consistent, which is a good thing. It might make it more difficult to get the original patch in (fixed up for the older OS issues that Iain found, of course) at a later date.

I will reply to the list thread, IMO it's a step forward; I'd prefer an automatic solution, but there's more work to do to achieve that and I do not have cycles spare to explore what's going wrong (the calls are a thin wrapper on a syscall, so the underlying difference is how the kernel is responding to the input path). I am not concerned that a short-term improvement would alter the acceptability of a better patch when we have one.

On further thought - Iain, have you actually encountered Ada case-related issues? I’ve not had any reports of aarch64-apple-darwin users having trouble in this area, in spite of the compiler thinking the FS is case-sensitive.

I have not seen reports of problems, there are none (other than the current discussed cases) in BZ AFAIK. I use the GNAT_FILE_NAME_CASE_SENSITIVE=1 for local testing (and, yes, it's irritating to repeat the testsuite run if I forget to do it - but not a show-stopper).

This means that ACATS 4.1’s C250002 actually works on aarch64! (PR81114 refers).

What has changed between the version that we have in GCC and 4.1?

simonjwright commented 9 months ago

This means that ACATS 4.1’s C250002 actually works on aarch64! (PR81114 refers).

What has changed between the version that we have in GCC and 4.1?

The test involves a file whose name (c250002_á.ads) includes an international character; the text of that file includes its own name (C250002_Á), upper-cased as you see.

The in-tree test implements this using Latin-1 characters; ACATS 4.1 uses UTF-8.

I’ve only investigated the UTF-8 implementation. The compiler realises that it needs to lower-case the text name (because Ada), and manages this successfully.

If file names are case-insensitive, a later part of the compiler (for reasons) decides to lower-case this already-lower-case name by treating each byte as if it were Latin-1, which mangles the á (0xc3a1) into 0xe3a1, which isn’t a valid character.

I don’t know why APFS allows this, when it doesn’t allow upper-half Latin-1 characters!

iains commented 7 months ago

If I understand correctly, there's an upstream bug to track this (and the problem is not specific to this port) - if both those are tre please close the issue (the BZ PR is open).

simonjwright commented 7 months ago

BZ? but, closing.

iains commented 7 months ago

BZ? but, closing.

Bugzilla .. (PR now gets confused with "pull request" :roll_eyes: