iains / gcc-darwin-arm64

GCC master branch for Darwin with experimental support for Arm64. Currently GCC-15.0.0 [September 2024]
GNU General Public License v2.0
268 stars 33 forks source link

aarch64: Tune apple-m* cpus #131

Open fxcoudert opened 4 months ago

fxcoudert commented 4 months ago

Trying to update CPU features based on documentation.

The original comment stated “M2 mostly 8.5 but with missing mandatory features” but the documentation actually stated its base ARM ISA is v8.6, and the LLVM source code agrees: https://github.com/llvm/llvm-project/blame/69d740e5d64257524914aabd6dfead7565185d4f/llvm/include/llvm/TargetParser/AArch64TargetParser.h#L713

I've looked at the supported EL0 features, but picked only those that are actually used by GCC target code.

I haven't even tested this, I will bootstrap and regtest. It will definitely require careful scrutiny, especially for the changes to the "base" profile apple-m1.

iains commented 4 months ago

Trying to update CPU features based on documentation.

The original comment stated “M2 mostly 8.5 but with missing mandatory features” but the documentation actually stated its base ARM ISA is v8.6, and the LLVM source code agrees: https://github.com/llvm/llvm-project/blame/69d740e5d64257524914aabd6dfead7565185d4f/llvm/include/llvm/TargetParser/AArch64TargetParser.h#L713

I think the LLVM source is probably wrong (but we can check with the Arm folks tomorrow - today is a public holiday in the UK)

=====

I checked the features list on the ARM documentation [many are optional at some rev and then beome mandatory at some later rev] - and AFAICT there is one mandatory case for 8.5 missing form Mx

Arm® Architecture Reference Manual for A-profile architecture (latest):

"FEAT_SPECRES is mandatory from Armv8.5. FEAT_SPECRES is OPTIONAL from Armv8.0.”

macOS sysctl:

"hw.optional.arm.FEAT_SPECRES: 0”

I welcome a second opinion tho :)

iains commented 4 months ago

Trying to update CPU features based on documentation. The original comment stated “M2 mostly 8.5 but with missing mandatory features” but the documentation actually stated its base ARM ISA is v8.6, and the LLVM source code agrees: https://github.com/llvm/llvm-project/blame/69d740e5d64257524914aabd6dfead7565185d4f/llvm/include/llvm/TargetParser/AArch64TargetParser.h#L713

I think the LLVM source is probably wrong (but we can check with the Arm folks tomorrow - today is a public holiday in the UK)

Well, maybe not the LLVM source as such - but the assertion that M2 is based on Armv8.6 (which is then captured as a fact in the LLVM source), I suppose.

fxcoudert commented 4 months ago

FEAT_SPECRES is mandatory from Armv8.5.

That would mean Apple doc is incorrect. I'll report it. (But it may be moot because GCC does not seem to use it?)

Right now the build fails in libgcc:

_mulhc3_s.s:17:2: error: instruction requires: fullfp16
        fmul    h28, h1, h3
        ^
iains commented 4 months ago

Do we have some issue or example where our current options are not working properly. I do agree we could (probably) have a better scheduling model now that some of the information has been published - but that's new territory for me - so will take a while (or someone with that experience to step in).

Another concern is the the ordering of extensions / features in clang and GCC differ in some cases (and both toolchains have assembler bugs that contradict) .. the ordering issue is, I think being gradually sorted out - the folks I normally talk to in Arm are on vacation this week ..

fxcoudert commented 4 months ago

If you think it's premature or useless we can close this. I was just reading through the Apple doc and thinking it would be good to base our code on that, rather than previous guesses (although they are more than guesses now, being supported by experience).

Anyway, let's keep this open until I can get a response on the FEAT_SPECRES issue.

iains commented 4 months ago

FEAT_SPECRES is mandatory from Armv8.5.

That would mean Apple doc is incorrect. I'll report it. (But it may be moot because GCC does not seem to use it?)

Right now the build fails in libgcc:

_mulhc3_s.s:17:2: error: instruction requires: fullfp16
        fmul    h28, h1, h3
        ^

under what circumstances? (I have bootstrapped the current tree at least on darwin21 - unfortunately. the 15.3 assembler will not run on darwin21)

fxcoudert commented 4 months ago

The build failure is this PR on darwin23 with Xcode & CLT 15.3

iains commented 4 months ago

If you think it's premature or useless we can close this. I was just reading through the Apple doc and thinking it would be good to base our code on that, rather than previous guesses (although they are more than guesses now, being supported by experience).

The guessing was (and to some extent still is) about scheduling and tuning models - the Arm folks advised to use what's currently there.

That's somewhat distinct from the arch level and FEAT stuff - which ought to be covered by published specs - if you don't have it yet - then "DDI0487K_a_a-profile_architecture_reference_manual.pdf" is, I think, the latest and contains the criteria.

Anyway, let's keep this open until I can get a response on the FEAT_SPECRES issue.

Sure, it would nice to be able to simplify the descriptions to be just _v5a and _v6a (since the only new addition in m3 does not alter the extension set), but I don't think we can do that without clarifying.

iains commented 4 months ago

The build failure is this PR on darwin23 with Xcode & CLT 15.3

does the devt tree bootstrap OK (either trunk or gcc-14-1-darwin-pre-0 should be equivalent).

I need some more hardware so that I can run bleeding edge separately from my office and production machines. (I am using 15.1b for aarch64, since that's the newest that will run on my laptop)

fxcoudert commented 4 months ago

does the devt tree bootstrap OK

Yes it does, no problem there.

iains commented 4 months ago

does the devt tree bootstrap OK

Yes it does, no problem there.

in which case, I suspect you are duplicating extensions that are already in the base arch you have specified - and that is messing up the priority/ordering of them leading to the apparent missing extension - try taking out all the extensions that are part of the base you are specifying (it should work in a consistent manner, even if we do not know that the claimed arch is right yet)

fxcoudert commented 4 months ago

So I've pushed again, I don't think there are duplicates now, still it doesn't build. I've tried fiddling with the order but it didn't help.

iains commented 4 months ago

there seems to be a property of the LLVM-based assembler (maybe a feature, more likely a bug) that does this;

when we specify some arch like ".arch armv8.4-a" and then append something that was introduced by an earlier arch (e.g. .arch armv84-a+crc) it then resets the base arch to the one that introduced +crc (arm 8.0-a)

I am not sure if that is the underlying issue - but the ordering of arm extensions cannot be allowed to be arbitrary anyway because later ones can remove or add something to the starting value (some extensions imply others, for example)

iains commented 4 months ago

So I've pushed again, I don't think there are duplicates now, still it doesn't build. I've tried fiddling with the order but it didn't help.

is that with an explicit "--with-cpu=apple-mx" on the configure line (it will default to apple-m1, if not)

I'll try to take a better look later ...

fxcoudert commented 4 months ago

is that with an explicit "--with-cpu=apple-mx" on the configure line (it will default to apple-m1, if not)

No, I let it default to apple-m1. The problem is with that line, somehow.

iains commented 4 months ago

is that with an explicit "--with-cpu=apple-mx" on the configure line (it will default to apple-m1, if not)

No, I let it default to apple-m1. The problem is with that line, somehow.

So, I think that the problem might be that 8.4-a already includes AES and SHA3 and the effect I noted above is kicking in - the addition of those is reseting the base to something without fp16/fp16fml.

Unfortunately to find some of this hidden info - means poking at the Arm ARM and the sources for the extensions management

iains commented 4 months ago

I'd be happy to agree that it seems dumb that putting an extension that is already present in the specified base changes the specified base .. but perhaps there's some Good Reason ...