jimeh / emacs-builds

Self-contained Emacs.app builds for macOS, with native-compilation support.
https://github.com/jimeh/emacs-builds/releases

Looks like Intel builds missing from Emacs-30.0.92-pretest #37

Open shipmints opened 3 weeks ago

shipmints commented 3 weeks ago

I see only arm64 builds here https://github.com/jimeh/emacs-builds/releases/tag/Emacs-30.0.92-pretest

Perhaps something broke?

jimeh commented 3 weeks ago

@shipmints indeed, apologies for this happening again. I fired off a build around 7am this morning; the ARM build completed in less than an hour, but the Intel build failed after 4-5 hours. Around 3pm the homebrew repo would have updated itself and found the new pretest release, which only has ARM builds. I restarted the Intel build a while ago, but it's still running: https://github.com/jimeh/emacs-builds/actions/runs/11625754054/job/32394212318

Though I don't have high hopes right now, as the nightly builds have been failing for a few days the same way while installing build deps via homebrew. I should have some time this weekend to try and solve it.

It does look like I'll either need to get build deps via nix instead of homebrew working, or drop macOS 12.x support, as homebrew no longer supports it. That's why Intel builds have been taking +4 hours the past month or so, and most likely the reason for these failures.

I will give the nix approach a try first, as homebrew has been a constant source of headaches for years with this, and also cause I much prefer keeping supported macOS versions as wide as possible.

jimeh commented 3 weeks ago

For now though, if this second build attempt fails, I'll revert the latest pretest change in the homebrew tap, and trash the release for now, so the emacs-app-pretest cask actually works again, even if it's an outdated pretest build for a while longer.

shipmints commented 3 weeks ago

Thanks for looking after this. FWIW, I still use macOS 12.x on all my Macs because it's the most stable OS I have available (and at least one of my macs can't be upgraded--thanks, Apple). If I get an M4 box in the next few months, I'll have to suck it up and deal with a new crop of bugs. So not looking forward to that.

jimeh commented 3 weeks ago

I've reverted the emacs-app-pretest cask for now. Tonight's nightly build is affected by the same issue as well, so I've done the same revert for the emacs-app-nightly and emacs-app-monthly casks too.

jimeh commented 3 weeks ago

@shipmints So I seem to have Nix-based builds working. I have a few test builds running atm, but it's looking positive. I'll bug you with a test build later for you to try if you don't mind :)

However, it looks like I will need to drop macOS 12.x support in less than a month regardless :(

GitHub Actions began deprecating the macos-12 runner image on October 7th, and they will be completely removing it on December 3rd. More details here: https://github.com/actions/runner-images/issues/10721

jimeh commented 3 weeks ago

@shipmints would you mind testing if the x86_64-dmg build artifact of this build works for you?

https://github.com/jimeh/emacs-builds/actions/runs/11647762203

shipmints commented 3 weeks ago

I'll test that build this morning.

The github support message is unclear to me. It says "Deprecation will begin on 10/7/24 and the image will be fully unsupported by 12/3/24". When they write "unsupported", do they mean deleted forever? I'd think they'd leave it to aid back-version testing, security analysis tools, etc.

shipmints commented 3 weeks ago

I get a 404 on the link https://github.com/jimeh/emacs-builds/releases/tag/untagged-0ef7854b97a468a4976f

jimeh commented 3 weeks ago

Weird, that build log link shouldn't take you to any draft releases. Try this link directly to the build log artifact instead please: https://github.com/jimeh/emacs-builds/actions/runs/11647762203/artifacts/2137257947

As for older macOS versions, I do have some good news. Nix actually comes with quite old macOS SDKs by default, meaning that it looks like Intel builds might work on macOS 10.12.x and later, and ARM builds on macOS 11.x and later. I'm about to test an ARM build created on macOS 15 in a macOS 12 VM to see if it works.

Then there's some last bits around the build script and build workflows here to iron out, but hopefully I can get Nix-backed builds live in the next couple of days :)

shipmints commented 3 weeks ago

Thank you for following up. I'm getting a complaint that the developer can't be verified. Perhaps signing is broken in the revised workflow?

jimeh commented 3 weeks ago

Hmm, that's weird, can you try running spctl -vvv --assess --type exec <path/to/Emacs.app> and see what you get?

I get the below both on macOS 15 and in my fresh macOS 12 VM.

spctl -vvv --assess --type exec /Volumes/Emacs.2024-11-02.d245fb3.master.macOS-12.x86_64.test.use-nix-3/Emacs.app
/Volumes/Emacs.2024-11-02.d245fb3.master.macOS-12.x86_64.test.use-nix-3/Emacs.app: accepted
source=Notarized Developer ID
origin=Developer ID Application: Jim Myhrberg (5HX66GF82Z)

In happier news, I have managed to use a macOS 12 VM on ARM to test the above-mentioned x86_64 build via Rosetta, and also a native ARM build. Both seem to be working fine, so regardless of GitHub deprecating the macos-12 runner image, Nix should let us support older versions of macOS for quite a while longer :)

shipmints commented 3 weeks ago

spctl reports accepted for the "app" but I can't run Contents/MacOS/Emacs directly. The other builds work fine, of course. Curious.

$ spctl -vvv --assess --type exec /Volumes/Emacs.2024-11-02.d245fb3.master.macOS-12.x86_64.test.use-nix-3/Emacs.app
/Volumes/Emacs.2024-11-02.d245fb3.master.macOS-12.x86_64.test.use-nix-3/Emacs.app: accepted
source=Notarized Developer ID
origin=Developer ID Application: Jim Myhrberg (5HX66GF82Z)
jimeh commented 3 weeks ago

That is weird, I've tested launching Contents/MacOS/Emacs from terminal too, and it seems happy to verify things just fine on my macOS 15 and macOS 12 VM. They are ARM-based though.

In the meantime, a new build is ready that's not directly meant to address signing issues, but it does change up the Nix env setup a bit, so might make some difference.

Build log is here: https://github.com/jimeh/emacs-builds/actions/runs/11653031986

Direct download link to the x86_64-dmg artifact: https://github.com/jimeh/emacs-builds/actions/runs/11653031986/artifacts/2138245219

You should get a zip file containing Emacs.2024-11-03.b3c82f9.master.macOS-10-12.x86_64.test.use-nix-6.dmg.

shipmints commented 3 weeks ago

I just downloaded -6 and macos 12.7.6 still complains about veracity despite spctl reporting all clear. Perhaps I need a reboot (only half joking--Apple does have bugs, gasp). I'll try again later today after I can suffer a reboot.

jimeh commented 3 weeks ago

Thanks. I'll see if I can give it a test later today on my wife's Intel-based mac. I don't remember what macOS version it's running, but I know it's old... lol

jimeh commented 3 weeks ago

I've reproduced the signing verification failure on my wife's Intel-based mac running macOS 11. I have one suspected culprit: some libraries don't set an sdk value within the LC_VERSION_MIN_MACOSX Mach-O load command, and codesign -vv -d <path/to/file> complains about that with a warning.

I have a hacky and experimental fix in this commit of my build script that tries to resolve it.

New build with that fix is running here: https://github.com/jimeh/emacs-builds/actions/runs/11656998140

It should be about an hour before an x86_64-dmg artifact shows up. I'm about to sleep, so I'll test it myself tomorrow :)

shipmints commented 3 weeks ago

Looks like that experimental build didn't make it. Available to test much of the day today.

jimeh commented 3 weeks ago

Thanks for testing again. It turns out that my hacky fix ran too early, so it actually missed fixing up some files. I've refactored the hacky fix and started a new build here: https://github.com/jimeh/emacs-builds/actions/runs/11673367277

Annoyingly, I actually don't know if this will fix the signing issue on Intel machines or not, but it should address the only warnings of any kind I've managed to find when poking the application and individual binaries and libraries within the app with codesign and spctl.

If it doesn't fix it however, I'm somewhat out of immediate ideas, and gonna need to do some deep digging, as there are some obscure blog posts and gists around by people documenting signing and verification oddities with applications that bundle in shared libraries.

Update: That build failed, this one should succeed, hopefully... 😅 https://github.com/jimeh/emacs-builds/actions/runs/11674619420

shipmints commented 3 weeks ago

Standing by when ready. No huge rush but would be nice to help the Emacs core team with testing these pre-release candidates.

jimeh commented 3 weeks ago

I managed to get a new build working in the end after a few bugs. Writing code while half-asleep is always a good idea 😁

https://github.com/jimeh/emacs-builds/actions/runs/11656998140

However, it seems it still fails verification on Intel Macs :(

I'll hopefully have more time tomorrow evening to dig into it a bit more to try and figure out why it's failing. As of now, no tools Apple provides seem to be complaining, and yet launching the app throws a signing verification failure. I'll see what I find.

Alternatively I plan to see if I can get older macOS SDK versions working in GitHub actions, with the hope I might be able to target macOS 12 from 13, while pulling in dependencies from homebrew. It's probably a long shot though, as I expect libraries in homebrew to not support older macOS versions.

shipmints commented 3 weeks ago

Thanks again for plugging away at this. Have you taken a look at any o/s log files that might reveal what is otherwise a near-silent failure?

jimeh commented 2 weeks ago

@shipmints please try the x86_64-dmg artifact from this new build: https://github.com/jimeh/emacs-builds/actions/runs/11752219574

You should end up with Emacs.2024-11-08.766ec1f.master.macOS-11.x86_64.test.use-nix-15.dmg.

It verifies correctly on my wife's Intel mac now with macOS 11:

Screenshot 2024-11-09 at 03 03 24

It seems the "developer cannot be verified" error was kind of wrong. Essentially on Intel-based macs, macOS was failing to load a shared library, and that failure ended up disguised as what looked like an app signing error.

When embedding the shared libraries into Emacs, the build script specifically sets an LC_RPATH in Emacs' main executable that points to Contents/Frameworks within the Emacs.app bundle, which is where we copy all shared libraries to, along with relinking all shared libraries to be @rpath/<name-of-library>.

One difference with Nix compared to Homebrew is that some of the shared libraries that Emacs needs have their own LC_RPATH set. On Intel-based macs this seems to override the rpath set in the main Emacs executable, leaving them unable to find the shared library. Oddly enough, this is not an issue for macOS on Apple Silicon, which seems to happily search all rpaths it's aware of for a shared library.

Hence, the fix is to appropriately tidy up any LC_RPATH values set in any of the embedded shared libraries.
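As a rough illustration of that tidy-up step, here is a minimal Python sketch. It is not the actual build script: it assumes the usual `otool -l` text layout for LC_RPATH load commands and the standard `install_name_tool -delete_rpath` option, and the helper names and the sample /nix/store path are made up for the example. It only builds the commands; running them is left to the caller.

```python
import re

# Hypothetical excerpt of `otool -l <library>` output; the /nix/store path is invented.
SAMPLE_OTOOL_LOAD_COMMANDS = """\
Load command 12
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name @rpath/libglib-2.0.0.dylib (offset 24)
Load command 13
          cmd LC_RPATH
      cmdsize 80
         path /nix/store/example-glib/lib (offset 12)
"""


def parse_rpaths(otool_l_output: str) -> list[str]:
    """Return the LC_RPATH paths found in text printed by `otool -l`."""
    rpaths = []
    in_rpath_command = False
    for line in otool_l_output.splitlines():
        fields = line.split()
        if fields[:2] == ["cmd", "LC_RPATH"]:
            in_rpath_command = True
        elif fields[:1] == ["cmd"]:
            # Some other load command starts here.
            in_rpath_command = False
        elif in_rpath_command and fields[:1] == ["path"]:
            # The line looks like: "path /some/dir (offset 12)"
            match = re.match(r"\s*path (.+) \(offset \d+\)\s*$", line)
            if match:
                rpaths.append(match.group(1))
            in_rpath_command = False
    return rpaths


def delete_rpath_commands(library: str, otool_l_output: str) -> list[list[str]]:
    """Build one `install_name_tool -delete_rpath` command per rpath in a library."""
    return [
        ["install_name_tool", "-delete_rpath", rpath, library]
        for rpath in parse_rpaths(otool_l_output)
    ]


if __name__ == "__main__":
    for command in delete_rpath_commands("libgio-2.0.0.dylib", SAMPLE_OTOOL_LOAD_COMMANDS):
        print(" ".join(command))
```

Deleting every rpath from an embedded library is only one reading of "tidy up". Rewriting them to point at Contents/Frameworks would be another, and the thread does not say which one the build script uses.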

I'll get a use-nix-16 build started shortly that removes my earlier hacky SDK version fix attempt, as I don't think it's of any use, and it's frankly kind of horrible, so if I can rip it out safely, I will :)

shipmints commented 2 weeks ago

Good detective work. You'd think the toolchain linker would be the same. Perhaps M platform has a newer OS with a different xcode/clang linker that now behaves differently.

While the build now no longer complains about publisher verification, it segfaults almost every other launch and complains about some missing symbols in a package that worked well under the 30.0.91 build and also in use every day on your 29.4 build so something else is up.

jimeh commented 2 weeks ago

Interesting. Are you able to share what package it is, and a log of the errors? Hopefully I can reproduce it :)

Though it's unlikely, I'm actually hoping it's an issue with Emacs itself, as the build is from the latest master branch. To test that theory, I just started a build of 30.0.91 here: https://github.com/jimeh/emacs-builds/actions/runs/11755661889

shipmints commented 2 weeks ago

I just redownloaded the homebrew-based 30.0.91 build and I get the same missing symbol issue but no crashes yet. This occurs on first-use of consult-buffer (which uses marginalia) after bootstrap but disappears if I restart Emacs after having let the native compiler cache fill. If I empty the eln-cache and start again, it's repeatable. It seems likely related to ELPA compat's macros compat--maybe-require and compat-function but I haven't dug into it.

Debugger entered--Lisp error: (void-function marginalia--orig-completion-metadata-get)
  marginalia--orig-completion-metadata-get((metadata (category . multi-category) (group-function . #f(compiled-function (&rest args2) #<bytecode 0x1e1218aa61b27464>)) (affixation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60c1741d>)) (annotation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60ce741d>)) (cycle-sort-function . identity) (display-sort-function . identity)) category)
  marginalia-classify-original-category()
  run-hook-with-args-until-success(marginalia-classify-original-category)
  marginalia--completion-metadata-get((metadata (category . multi-category) (group-function . #f(compiled-function (&rest args2) #<bytecode 0x1e1218aa61b27464>)) (affixation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60c1741d>)) (annotation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60ce741d>)) (cycle-sort-function . identity) (display-sort-function . identity)) category)
  apply(marginalia--completion-metadata-get ((metadata (category . multi-category) (group-function . #f(compiled-function (&rest args2) #<bytecode 0x1e1218aa61b27464>)) (affixation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60c1741d>)) (annotation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60ce741d>)) (cycle-sort-function . identity) (display-sort-function . identity)) category))
  completion-metadata-get((metadata (category . multi-category) (group-function . #f(compiled-function (&rest args2) #<bytecode 0x1e1218aa61b27464>)) (affixation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60c1741d>)) (annotation-function . #f(compiled-function (&rest args2) #<bytecode 0x1e29ad9a60ce741d>)) (cycle-sort-function . identity) (display-sort-function . identity)) category)
  vertico-multiform--setup()
  minibuffer-setup()
shipmints commented 2 weeks ago

Now taking https://github.com/jimeh/emacs-builds/actions/runs/11755661889 out for a run. Same marginalia compat bootstrap issue but that doesn't seem build related.

shipmints commented 2 weeks ago

Both builds are crashing so it's hard to blame nix vs. homebrew. Been intermittently getting lisp stack traces when it doesn't segv. e.g., one from the homebrew build I kept:

Debugger entered--Lisp error: (invalid-read-syntax "#1#")
  #<subr require>(org-macro nil nil)
  ad-Advice-require(#<subr require> org-macro)
  apply(ad-Advice-require #<subr require> org-macro)
  require(org-macro)
  byte-code("\301\302!\210\301\303!\210\10\304W\203\22\0\301\305!\210\301\306!\210\301\307!\210\301\310!\210\301\311!\210\301\312!\210\301\313!\210\301\314!\210\301\315!\207" [emacs-major-version require outline time-date 28 easymenu org-entities org-faces org-list org-pcomplete org-src org-footnote org-macro ob] 2)
  #<subr require>(org nil nil)
  ad-Advice-require(#<subr require> org nil nil)
  apply(ad-Advice-require #<subr require> (org nil nil))
  require(org nil nil)

Here's a native stack trace (not running under a debugger this is just macos crash reporter):

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       KERN_INVALID_ADDRESS at 0xfffffffffffffff8
Exception Codes:       0x0000000000000001, 0xfffffffffffffff8
Exception Note:        EXC_CORPSE_NOTIFY

VM Region Info: 0xfffffffffffffff8 is not in any region.  Bytes after previous region: 18446603336221409273
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      VM_ALLOCATE              7ffffffcb000-7ffffffcc000 [    4K] r-x/r-x SM=ALI
--->
      UNUSED SPACE AT END

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib              0x7ff806254fce __pthread_kill + 10
1   libsystem_pthread.dylib             0x7ff80628b1ff pthread_kill + 263
2   libsystem_c.dylib                   0x7ff8061992c8 raise + 26
3   Emacs                                  0x10c7b0f09 terminate_due_to_signal + 169
4   Emacs                                  0x10c7b188b emacs_abort + 15
5   Emacs                                  0x10c76c932 ns_term_shutdown + 162
6   Emacs                                  0x10c612763 shut_down_emacs + 371
7   Emacs                                  0x10c7b0ed7 terminate_due_to_signal + 119
8   Emacs                                  0x10c63a5be handle_fatal_signal + 14
9   Emacs                                  0x10c63a641 deliver_thread_signal + 129
10  Emacs                                  0x10c638909 deliver_fatal_thread_signal + 9
11  Emacs                                  0x10c63a689 handle_sigsegv + 57
12  libsystem_platform.dylib            0x7ff8062a0dfd _sigtramp + 29
13  ???                                            0x0 ???
14  Emacs                                  0x10c6e0b82 read_internal_start + 322
15  Emacs                                  0x10c70a0ba load_comp_unit + 570
16  Emacs                                  0x10c70ab4a Fnative_elisp_load + 362
17  Emacs                                  0x10c6de186 Fload + 3014
18  Emacs                                  0x10c6e02fd save_match_data_load + 77
19  Emacs                                  0x10c6b1a27 load_with_autoload_queue + 311
20  Emacs                                  0x10c6c14b2 Frequire + 594
21  Emacs                                  0x10c7011be exec_byte_code + 3102
22  Emacs                                  0x10c6b34ff funcall_lambda + 895
23  Emacs                                  0x10c6ad50a Ffuncall + 458
24  Emacs                                  0x10c7011be exec_byte_code + 3102
25  Emacs                                  0x10c70056e Fbyte_code + 126
26  Emacs                                  0x10c6acb54 eval_sub + 2532
27  Emacs                                  0x10c6b1b3d Feval + 77
28  org-src-0d4c3baf-a6ba8277.eln          0x120917bd2 top_level_run + 898
29  Emacs                                  0x10c70a36d load_comp_unit + 1261

I'll go back now to the nix build and see if I can repro a lisp or native stack trace to see if it's in the same arena.

shipmints commented 2 weeks ago

Nix build same lisp stack trace but it took 5 restarts to get it. Not really sure what's non-deterministic about running Emacs from scratch each time. I'm merely starting it and closing it until it crashes (or not, is the hope). They all seem org related but I don't see this in 29.4 with the identical packages (my org is the ELPA org not the Emacs org). Still not clear this is a build issue.

Debugger entered--Lisp error: (invalid-read-syntax "#1#")
  #<subr require>(org-macro nil nil)
  ad-Advice-require(#<subr require> org-macro)
  apply(ad-Advice-require #<subr require> org-macro)
  require(org-macro)
  byte-code("\301\302!\210\301\303!\210\10\304W\203\22\0\301\305!\210\301\306!\210\301\307!\210\301\310!\210\301\311!\210\301\312!\210\301\313!\210\301\314!\210\301\315!\207" [emacs-major-version require outline time-date 28 easymenu org-entities org-faces org-list org-pcomplete org-src org-footnote org-macro ob] 2)
  #<subr require>(org nil nil)
  ad-Advice-require(#<subr require> org nil nil)
  apply(ad-Advice-require #<subr require> (org nil nil))
  require(org nil nil)
shipmints commented 2 weeks ago

Here's a new one with the nix build trying the terminal vs. the GUI. This works fine with homebrew's 29.4 (non-native), your 29.4 (homebrew, native), your Emacs.2024-09-11.9a1c76b.emacs-30-0-91-pretest.macOS-12.x86_64 (homebrew, native).

/Volumes/Emacs.2024-09-11.9a1c76b.emacs-30-0-91-pretest.macOS-11.x86_64.test.use-nix-17/Emacs.app/Contents/MacOS/Emacs -nw
emacs: Cannot open terminfo database file
jimeh commented 2 weeks ago

I've confiscated my wife's Intel-based mac for a bit and set up my own user on it so I can do some proper testing with the Emacs.2024-09-11.9a1c76b.emacs-30-0-91-pretest.macOS-11.x86_64.test.use-nix-17.dmg build from here.

I can reproduce the emacs: Cannot open terminfo database file issue when using Apple's Terminal.app, but not when using iTerm2. Also setting TERMINFO=/usr/share/terminfo env var in Terminal.app fixes the issue.

Annoyingly I can't test it with homebrew-based 30.0.91 build from October 1st, as it requires macOS 12 or later to run, and my wife's machine is still on macOS 11 at the moment, and I'd prefer to leave her machine upgrade can of worms for another time... lol

The other issues, both with elisp errors and crashes, are more curious however. For my testing, I got my own full emacs config up and running without issues. And it uses consult, marginalia, and lots of other things.

I also swapped out the built-in org-mode for one installed from the upstream source repo with straight.el, and in a blank emacs config I installed org 9.7.16 from ELPA. In both cases native compilation worked fine, and upon multiple restarts and re-opening of org files in both setups, I never ran into any issues. I also trashed the compiled *.eln files for org a few times and retried things again, still no issue. So I can't really reproduce your other issues.

Looking closer at your crash log, it seems the crash happened because native code in org-src-0d4c3baf-a6ba8277.eln was trying to read an invalid memory address (0xfffffffffffffff8). The elisp errors seem to be either missing functions, or invalid byte-compiled elisp in *.elc files. The more I think about it, combined with the fact it's happening intermittently for you, the more I suspect that either your RAM is failing and randomly corrupting memory, or your SSD/HDD is failing and randomly returning corrupt data for files on disk :(

shipmints commented 2 weeks ago

If my machine was failing, though, I'd see these issues in other apps and your Emacs 29.4 build but I don't. Though the possibility of failing hardware does make me a bit anxious, the lack of issues in other heavily-used CPU and RAM-intensive apps suggest it's fine.

Why TERMINFO would need to be explicitly set when it isn't needed for other builds is a good question. I'd prefer not to just put in a workaround vs. understanding the source of the issue. For people who adopt your builds, this may be a source of Github issue irritation.

jimeh commented 2 weeks ago

Agreed regarding TERMINFO, I have fired off a new build (use-nix-19) that's a blind shot in the dark of adding gettext to the build environment to see if it helps. If it doesn't, I'll need to dig deeper to try and understand what's missing from the Nix build, and also try and get an Intel-based macOS 12 VM up and running.

And good point about my hardware issue theory. If you haven't had weird behavior anywhere else, it makes it extremely unlikely. It was mostly the inconsistent nature of the issues happening on and off that made me suspect RAM issues. But it's tricky for me to try and figure out what might be the root cause of the issues, since I haven't been able to reproduce them yet. Have you tried reproducing the org and marginalia issues in isolation in blank/fresh Emacs configs?

The only other idea I have is to check whether your ELN files are identical on repeated rebuilds. So for example, on the next crash caused by an ELN file like the one above, move the ELN file out of the way, restart emacs to have it recreated, and check if the checksums match.
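A minimal Python sketch of that checksum comparison follows, using only the standard library. The function names are made up for the example, and the paths in the usage note are placeholders. It is also an assumption that two native-compilation runs of the same source produce byte-identical .eln files, so a mismatch is a hint worth following up on, not proof of corruption.

```python
import hashlib


def sha256_of(path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(first_path, second_path) -> bool:
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(first_path) == sha256_of(second_path)
```

Usage would be along the lines of same_contents("/path/to/saved-copy.eln", "/path/to/regenerated.eln"), where the first file is the copy moved out of the eln-cache after a crash and the second is the one Emacs recreated on restart.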

jimeh commented 2 weeks ago

The TERMINFO issue does affect ARM builds too, I just hadn't noticed cause I have a custom ~/.terminfo directory, but it does fail in the vanilla ARM macOS 12 VM I have. So that's a lot easier to try and figure out now :)

shipmints commented 2 weeks ago

Nice. But aren't you watching Arsenal v Chelsea, right now? :)

jimeh commented 2 weeks ago

I figured out the terminfo issue. The homebrew build did not bundle in libncurses; instead it linked to the one from macOS itself, which causes it to check macOS's own terminfo paths. So I have a new build that removes ncurses from the Nix package list, forcing the build to use the system version of ncurses. On the ARM side, builds created on macOS 15 work on macOS 12, so fingers crossed for Intel-based builds too.
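One way to see which libraries an Emacs binary takes from macOS and which are bundled is to classify the install names printed by `otool -L`. The Python sketch below does that for already-captured output. The sample lines are copied from the 29.4 `otool -L` listing posted in this thread; the function names and the three categories are choices made for the example, not part of any tool.

```python
# A few lines from the 29.4 `otool -L` listing quoted in this thread.
SAMPLE_OTOOL_LIBRARIES = """\
/Applications/Emacs.app/Contents/MacOS/Emacs:
    /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit (compatibility version 45.0.0, current version 2299.30.112)
    @rpath/libtiff.6.dylib (compatibility version 7.0.0, current version 7.2.0)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
"""


def classify_install_name(install_name: str) -> str:
    """Label a dylib install name as bundled, system, or other."""
    if install_name.startswith("@rpath/"):
        return "bundled"
    if install_name.startswith(("/usr/lib/", "/System/Library/")):
        return "system"
    return "other"


def classify_dependencies(otool_L_output: str) -> dict[str, str]:
    """Map each install name in `otool -L` output to its category."""
    dependencies = {}
    # The first line is the inspected binary's own path followed by a colon.
    for line in otool_L_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        install_name = line.split(" (compatibility version", 1)[0]
        dependencies[install_name] = classify_install_name(install_name)
    return dependencies


if __name__ == "__main__":
    for name, kind in classify_dependencies(SAMPLE_OTOOL_LIBRARIES).items():
        print(f"{kind:8} {name}")
```

In the sample, libncurses comes out as "system", which matches the homebrew-based build described above. A build that bundles its own ncurses would presumably list it under @rpath and come out as "bundled".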

New build is progressing here: https://github.com/jimeh/emacs-builds/actions/runs/11767764472

P.S. I can't say I've watched any sporting event since, uhm, I actually don't remember >_<... And tonight Hocus Pocus 2 is playing on our TV, for reasons beyond my control... lol

shipmints commented 2 weeks ago

Small sacrifice for family unity. But no footie, I can't understand :)

I assume you took a look to see if there were other nix deps that might be better left to ambient ones?

I'll take a look at the build tomorrow afternoon and put it through some paces. Perhaps macos 15 builds help the intermittent issues to magically disappear in what is kind of a deterministic test (my init file has no stochastic behavior).

jimeh commented 2 weeks ago

Don't stress/rush to test builds here, I'm more than happy and thankful for any time and help you're able and willing to offer :)

The homebrew builds pull in sqlite3 and libz from macOS as well, while the nix builds bundle in their own copies from their respective nix packages. Compared to ncurses, I can't think of any obvious reasons why those two might cause issues due to not integrating with macOS like ncurses does though, so I've left them bundled into the app for now.

The build I linked above (use-nix-21 / dmg) is working fine for me on x86 macOS 11.3. Native comp, org-mode, terminfo / TUI mode in Terminal.app, etc. are working without issue. The build also changes the minimum macOS version from 11 to 10.12, as I switched to Nix's default macOS SDK version (10.12 on x86, 11 on ARM). I don't have any macOS 10.x machines to test it on, but it does seem fine on macOS 11.

use-nix-21 was built on a macOS 13 GitHub Actions runner though, which is my plan going forward with the builds: essentially using the oldest supported macOS version on GitHub Actions to try and improve compatibility with older macOS versions. The ARM builds I mentioned before were built and tested locally on my laptop, between the host OS and the macOS 12 VM I have.

jimeh commented 2 weeks ago

One newer build (use-nix-22 / dmg), which was made after some tidying up of things in the build script. Should in theory be the same as use-nix-21.

shipmints commented 2 weeks ago

Thanks for the new build. I ran full diags on the machine that was involved in spurious Emacs crashes and the machine passed just fine except for a couple of corrupted time machine apfs snapshots which I deleted (unrelated to Emacs).

The build from yesterday indeed solves the terminfo issue. I deleted the eln cache and started from scratch and still getting the likely compat issue in marginalia during bootstrap which disappears after an Emacs restart. This may be solved with the 30.0.92 but we'll have to see. I haven't seen spurious crashes (yet).

Debugger entered--Lisp error: (void-function marginalia--orig-completion-metadata-get)
  marginalia--orig-completion-metadata-get((metadata (category . bookmark)) category)

Now that the nix build seems workable, is it possible to make a 30.0.92 build for the second pretest and maybe one from master?

jimeh commented 2 weeks ago

Woo, that's great news.

I've started two new builds for:

They should be done in about an hour if nothing goes wrong 😁

Regarding marginalia, I did realize your version and my version of marginalia might not be the same, so you possibly have a version which has a bug related to native comp of some kind.

shipmints commented 2 weeks ago

Tried them both. Thanks for that. The likely issue with the "compat" and marginalia still there. Not build related, near 100%. My ELPA compat is compat-30.0.0.0 and my ELPA marginalia is marginalia-20240926.918 if you want to try this.

The one thing I can say is that the performance of these vs. your 29.4 build is poorer. Seems screen redraw related but I didn't do any deep technical analysis on this. I loaded a bunch of tab-bar tabs full of stuff and used the key binding to next and prev tab to switch among them back and forth. 29.4 is nice and snappy. These two builds are sluggish. This is even after Emacs restarts with prepopulated eln cache so it should all be high speed. Is it possible that the native eln cache in the build is not built with the optimizer enabled but was on 29.4?

shipmints commented 2 weeks ago

Didn't dig much beyond this but the differences are weird.

29_4-65ab8681(0)$ ls -l subr*
-rw-r--r--  1 shipmints  admin  89792 Jul  2 07:28 subr-x-02dfef32-efcc4a00.eln
# trampoline not here (gets generated in the user's eln cache)
30_0_92-28488a40(0)$ ls -l subr*
-rw-r--r--  1 shipmints  staff  28496 Nov 12 14:09 subr--trampoline-6d6163726f657870616e64_macroexpand_0.eln
# compiled binary size seems materially different
-rw-r--r--  1 shipmints  staff  60832 Nov 12 14:09 subr-x-02dfef32-3fba5ed2.eln
jimeh commented 2 weeks ago

Hmm, performance differences are interesting.

I can think of two differences that might be related:

I've fired off a couple of Nix-based Emacs 29.4 builds if you're up for some performance comparisons, to see if it's the macOS target version that might be the culprit:

On the ARM side of things I haven't noticed any change in performance, but that's kind of expected with how fast Apple Silicon is. I'll try and do some performance testing with my wife's Intel Mac in the next day or two :)

jimeh commented 2 weeks ago

The native eln files cached/bundled into Emacs.app should have been produced with the same settings for all of these builds, or at least, whatever is the default for the commit of emacs that was built. The build script only enables native comp and tells it to native-compile all .el files that are part of emacs. It doesn't set any other kind of optimization options or anything.

If you want to try and eliminate the bundled .eln files, you could simply remove them from Contents/Frameworks/native-lisp within the app, as they should all be compiled again into the user eln-cache instead.

shipmints commented 2 weeks ago

Differences in otool -L run against the Emacs binary are interesting. The 29.4 build has several Mac frameworks at the end of the list that the nix builds seem to be missing. Could be innocuous. They include CoreGraphics that might be a hint but I'm no deep Mac app expert.

29.4
$ otool -L /Applications/Emacs.app/Contents/MacOS/Emacs 
/Applications/Emacs.app/Contents/MacOS/Emacs:
    /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit (compatibility version 45.0.0, current version 2299.30.112)
    /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit (compatibility version 1.0.0, current version 275.0.0)
    /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon (compatibility version 2.0.0, current version 169.0.0)
    /System/Library/Frameworks/IOSurface.framework/Versions/A/IOSurface (compatibility version 1.0.0, current version 1.0.0)
    /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore (compatibility version 1.2.0, current version 1.11.0)
    /System/Library/Frameworks/UniformTypeIdentifiers.framework/Versions/A/UniformTypeIdentifiers (compatibility version 1.0.0, current version 709.0.0)
    @rpath/libtiff.6.dylib (compatibility version 7.0.0, current version 7.2.0)
    @rpath/libjpeg.8.3.2.dylib (compatibility version 8.0.0, current version 8.3.2)
    @rpath/libpng16.16.dylib (compatibility version 60.0.0, current version 60.0.0)
    @rpath/libgif.7.2.0.dylib (compatibility version 0.0.0, current version 7.2.0)
    @rpath/libwebpdemux.2.0.15.dylib (compatibility version 3.0.0, current version 3.15.0)
    @rpath/libwebpdecoder.3.1.9.dylib (compatibility version 5.0.0, current version 5.9.0)
    @rpath/librsvg-2.2.dylib (compatibility version 53.0.0, current version 53.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)
    @rpath/libgio-2.0.0.dylib (compatibility version 8001.0.0, current version 8001.3.0)
    @rpath/libgdk_pixbuf-2.0.0.dylib (compatibility version 4201.0.0, current version 4201.12.0)
    @rpath/libgobject-2.0.0.dylib (compatibility version 8001.0.0, current version 8001.3.0)
    @rpath/libglib-2.0.0.dylib (compatibility version 8001.0.0, current version 8001.3.0)
    @rpath/libintl.8.dylib (compatibility version 13.0.0, current version 13.0.0)
    @rpath/libcairo.2.dylib (compatibility version 2.0.0, current version 2.0.0)
    /System/Library/Frameworks/WebKit.framework/Versions/A/WebKit (compatibility version 1.0.0, current version 614.3.7)
    @rpath/libdbus-1.3.dylib (compatibility version 36.0.0, current version 36.4.0)
    @rpath/libxml2.2.dylib (compatibility version 15.0.0, current version 15.8.0)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    @rpath/libgnutls.30.dylib (compatibility version 69.0.0, current version 69.0.0)
    @rpath/liblcms2.2.dylib (compatibility version 3.0.0, current version 3.16.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    @rpath/libjansson.4.dylib (compatibility version 19.0.0, current version 19.0.0)
    @rpath/libgmp.10.dylib (compatibility version 16.0.0, current version 16.0.0)
    @rpath/libgccjit.0.dylib (compatibility version 0.0.0, current version 26.0.26)
    @rpath/libtree-sitter.0.22.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libsqlite3.dylib (compatibility version 9.0.0, current version 346.0.0)
    /System/Library/Frameworks/CFNetwork.framework/Versions/A/CFNetwork (compatibility version 1.0.0, current version 1402.0.8)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1953.255.0)
    /System/Library/Frameworks/CoreGraphics.framework/Versions/A/CoreGraphics (compatibility version 64.0.0, current version 1690.3.3)
    /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 1228.0.0)
    /System/Library/Frameworks/CoreText.framework/Versions/A/CoreText (compatibility version 1.0.0, current version 1.0.0)
    /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1953.255.0)
    /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
30.0.92
$ otool -L /Volumes/Emacs.2024-10-26.ed1d691.emacs-30-0-92-pretest.macOS-10-12.x86_64.test.use-nix-24/Emacs.app/Contents/MacOS/Emacs 
/Volumes/Emacs.2024-10-26.ed1d691.emacs-30-0-92-pretest.macOS-10-12.x86_64.test.use-nix-24/Emacs.app/Contents/MacOS/Emacs:
    /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit (compatibility version 45.0.0, current version 1504.75.0)
    /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit (compatibility version 1.0.0, current version 275.0.0)
    /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon (compatibility version 2.0.0, current version 157.0.0)
    /System/Library/Frameworks/IOSurface.framework/Versions/A/IOSurface (compatibility version 1.0.0, current version 1.0.0)
    /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore (compatibility version 1.2.0, current version 1.11.0)
    @rpath/libtiff.6.dylib (compatibility version 8.0.0, current version 8.0.0)
    @rpath/libjpeg.62.4.0.dylib (compatibility version 62.0.0, current version 62.4.0)
    @rpath/libpng16.16.dylib (compatibility version 60.0.0, current version 60.0.0)
    @rpath/libgif.7.2.0.dylib (compatibility version 0.0.0, current version 7.2.0)
    @rpath/libwebpdemux.2.0.15.dylib (compatibility version 3.0.0, current version 3.15.0)
    @rpath/libwebpdecoder.3.1.9.dylib (compatibility version 5.0.0, current version 5.9.0)
    @rpath/librsvg-2.2.dylib (compatibility version 53.0.0, current version 53.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.0.0)
    @rpath/libgio-2.0.0.dylib (compatibility version 8201.0.0, current version 8201.1.0)
    @rpath/libgdk_pixbuf-2.0.0.dylib (compatibility version 4201.0.0, current version 4201.12.0)
    @rpath/libgobject-2.0.0.dylib (compatibility version 8201.0.0, current version 8201.1.0)
    @rpath/libglib-2.0.0.dylib (compatibility version 8201.0.0, current version 8201.1.0)
    @rpath/libcairo.2.dylib (compatibility version 2.0.0, current version 2.0.0)
    /System/Library/Frameworks/WebKit.framework/Versions/A/WebKit (compatibility version 1.0.0, current version 602.3.12)
    @rpath/libdbus-1.3.dylib (compatibility version 36.0.0, current version 36.4.0)
    @rpath/libxml2.2.dylib (compatibility version 16.0.0, current version 16.4.0)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    @rpath/libgnutls.30.dylib (compatibility version 71.0.0, current version 71.0.0)
    @rpath/liblcms2.2.dylib (compatibility version 3.0.0, current version 3.16.0)
    @rpath/libz.1.3.1.dylib (compatibility version 1.0.0, current version 1.3.1)
    @rpath/libgmp.10.dylib (compatibility version 16.0.0, current version 16.0.0)
    @rpath/libgccjit.0.dylib (compatibility version 0.0.0, current version 24.0.24)
    @rpath/libtree-sitter.0.24.dylib (compatibility version 0.0.0, current version 0.0.0)
    @rpath/libsqlite3.0.dylib (compatibility version 9.0.0, current version 9.6.0)
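To avoid comparing the two listings by eye, the difference can be computed from saved copies of the output. This is only a sketch: the helper names `otool_libs` and `only_in_first` and the file names in the usage comment are made up for illustration. It keeps just the library path from each indented line and prints the paths present in the first listing but absent from the second:

```shell
# Hypothetical helpers for comparing two saved `otool -L` listings.

# Read `otool -L` output on stdin and print one library path per line,
# sorted. Dependency lines are indented; the first line (the binary
# itself) is not, so it is skipped.
otool_libs() {
  awk '/^[ \t]/ { print $1 }' | LC_ALL=C sort -u
}

# Print library paths that appear in the first listing but not the second.
# Both arguments are files containing saved `otool -L` output.
only_in_first() {
  first=$(mktemp) || return 1
  second=$(mktemp) || return 1
  otool_libs < "$1" > "$first"
  otool_libs < "$2" > "$second"
  LC_ALL=C comm -23 "$first" "$second"
  rm -f "$first" "$second"
}

# Usage (file names are placeholders):
#   otool -L /Applications/Emacs.app/Contents/MacOS/Emacs > emacs-29.4.txt
#   otool -L path/to/other/Emacs.app/Contents/MacOS/Emacs > emacs-30.0.92.txt
#   only_in_first emacs-29.4.txt emacs-30.0.92.txt
```

Because the comparison is on full paths, a library whose file name changed between builds (libjpeg.8.3.2 versus libjpeg.62.4.0, for example) is reported as a difference too, alongside frameworks that are genuinely absent.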
shipmints commented 2 weeks ago

The macOS 12 29.4 build from last night seems to work almost identically to your 29.4 homebrew build that I use daily. You can see in the screenshots that the macOS 10.12 29.4 build from last night has issues coloring the inner border (the second screenshot; in the upper screenshot the left-most pixel is yellow, as it should be for me), suggesting that either a patch is missing in the macOS 10.12 build or something else is going on. It also performs worse than the macOS 12 29.4 build, though certainly better than the 30.0.92 and master builds I tried yesterday.

(FWIW, neither 29.4 build suffers from the marginalia/compat native bootstrap issue.)

Emacs.2024-06-22.6a299b3.emacs-29-4.macOS-12.x86_64.test.use-nix-26

image

Emacs.2024-06-22.6a299b3.emacs-29-4.macOS-10-12.x86_64.test.use-nix-25

image
jimeh commented 2 weeks ago

I haven't had a chance to test your exact versions of compat and marginalia yet, will try to later tonight.

Interesting result from macOS 10.12 vs 12 targets, but nice to see that the 12 target seems the same to you as the old homebrew-based build, which also targets macOS 12.

I have just noticed that there's an old patch for macOS 10.14 and earlier to disable aligned malloc, which is not being applied to the 10.12 builds right now. The logic is checking the build OS version (which is macOS 13), rather than the target SDK version of 10.12. I doubt your rendering issue is related to malloc, but we might as well try it for good measure.
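To illustrate the difference, here is a minimal shell sketch of the corrected check. It is not the actual build script: `version_le` and `needs_aligned_malloc_patch` are hypothetical names, and how the real script obtains the target version is an assumption left out here. The point is only that the comparison takes the target version (10.12) as input, not the version of the machine doing the build (macOS 13):

```shell
# Hypothetical sketch, not the real build script.

# Succeed (exit 0) if version $1 is less than or equal to version $2.
# Only major.minor are compared; a missing minor part counts as 0.
version_le() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 < y[1] + 0)
    exit !(x[2] + 0 <= y[2] + 0)
  }'
}

# Succeed if the aligned malloc patch should be applied for the given
# *target* macOS version (10.14 and earlier).
needs_aligned_malloc_patch() {
  version_le "$1" "10.14"
}

# Usage:
#   target="10.12"                      # what the build is targeting
#   host="$(sw_vers -productVersion)"   # build machine version; not what to test
#   if needs_aligned_malloc_patch "$target"; then
#     echo "apply the aligned malloc patch"
#   fi
```

With a 10.12 target the check succeeds, while passing the build machine's version (13.x) would skip the patch, which matches the behaviour described above.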

I'll get the aligned malloc thing fixed tonight and fire off another macOS 10.12 v29.4 build.

I'll also fire off a macOS 11 v29.4 build for the sake of my own testing on my wife's machine, and also I'd be curious to see how it behaves for you compared to the macOS 12 build and the stable homebrew build.

Also, thanks again for all the help and testing here, it's been very valuable to get this all nailed down and stable :)

jimeh commented 2 weeks ago

Two new builds:

shipmints commented 2 weeks ago

The macOS 11 build works much better than the 10.12 build. It performs similarly to your production 29.4 build and correctly shows the colored inner border, whereas the 10.12 build is slower and does not show the border correctly.

jimeh commented 2 weeks ago

That's great to hear. It sounds like targeting macOS 11 is the way forward then, if it's as good as the macOS 12 target. We might as well support macOS versions as old as we can, as long as doing so has no negative impact :)

Regarding your marginalia and compat issues, I've been unable to reproduce them myself on either macOS 11 (x86_64) or macOS 12 (arm64), whether in my normal config or in a minimal config where I loaded the packages in the init.el file.

Do you maybe have a minimal config I could try with?

Assuming your package issues aren't a show stopper, I think the Nix approach is mostly there with a macOS 11 SDK target. What do you think? :)