Closed malte-v closed 1 year ago
i'm able to reproduce this too on aarch64-linux and emacs v29 (running NixOS inside a VM on apple hardware)
this is an issue with nixpkgs, not just emacs-overlay
@malte-v 28.2 compiles fine for me.
@jasonjckn yup, 28.2 compiles, just like v29 without native comp. it's only v29 with native comp that fails. btw, the non-NativeComp
attributes also have native compilation on by default; i manually disabled it through an override.
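For anyone wanting to reproduce the "v29 without native comp" case, a minimal overlay sketch of the kind of override mentioned above could look like this. It assumes the nixpkgs Emacs expression accepts a `nativeComp` argument (it does in the nixpkgs revisions used later in this thread); `emacsGitNoNativeComp` is a hypothetical attribute name.

```nix
# Overlay sketch (untested): disable native compilation for one attribute.
# `emacsGitNoNativeComp` is a made-up name used only for illustration.
final: prev: {
  emacsGitNoNativeComp = prev.emacsGit.override {
    # Assumption: the Emacs expression exposes `nativeComp` as an argument.
    nativeComp = false;
  };
}
```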
@malte-v it might be worth posting on nixpkgs, because whatever the issue is, it's upstream from emacs-overlay.
have you tried building from source? the issue may even be in emacs repo.
I found out something interesting and don't know why it works.
If you clone the emacs-overlay and run the recommended testing command of:
nix-build --expr 'with (import <nixpkgs> { overlays = [ (import ./.) ]; }); emacsGit'
It will compile fine on aarch64 on m1 for me at commit d54a1521619daa37c9aa8c9e3362abb34e676007.
However, if I use emacs-overlay in my system configuration with nix-update, I get a failure that looks like:
ELC+ELN ../lisp/emacs-lisp/macroexp.elc
Error: wrong-type-argument ("../lisp/emacs-lisp/lisp-mode.el" listp ((call set-match-data #s(comp-mvar nil nil ((0 . 0)) nil 6189160 1) #s(comp-mvar nil (t) nil nil 6189530 2))))
Loading macroexp.elc...
Eager macro-expansion failure: (void-variable unshared)
make[3]: *** [Makefile:282: ../lisp/emacs-lisp/lisp-mode.elc] Error 255
make[2]: *** [Makefile:841: ../lisp/emacs-lisp/lisp-mode.elc] Error 2
make[2]: *** Waiting for unfinished jobs....
I don't have an aarch64 machine, but I think this could be related:
The emacsGit attribute and other natively-compiled master-based attributes in the Emacs overlay have been failing to update for more than a week because commit 97b928ce09d6034ebcb541fb548e5d4862302add in Emacs messed up the old
I put a workaround in the overlay around late September or early October, so I would not expect that to be the issue here. I made that PR simply because it makes more sense upstream than it does in the overlay.
I have an m1 machine now with a linux vm of NixOS (until they need it back lol) and decided to try and see what happens if I try building the nixpkgs emacs derivation that emacs-overlay uses but override it with the source to the most recent emacs HEAD. Here is the derivation:
# nixpkgs master 2022-12-02
# with (import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/260de5901e5bb56563528794bda6ff10ddddb80a.tar.gz") {});
# nixpkgs-unstable
with (import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/9063accddd2e68dcc71032459a58399e977374c9.tar.gz") {});
let
emacs = pkgs.emacs.override {
withGTK3 = true;
nativeComp = true;
};
in emacs.overrideAttrs (o: {
src = fetchFromSavannah {
repo = "emacs";
rev = "64044f545add60e045ff16a9891b06f429ac935f";
sha256 = "sha256-95vN7pJ3NaOvXf7x8onv5PZbQCqEdG7buQhpZvIS+Eo=";
};
version = "30.0.50";
})
On both x86_64-linux and aarch64-linux it gives me a patch hunk error:
applying patch /nix/store/x52dw481ybh7r7a5jg6y8zaawf32r89s-native-comp-driver-options-28.patch
patching file lisp/emacs-lisp/comp.el
Hunk #1 FAILED at 178.
1 out of 1 hunk FAILED -- saving rejects to file lisp/emacs-lisp/comp.el.rej
error: builder for '/nix/store/4z93hp769z1jzcbd0knblrj0wmj9dyg5-emacs-30.0.50.drv' failed with exit code 1;
last 8 log lines:
> unpacking sources
> unpacking source archive /nix/store/lpkpci57i3nns9k27v8w3rvk6fx6j75r-source
> source root is source
> patching sources
> applying patch /nix/store/x52dw481ybh7r7a5jg6y8zaawf32r89s-native-comp-driver-options-28.patch
> patching file lisp/emacs-lisp/comp.el
> Hunk #1 FAILED at 178.
> 1 out of 1 hunk FAILED -- saving rejects to file lisp/emacs-lisp/comp.el.rej
For full logs, run 'nix log /nix/store/4z93hp769z1jzcbd0knblrj0wmj9dyg5-emacs-30.0.50.drv'.
This might be a bad test though... there are some patches applied in emacs-overlay that might get past this.
I do still feel like one of the patches in nixpkgs probably breaks things for m1 but perhaps not other systems though.
~@ParetoOptimalDev I think this is because nixpkgs master and 22.11 currently attempt to apply the exact same patch as what we have in the overlay (awkwardly enough, nixos-unstable has not been updated to contain that patch). With nixpkgs master, do you still get the error when you also apply the following to the overlay?~
diff --git a/overlays/emacs.nix b/overlays/emacs.nix
index 634f4c0f..f01678dd 100644
--- a/overlays/emacs.nix
+++ b/overlays/emacs.nix
@@ -31,27 +31,7 @@ let
substituteInPlace lisp/loadup.el \
--replace '(emacs-repository-get-version)' '"${repoMeta.rev}"' \
--replace '(emacs-repository-get-branch)' '"master"'
- '' +
- # XXX: remove when https://github.com/NixOS/nixpkgs/pull/193621 is merged
- (super.lib.optionalString (old ? NATIVE_FULL_AOT)
- (let backendPath = (super.lib.concatStringsSep " "
- (builtins.map (x: ''\"-B${x}\"'') [
- # Paths necessary so the JIT compiler finds its libraries:
- "${super.lib.getLib self.libgccjit}/lib"
- "${super.lib.getLib self.libgccjit}/lib/gcc"
- "${super.lib.getLib self.stdenv.cc.libc}/lib"
-
- # Executable paths necessary for compilation (ld, as):
- "${super.lib.getBin self.stdenv.cc.cc}/bin"
- "${super.lib.getBin self.stdenv.cc.bintools}/bin"
- "${super.lib.getBin self.stdenv.cc.bintools.bintools}/bin"
- ]));
- in ''
- substituteInPlace lisp/emacs-lisp/comp.el --replace \
- "(defcustom comp-libgccjit-reproducer nil" \
- "(setq native-comp-driver-options '(${backendPath}))
-(defcustom comp-libgccjit-reproducer nil"
- ''));
+ '';
}
)
)
--
2.38.1
@ParetoOptimalDev Oh yeah, I also noticed in your test that Nix tries to apply the Emacs 28 patch instead of the other one; my guess is that overriding the attribute version did not influence the check at https://github.com/nixos/nixpkgs/blob/0114278a9a3a3bb1b026c4edbe503034d7375e07/pkgs/applications/editors/emacs/generic.nix#L72 because version also appears as a normal input variable in the Emacs expression.
Using overlay with its emacsGitNativeComp attribute avoids this problem entirely on nixpkgs master, whether or not the overlay has the patch from my previous post.
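The guess above can be illustrated with a small, self-contained model. This is only a sketch, not the real generic.nix: `mkEmacs`, the base version "28.2", the "29" threshold, and the second patch name are all assumptions made for illustration. It shows why replacing the `version` attribute on the result does not change a value that was already computed from the `version` function argument.

```nix
# Simplified model; evaluate with: nix-instantiate --eval --strict model.nix
let
  # Hypothetical stand-in for the Emacs expression: `version` is a function
  # argument, and the patch choice is computed from that argument.
  mkEmacs = { version }: {
    inherit version;
    patch =
      if builtins.compareVersions version "29" < 0
      then "native-comp-driver-options-28.patch"
      else "patch-for-29-and-later.patch"; # hypothetical name
  };

  base = mkEmacs { version = "28.2"; };

  # Roughly what an attribute-level override does: it replaces the
  # `version` attribute on the result, but never re-runs the `if` above.
  overridden = base // { version = "30.0.50"; };
in {
  inherit (overridden) version patch;
}
```

In this model `version` ends up as "30.0.50" while `patch` is still the Emacs 28 one, which matches the hunk failure in the log. Passing the version as a function argument (as the overlay's own attributes effectively do) is what would change the patch choice.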
@leungbk Thanks for the help.
I had abandoned trying to use the above and just saw your comment with that overlay.
I've been trying to debug this with:
After reading about the internals of building Emacs, I've determined that the problem happens in the temacs --loadup step of the build.
My latest errors, with a configuration that uses V=1 in the makefile to show full commands, are:
make[4]: Entering directory '/build/source/admin/grammars'
[ ! -f "../../lisp/cedet/semantic/bovine/c-by.el" ] || chmod +w "../../lisp/cedet/semantic/bovine/c-by.el"
"../../src/bootstrap-emacs" -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -l semantic/bovine/grammar -f bovine-batch-make-parser -o "../../lisp/cedet/semantic/bovine/c-by.el" c.by
make[3]: *** [Makefile:282: ../lisp/dnd.elc] Segmentation fault (core dumped)
make[3]: Leaving directory '/build/source/lisp'
make[2]: *** [Makefile:841: ../lisp/dnd.elc] Error 2
make[2]: *** Waiting for unfinished jobs....
[ ! -f "../../lisp/cedet/semantic/bovine/make-by.el" ] || chmod +w "../../lisp/cedet/semantic/bovine/make-by.el"
"../../src/bootstrap-emacs" -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)' -l semantic/bovine/grammar -f bovine-batch-make-parser -o "../../lisp/cedet/semantic/bovine/make-by.el" make.by
Error: wrong-type-argument ("../lisp/custom.el" listp (#s(comp-mvar nil (nil) nil nil 6062644 7)))
Backtrace:
I've also attached the full nix log:
How can I go from this failing derivation to being able to run gdb temacs on the problematic temacs binary? I think that's what I need to do to debug this further.
> Using overlay with its emacsGitNativeComp attribute avoids this problem entirely on nixpkgs master,

I just tried using nixpkgs master and the emacsGitNativeComp derivation and still got a similar segfault.
Kind of taking shots in the dark here, but wonder if https://github.com/NixOS/nixpkgs/pull/185660 could be related. Going to test that next.
> How can I go from this failing derivation to being able to run gdb temacs on the problematic temacs binary? I think that's what I need to do to debug this further.
Try using the --keep-failed flag. I was able to recover a temacs from some other failing build when I ran something like nix-build --expr 'with (import ../nixpkgs/. { overlays = [ (import ./.) ]; }); emacsGitNativeComp' --keep-failed. I never tried running it though, let alone gdb.
> > Using overlay with its emacsGitNativeComp attribute avoids this problem entirely on nixpkgs master,
>
> I just tried using nixpkgs master and the emacsGitNativeComp derivation and still got a similar segfault.
The problem I was referring to was the patch incompatibility you had previously mentioned, not the segfault.
Thanks for your recommendation, I'll give --keep-failed a try.
Currently I'm in the middle of a method I came up with after some research/manual reading, using nix develop for this purpose:
cd /path/to/my-system-flake
nix develop .#pkgs.x86_64-linux.emacsGit
cd /tmp/
genericBuild
Then my guess is that /tmp/source will have the temacs in it, but that might also require --keep-failed.
@ParetoOptimalDev there's 'one-click' install NixOS on M1 + VM available here https://github.com/mitchellh/nixos-config fyi
Thanks for spending time on this :) aarch64-linux doesn't get enough love
Thanks @jasonjckn.
I got emacs 30 building with the flags:
./configure --enable-checking='yes,glyphs' --enable-check-lisp-object-type CFLAGS='-O0 -g3' --with-modules --with-x-toolkit=lucid --with-xft --with-cairo --with-native-compilation
Some differences from the default flags:
- --disable-build-details
- --with-x-toolkit=lucid
- CFLAGS='-O0 -g3' (don't think -g3 should matter, but I could imagine -O0 hiding potential aarch64-specific optimization bugs)
- --enable-check-lisp-object-type
I've been meaning to modify the emacs-overlay emacsGit to use these flags but haven't been able to get around to it, so at the least I wanted to share my findings above.
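As a starting point for that modification, here is a rough, untested overlay sketch that carries the checking flags and CFLAGS from the manual ./configure run over to emacsGit. `emacsGitDebug` is a hypothetical attribute name, and the sketch assumes the derivation picks up NIX_CFLAGS_COMPILE as a top-level attribute. It does not switch the toolkit to lucid or drop --disable-build-details, so it is not an exact reproduction of the manual build.

```nix
# Overlay sketch (untested): a debug-oriented variant of emacsGit.
# `emacsGitDebug` is a made-up name used only for illustration.
final: prev: {
  emacsGitDebug = prev.emacsGit.overrideAttrs (old: {
    # Append the checking-related flags from the manual ./configure run.
    configureFlags = (old.configureFlags or [ ]) ++ [
      "--enable-checking=yes,glyphs"
      "--enable-check-lisp-object-type"
    ];
    # Assumption: NIX_CFLAGS_COMPILE is read as a top-level attribute here.
    NIX_CFLAGS_COMPILE =
      toString (old.NIX_CFLAGS_COMPILE or "") + " -O0 -g3";
  });
}
```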
To anyone wanting to test something on aarch64-linux my priority list is:
1. emacsGit overridden with exact build flags above builds
2. emacsGit builds with only difference that it uses lucid instead of gtk3
3. emacsGit builds with only difference being CFLAGS='-O0'
4. emacsGit builds with only difference being we add --enable-check-lisp-object-type, or if it gives better errors

Update:
So this segfault seems to happen only when you want X support. This doesn't compile for instance:
emacsGit.override {
withNS = false;
withX = true; # x with lucid basically
withGTK2 = false;
withGTK3 = false;
withWebP = false;
}
The above conflicts with my finding that I could compile with configure flags including lucid... maybe if I remove the --disable-build-details
it'll make a difference? No idea.
I tested arch linux' aarch64 emacs build, and it worked just fine with GUI, so it's definitely possible https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=emacs-git
going to spend a bit of time on this now... and see if I get anywhere
@ParetoOptimalDev nice find!
"--enable-check-lisp-object-type" fixes it, the other flags don't matter (only tested with emacsPgtk so far)
Here's my current working setup if anyone wants, gonna open a PR in a moment as well
overlays (could be simplified) =
[
inputs.emacs-overlay.overlays.default
(final : prev: {
emacsPgtk = (prev.emacsGit.override {
withPgtk = true;
}).overrideAttrs (old : {
name = "emacs-pgtk";
version = inputs.emacs-src.shortRev;
src = inputs.emacs-src;
configureFlags = old.configureFlags ++ [ "--enable-check-lisp-object-type" ];
stdenv = prev.addAttrsToDerivation {
NIX_CFLAGS_COMPILE = "-O3 -pipe -ftree-vectorize -fomit-frame-pointer";
BYTE_COMPILE_EXTRA_FLAGS = '' \
--eval '(setq native-comp-speed 3)' \
--eval '(setq native-comp-compiler-options '("-O3" "-g3"))'
'';
} (prev.impureUseNativeOptimizations final.llvmPackages_14.stdenv);
});
})
]
flake inputs =
emacs-src.url = "github:emacs-mirror/emacs/emacs-29";
emacs-overlay.url = "github:Nix-Community/emacs-overlay";
emacs-src.flake = false;
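Since only --enable-check-lisp-object-type turned out to matter, a stripped-down variant of the setup above might look like the following. This is an untested sketch: `emacsGitChecked` is a hypothetical attribute name, and restricting the flag to aarch64 is a design choice made here (the failure was only reported on aarch64-linux), not something the thread's fix necessarily does.

```nix
# Minimal overlay sketch (untested): add only the one flag, only on aarch64.
# `emacsGitChecked` is a made-up name used only for illustration.
final: prev: {
  emacsGitChecked = prev.emacsGit.overrideAttrs (old: {
    configureFlags = (old.configureFlags or [ ])
      ++ prev.lib.optionals prev.stdenv.hostPlatform.isAarch64 [
        "--enable-check-lisp-object-type"
      ];
  });
}
```

On other platforms the attribute evaluates to emacsGit with an unchanged flag list, so x86_64-linux builds are not affected.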
@jasonjckn Nice! I'll have to try it.
I was curious about why that flag could make things fail.
/* Use the configure flag --enable-check-lisp-object-type to make
Lisp_Object use a struct type instead of the default int. The flag
causes CHECK_LISP_OBJECT_TYPE to be defined. */
https://github.com/emacs-mirror/emacs/blob/master/src/lisp.h
> It just seems that this option is rarely used, so it has bitrotted to some extent. - Eli Zaretskii
https://lists.gnu.org/archive/html/emacs-devel/2022-04/msg00677.html
So removing this flag should be an okay upstream fix?
> I was curious about why that flag could make things fail. So removing this flag should be an okay upstream fix?
Adding that flag makes things succeed (on aarch64-linux), not removing.
As for why? I'm not sure; there are two theories that come to mind:
(1) By wrapping Lisp_Word in a struct, the flag forces memory alignment to 64 bits (sizeof(Lisp_Object)) when it is enabled; when it's disabled, alignment might not be 64-bit (?). It's hard to tell from the source code.
(2) There could be a violation of the strict aliasing rule if Lisp_Object is defined as an int, e.g. https://github.com/emacs-mirror/emacs/blob/master/src/lisp.h#L302. Not really sure though, I'm not a C programmer.
Upstreaming to emacs-overlay is fine, but adding this flag further upstream... you could try, but given that Arch Linux builds fine without --enable-check-lisp-object-type, it's hard to say we got to the bottom of the issue. This build worked for me and does not have it: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=emacs-git
Closed via #285.
I'm unable to build Emacs on my M2 MacBook (running Linux) because some program involved in the build process segfaults. It looks like the segfaults are caused by compiler errors on some lisp source files (Error: wrong-type-argument ("../lisp/button.el" listp (#s(comp-mvar (symbol) nil nil nil 6777278 6)))), though I don't know any Lisp, so this is a hard one to debug for me :/ Any pointers would be much appreciated!