haskell / ghcup-hs

https://www.haskell.org/ghcup/
GNU Lesser General Public License v3.0

Illegal Instruction in GHCUP on x86_64 (Nobara Linux) #1003

Open jackjohn7 opened 8 months ago

jackjohn7 commented 8 months ago

I'm using Nobara Linux which is based on Fedora.

When I run the curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | sh command listed on the website's homepage, I respond to all the configuration prompts and it seems to install properly. Then, when the script goes to execute ghcup, I'm met with this output:

Welcome to Haskell!

This script can download and install the following binaries:
  * ghcup - The Haskell toolchain installer
  * ghc   - The Glasgow Haskell Compiler
  * cabal - The Cabal build tool for managing Haskell software
  * stack - A cross-platform program for developing Haskell projects (similar to cabal)
  * hls   - (optional) A language server for developers to integrate with their editor/IDE

ghcup installs only into the following directory,
which can be removed anytime:
  /home/jack/.ghcup

Press ENTER to proceed or ctrl-c to abort.
Note that this script can be re-run at any given time.

-------------------------------------------------------------------------------

Detected bash shell on your system...
Do you want ghcup to automatically add the required PATH variable to "/home/jack/.bashrc"?

[P] Yes, prepend  [A] Yes, append  [N] No  [?] Help (default is "P").

A
-------------------------------------------------------------------------------
Do you want to install haskell-language-server (HLS)?
HLS is a language-server that provides IDE-like functionality
and can integrate with different editors, such as Vim, Emacs, VS Code, Atom, ...
Also see https://haskell-language-server.readthedocs.io/en/stable/

[Y] Yes  [N] No  [?] Help (default is "N").

Y
-------------------------------------------------------------------------------
Do you want to enable better integration of stack with GHCup?
This means that stack won't install its own GHC versions, but uses GHCup's.
For more information see:
  https://docs.haskellstack.org/en/stable/yaml_configuration/#ghc-installation-customisation-experimental
If you want to keep stacks vanilla behavior, answer 'No'.

[Y] Yes  [N] No  [?] Help (default is "Y").

Y
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15.4M  100 15.4M    0     0  8490k      0  0:00:01  0:00:01 --:--:-- 8488k
[ Info  ] downloading: https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-0.0.8.yaml as file /home/jack/.ghcup/cache/ghcup-0.0.8.yaml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  369k  100  369k    0     0  1438k      0 --:--:-- --:--:-- --:--:-- 1443k
main: line 131: 71130 Illegal instruction     (core dumped) "${GHCUP_BIN}/ghcup" ${args} "$@"
"ghcup --metadata-fetching-mode=Strict upgrade" failed!

After this, I tried installing ghcup from the binaries on the file server that the documentation links for those who don't like curl | sh. I used the most recent x86_64-linux binary, placed it in the same location the installation script uses, and added that location to my PATH. I get the same error whenever I run a command (only --help doesn't fail):

$ ghcup list
[ Info  ] downloading: https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-0.0.8.yaml as file /home/jack/.ghcup/cache/ghcup-0.0.8.yaml
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  369k  100  369k    0     0   538k      0 --:--:-- --:--:-- --:--:--  537k
Illegal instruction (core dumped)

In every case it appears to hit an illegal CPU instruction, which I can't explain: I'm using an x86_64 processor (Ryzen 7 7700X). Output of lscpu below.

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 7700X 8-Core Processor
    CPU family:          25
    Model:               97
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            2
    CPU(s) scaling MHz:  58%
    CPU max MHz:         5573.0000
    CPU min MHz:         400.0000
    BogoMIPS:            8999.53
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl
                         pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb
                         bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves
                         cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload
                         vgif x2avic v_spec_ctrl vnmi umip pku ospke rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization features:
  Virtualization:        AMD-V
Caches (sum of all):
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    8 MiB (8 instances)
  L3:                    32 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Vulnerable: Safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced / Automatic IBRS, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

I didn't see another issue quite like this, and I couldn't find anyone else with the same problem on Google. I've just verified that the latest installation works fine on my Fedora laptop, which is also x86_64 (Ryzen 5 5500U).

I tried seeing if the esoteric distros section of the installation docs could help, but nothing I tried there worked either. https://www.haskell.org/ghcup/install/#esoteric-distros

I can use the GHC and cabal-install provided by my package manager for the time being, but I thought I should still report this in case someone encounters something similar.

hasufell commented 8 months ago

Interesting, I'll investigate that.

hasufell commented 8 months ago

I have self-hosted CI runners that use the AMD Ryzen™ 7 7700, and I definitely cannot reproduce it there.

I have not tried Nobara Linux, but I can't see how that would be relevant.

Are you running under some KVM cloud stuff?

jackjohn7 commented 8 months ago

No cloud stuff. It's just an ordinary desktop I use for development and gaming. I haven't had any similar issues with other toolchains.

hasufell commented 8 months ago

Can you provide the coredump?

jackjohn7 commented 8 months ago

Output of coredumpctl gdb

           PID: 6116 (ghcup)
           UID: 1000 (jack)
           GID: 1000 (jack)
        Signal: 4 (ILL)
     Timestamp: Sat 2024-02-17 00:02:51 CST (18min ago)
  Command Line: ghcup list
    Executable: /home/jack/.ghcup/bin/ghcup
 Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/app-alacritty-83b7a964c25042d99cc6b8b07d91d3a7.scope
          Unit: user@1000.service
     User Unit: app-alacritty-83b7a964c25042d99cc6b8b07d91d3a7.scope
         Slice: user-1000.slice
     Owner UID: 1000 (jack)
       Boot ID: 19b6995a4b854a37986f6783f7a25360
    Machine ID: 88cbced372bf4c199ac9a3e7ffeffceb
      Hostname: nobara-pc
       Storage: /var/lib/systemd/coredump/core.ghcup.1000.19b6995a4b854a37986f6783f7a25360.6116.1708149771000000.zst (present)
  Size on Disk: 1.5M
       Message: Process 6116 (ghcup) of user 1000 dumped core.

                Module /home/jack/.ghcup/bin/ghcup without build-id.
                Stack trace of thread 6116:
                #0  0x0000000000f14e1a n/a (/home/jack/.ghcup/bin/ghcup + 0xb14e1a)
                ELF object binary architecture: AMD x86-64

GNU gdb (Fedora Linux) 14.1-4.fc39
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/jack/.ghcup/bin/ghcup...
(No debugging symbols found in /home/jack/.ghcup/bin/ghcup)
[New LWP 6116]
[New LWP 6118]
[New LWP 6117]
[New LWP 6120]
[New LWP 6119]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for system-supplied DSO at 0x7ffee4da0000
Core was generated by `ghcup list'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x0000000000f14e1a in ?? ()
[Current thread is 1 (LWP 6116)]
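
The frame in the coredumpctl output above gives both the absolute program counter and its offset into the ghcup module, so the address the module was mapped at can be recovered by subtraction. A minimal sketch of that arithmetic follows; `parse_frame` is a hypothetical helper written for this illustration, and the frame line is copied from the output above.

```python
import re

# Frame line copied verbatim from the coredumpctl output above.
FRAME = "#0  0x0000000000f14e1a n/a (/home/jack/.ghcup/bin/ghcup + 0xb14e1a)"


def parse_frame(line):
    """Return (pc, module_path, offset) from a systemd-coredump style frame line.

    Hypothetical helper: it only understands the 'ADDR SYMBOL (MODULE + OFFSET)'
    shape seen in this thread.
    """
    m = re.search(
        r"(0x[0-9a-fA-F]+)\s+\S+\s+\((\S+)\s+\+\s+(0x[0-9a-fA-F]+)\)", line
    )
    if m is None:
        raise ValueError("unrecognised frame line: " + line)
    return int(m.group(1), 16), m.group(2), int(m.group(3), 16)


pc, module, offset = parse_frame(FRAME)
base = pc - offset  # address at which the module starts in memory
print(f"{module} mapped at {base:#x}")
```

The result is 0x400000. That is the address one would usually expect for a fixed-address (non-PIE) x86-64 executable, but reading it that way is an assumption; nothing in the thread confirms how the binary was linked.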

I have a massive dump file (8000+ lines) as well. Would that be useful?

hasufell commented 8 months ago

Yes

hasufell commented 8 months ago

CCing @bgamari in case this might be interesting

jackjohn7 commented 8 months ago

I've included a google drive link to the file since it's too large to be attached here https://drive.google.com/file/d/1IkbqgBa19s33RvzReV1J1U7S3jzzCWKf/view?usp=sharing

runeksvendsen commented 8 months ago

@jackjohn7 you can attach it here if you zip it: core_dump.zip

bgamari commented 8 months ago

Very odd. Indeed it appears the executable jumped into the middle of an abyss:

>>> x/8i $pc
=> 0xf14e1a:    add    %al,(%rax)
   0xf14e1c:    add    %al,(%rax)
   0xf14e1e:    add    %al,(%rax)

Even stranger, the Haskell stack register is complete nonsense.

>>> print $rbp
$1 = (void *) 0x12

Something has gone horribly wrong in this program.

I have tried to reproduce this locally with Nobara 39 running under a VM on a Ryzen 5900X to no avail.
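
A note on the disassembly above: a run of add %al,(%rax) at two-byte intervals is how x86-64 disassemblers render zero bytes, because opcode 0x00 is ADD r/m8, r8 and a ModRM byte of 0x00 selects (%rax) as the destination and %al as the source. The sketch below decodes only that one opcode form by hand to show the correspondence. `decode_add_rm8_r8` is a hypothetical helper, covers only plain register-indirect operands, and is no substitute for a real disassembler.

```python
# Register names indexed by their 3-bit encoding (no REX prefix assumed).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_add_rm8_r8(code):
    """Decode a two-byte 'ADD r/m8, r8' instruction into AT&T syntax.

    Only opcode 0x00 with a register-indirect ModRM byte is handled;
    SIB, displacement and RIP-relative forms are deliberately left out.
    """
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("only opcode 0x00 is handled in this sketch")
    mod = modrm >> 6        # addressing mode
    reg = (modrm >> 3) & 7  # source register (8-bit)
    rm = modrm & 7          # base register of the memory operand
    if mod != 0 or rm in (4, 5):
        raise NotImplementedError("SIB/displacement forms are not handled")
    return f"add %{REG8[reg]},(%{REG64[rm]})"


# Two zero bytes decode to the instruction shown at $pc above.
print(decode_add_rm8_r8(bytes(2)))
```

So the listing is consistent with the program counter pointing into zero-filled memory. Why it ended up there is not something this sketch can answer.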

bgamari commented 8 months ago

@jackjohn7, a few questions:

jackjohn7 commented 8 months ago
jackjohn7 commented 7 months ago

Update

AVX512 was disabled for my CPU. I had carelessly disabled this feature to play a particular game. Re-enabling it seems to have fixed the issue entirely, although updating my system may also have affected it. In any case, the toolchain is now working for me.
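
For anyone wanting to check the same thing on their own machine: the Flags line in the lscpu output earlier in this thread lists avx and avx2 but no avx512 entries, which matches the feature having been switched off. The sketch below looks for flags beginning with avx512 in a flag list. The function names are hypothetical, reading /proc/cpuinfo assumes Linux on x86, and the thread does not establish that ghcup itself requires AVX-512; it only reports that re-enabling it coincided with the fix.

```python
def avx512_flags(flags_text):
    """Return the sorted AVX-512 feature flags in a whitespace-separated flag list."""
    return sorted({f for f in flags_text.split() if f.startswith("avx512")})


def cpuinfo_flags(path="/proc/cpuinfo"):
    """Return the flags string of the first CPU in /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass  # not Linux, or the file is not readable
    return ""


# A few flags taken from the lscpu output posted earlier in this thread.
reported = "fma cx16 sse4_1 sse4_2 avx f16c bmi1 avx2 bmi2 sha_ni"
print("reported in the issue:", avx512_flags(reported))

# Flags the kernel advertises on the machine running this script.
print("this machine:", avx512_flags(cpuinfo_flags()))
```

An empty list on a CPU that is known to support AVX-512 suggests the feature has been turned off somewhere, for example in firmware settings.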

I see this got tagged as a bug. Was this reproduced for anyone else?