sagemathinc / cocalc-docker

DEPRECATED (was -- Docker setup for running CoCalc as downloadable software on your own computer)
https://cocalc.com

 ERR_PNPM_RECURSIVE_EXEC_FIRST_FAIL on almost any interaction #211

Open charlesangus opened 7 months ago

charlesangus commented 7 months ago

Hi there. I'm setting up cocalc-docker on a new install with docker-compose. It boots up and I can access the webpage, but clicking anything almost always results in this (from the docker-compose output):

cocalc         | ==> /var/log/hub/out <==
cocalc         | undefined
cocalc         |  ERR_PNPM_RECURSIVE_EXEC_FIRST_FAIL  Command was killed with SIGILL (Invalid machine instruction): cocalc-hub-server --mode=multi-user --all --hostname=0.0.0.0
cocalc         |  ELIFECYCLE  Command failed with exit code 1.
cocalc         | LOG: Started services.

Versions used:

Host machine: Ubuntu 22.04
Docker version 24.0.5, build 24.0.5-0ubuntu1~22.04.1
docker-compose-v2 version 2.20.2+ds1-0ubuntu1~22.04.1
sagemathinc/cocalc-docker

Some googling turned up a couple of posts describing potentially similar issues in other software, which seem to indicate it could be a permissions issue (https://github.com/cypress-io/cypress/issues/23962, https://github.com/gatsbyjs/gatsby/issues/22622)? Maybe a red herring.

I tried running docker-compose with sudo (normally I run it with a user in the docker group without sudo) in case there was some kind of permissions thing, same result.

williamstein commented 7 months ago

SIGILL (Invalid machine instruction)

That sounds like an incompatibility in the processor. Exactly what hardware are you running on? You might need to build cocalc-docker from source, or I need to build some "legacy" version.
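For readers who want to confirm how a process died when launching it themselves, here is a minimal Python sketch. It assumes the POSIX convention used by Python's `subprocess` module, where a negative return code `-N` means the child was killed by signal `N`. The helper name `describe_exit` is made up for illustration and is not part of CoCalc or pnpm.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess-style return code in plain words.

    On POSIX, Python's subprocess module reports a child killed by
    signal N as the return code -N.
    """
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "exited normally"
    return f"exited with code {returncode}"


if __name__ == "__main__":
    # SIGILL is the signal named in the pnpm error message above.
    print(describe_exit(-signal.SIGILL))
    # The wrapper (ELIFECYCLE) then reports a plain exit code 1.
    print(describe_exit(1))
```

This matches the two lines in the log: the hub server itself is killed by SIGILL, and the wrapping command then fails with exit code 1.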

charlesangus commented 7 months ago

This can be reliably reproduced by:

Interestingly, this does not (immediately) produce the error:

charlesangus commented 7 months ago

It's an old processor for sure - Mac Pro 1,1. Still creaking along!

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         36 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU            5150  @ 2.66GHz
    CPU family:          6
    Model:               15
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           2
    Stepping:            6
    BogoMIPS:            5319.92
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx
                         fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid
                         aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm pti
                         tpr_shadow dtherm

But it's weird that it can run if I can get past the homepage, I would think if it was the processor being incompatible nothing would work.
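One way to narrow this down is to compare the CPU's flag list with the instruction-set features a prebuilt binary might assume. The sketch below is not part of cocalc-docker. The function name `missing_flags` is hypothetical, and using the x86-64-v2 and x86-64-v3 feature levels as a yardstick is an assumption: the image may depend on other instructions entirely. The flag names are the ones Linux prints in `lscpu` and `/proc/cpuinfo`.

```python
# Linux flag names for the features added at each x86-64 feature level.
# Assumption: these levels are only a yardstick; the actual image may
# require a different set of instructions.
LEVEL_FLAGS = {
    "x86-64-v2": {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    "x86-64-v3": {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
}


def missing_flags(flags_text, level):
    """Return the flags of `level` that do not appear in `flags_text`."""
    present = set(flags_text.split())
    return sorted(LEVEL_FLAGS[level] - present)


# The Flags field from the lscpu output above, joined onto one line.
XEON_5150_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc "
    "arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor "
    "ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm pti tpr_shadow dtherm"
)

if __name__ == "__main__":
    print(missing_flags(XEON_5150_FLAGS, "x86-64-v2"))
    # prints ['popcnt', 'sse4_1', 'sse4_2']
```

For this processor the list is non-empty even at the v2 level, so any compiled code that uses SSE4 or POPCNT would be expected to hit an illegal instruction, while code paths that avoid them keep working.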

williamstein commented 7 months ago

But it's weird that it can run if I can get past the homepage, I would think if it was the processor being incompatible nothing would work.

Some parts of the install are compiled, and some are interpreted python and javascript. Some part that is compiled is causing the trouble. In any case, this is definitely the problem.

I'll try making an alternative build on an older x86, and will post when that is available (it can take a bit).

williamstein commented 7 months ago

@charlesangus Please try running this image instead:

sagemathinc/cocalc-docker-x86_64:2023-12-02-skylake 

If it works, let me know. It's built using an Intel Skylake machine, and that arch came out in maybe 2015, so might work for you. If it works, I'll add some instructions or somehow make this more well supported.

NOTE -- I'm pushing it right as I write this and it will take a few minutes to upload!

charlesangus commented 7 months ago

Thanks for that. Unfortunately, same behaviour (although slightly more in the log).

If I wanted to compile it on my machine, would that just involve cloning the repo and building the Dockerfile?

Also, just on the off chance, I tried moving the projects folder from a ZFS volume to a regular drive, with no luck, and played around with the NGINX reverse-proxy config a bit, also to no avail.

cocalc       |   
cocalc       | ==> /var/log/hub/out <== 
cocalc       | undefined
cocalc       |  ERR_PNPM_RECURSIVE_EXEC_FIRST_FAIL  Command was killed with SIGILL (Invalid machine instruction): cocalc-hub-server --mode=multi-user --all --hostname=0.0.0.0
cocalc       |  ELIFECYCLE  Command failed with exit code 1.
cocalc       |   
cocalc       | ==> /var/log/hub/log <== 
cocalc       | 2023-12-03T04:43:25.111Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.113Z: cocalc:http:next:init req.url=/_next/static/chunks/main-e49d0c2cba6fc65b.js
cocalc       | 2023-12-03T04:43:25.140Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.141Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.143Z: cocalc:http:next:init req.url=/_next/static/chunks/pages/_app-32e8c564306b2787.js
cocalc       | 2023-12-03T04:43:25.144Z: cocalc:http:next:init req.url=/_next/static/chunks/4295-b0d84a3ea161b503.js
cocalc       | 2023-12-03T04:43:25.159Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.160Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.161Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.162Z: cocalc:http:next:init req.url=/_next/static/chunks/9152-70e1d53094bcb590.js
cocalc       | 2023-12-03T04:43:25.164Z: cocalc:http:next:init req.url=/_next/static/chunks/131-5c84122e78e4b36c.js
cocalc       | 2023-12-03T04:43:25.165Z: cocalc:http:next:init req.url=/_next/static/chunks/5882-2c091805bcacab95.js
cocalc       | 2023-12-03T04:43:25.171Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.173Z: cocalc:http:next:init req.url=/_next/static/chunks/1661-9b8ac4e6c8dcc947.js
cocalc       | 2023-12-03T04:43:25.188Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.189Z: cocalc:http:next:init req.url=/_next/static/chunks/3081-37b75803c27fb499.js
cocalc       | 2023-12-03T04:43:25.194Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.196Z: cocalc:http:next:init req.url=/_next/static/chunks/4824-ab3a740220da3667.js
cocalc       | 2023-12-03T04:43:25.205Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.207Z: cocalc:http:next:init req.url=/_next/static/chunks/4226-d8db29791ae32ed3.js
cocalc       | 2023-12-03T04:43:25.220Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.221Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.222Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.223Z: cocalc:http:next:init req.url=/_next/static/chunks/9503-8d96e40dc54b4f00.js
cocalc       | 2023-12-03T04:43:25.225Z: cocalc:http:next:init req.url=/_next/static/chunks/5308-d4003583037d6fb7.js
cocalc       | 2023-12-03T04:43:25.226Z: cocalc:http:next:init req.url=/_next/static/chunks/9261-abb10b982aec31eb.js
cocalc       | 2023-12-03T04:43:25.229Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.230Z: cocalc:http:next:init req.url=/_next/static/chunks/8926-97c97bd2e98b2462.js
cocalc       | 2023-12-03T04:43:25.254Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.256Z: cocalc:http:next:init req.url=/_next/static/chunks/4777-d64e9dffc1393852.js
cocalc       | 2023-12-03T04:43:25.259Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.260Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.261Z: cocalc:http:next:init req.url=/_next/static/chunks/8312-41f59cf40df0d23d.js
cocalc       | 2023-12-03T04:43:25.263Z: cocalc:http:next:init req.url=/_next/static/chunks/5504-c3a6360105133043.js
cocalc       | 2023-12-03T04:43:25.266Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.267Z: cocalc:http:next:init req.url=/_next/static/chunks/7630-18ea04e4e3eb6a1d.js
cocalc       | 2023-12-03T04:43:25.434Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.435Z: cocalc:http:next:init req.url=/_next/static/chunks/8820-f1e6e5a3770236ab.js
cocalc       | 2023-12-03T04:43:25.459Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.460Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.461Z: cocalc:http:next:init req.url=/_next/static/chunks/8243-7387d7c84a21d9e0.js
cocalc       | 2023-12-03T04:43:25.463Z: cocalc:http:next:init req.url=/_next/static/chunks/pages/index-4a198a0fa8b817c2.js
cocalc       | 2023-12-03T04:43:25.466Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.467Z: cocalc:http:next:init req.url=/_next/static/a-BeELeb8l1NBG2XzBNJX/_buildManifest.js
cocalc       | 2023-12-03T04:43:25.481Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 2023-12-03T04:43:25.482Z: cocalc:http:next:init req.url=/_next/static/a-BeELeb8l1NBG2XzBNJX/_ssgManifest.js
cocalc       | 2023-12-03T04:43:25.493Z: cocalc:debug:virtual-hosts checking for vhost <<cocalc.mydomain.com>>
cocalc       | 
cocalc       | LOG: Started services.
cocalc       | 
cocalc       | ==> /var/log/hub/out <==
cocalc       |  ELIFECYCLE  Command failed with exit code 1.
cocalc       | 
williamstein commented 7 months ago

On Sat, Dec 2, 2023 at 9:20 PM charlesangus @.***> wrote:

Thanks for that. Unfortunately, same behaviour (although slightly more in the log).

Thanks for testing.

If I wanted to compile it on my machine, would that just involve cloning the repo and building the Dockerfile?

Yes. Hopefully just

git clone https://github.com/sagemathinc/cocalc-docker
cd cocalc-docker
make cocalc-docker

and then you'll have an image cocalc-docker-x86_64 that you can run.

-- William

Also just on the offchance, I tried moving the projects folder from a ZFS to a regular drive, no luck, and played around with the NGINX reverse-proxy config a bit, also without avail.

I mean "Invalid machine instruction" means that the code uses things that just aren't available on your processor.


-- William (http://wstein.org)

charlesangus commented 7 months ago

So seems like it's upstream in Sage - make cocalc-docker eventually chokes with:

... <many more lines> ...
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(PyEval_EvalCode+0xa8)[0x7fada1c0f928]
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(+0x26c563)[0x7fada1c50563]
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(+0x26c7f7)[0x7fada1c507f7]
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(+0x26c8df)[0x7fada1c508df]
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(_PyRun_SimpleFileObject+0x12c)[0x7fada1c5329c]
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(_PyRun_AnyFileObject+0x43)[0x7fada1c53763]
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(Py_RunMain+0x753)[0x7fada1c70a23]
/usr/local/sage/local/var/lib/sage/venv-python3.11.1/lib/libpython3.11.so.1.0(Py_BytesMain+0x5e)[0x7fada1c70f7e]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fada17d8d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fada17d8e40]
python3(_start+0x25)[0x55a3d0513095]
------------------------------------------------------------------------
Attaching gdb to process id 81.
Cannot find gdb installed
GDB is not installed.
Install gdb for enhanced tracebacks.
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
/usr/local/sage/src/bin/sage-python: line 2:    81 Illegal instruction     (core dumped) sage -python "$@"

From poking around, it seems like this is a known issue that should be fixed by building Sage with the seemingly not-super-well-named SAGE_FAT_BINARY=yes, which it looks like is already being done for cocalc, but maybe I'm mistaken.

I'm seeing some potentially-related Sage tickets (such as this one from you):

https://github.com/sagemath/sage/issues/33863

williamstein commented 7 months ago

OK, that makes sense. Of course I am building Sage with SAGE_FAT_BINARY=yes, but it's buggy and doesn't work well enough, evidently.

You could build cocalc-docker from a month or two ago (before I put the sage build in a separate docker image) or just comment out the parts that involve sage, and install sage some other way.

jochym commented 4 months ago

I have the same issue on Opterons. Could you point us to the way to build the image for other CPUs? Would switching off SAGE_FAT_BINARY do it? Is there any way to resolve this?

charlesangus commented 4 months ago

I tried using an old version of cocalc-docker (before Sage was packaged up into its own Docker) and building everything myself, but ran into other issues. Possibly the simplest would be to release the Dockerfile for the Sage that's included in CoCalc, so people can build it themselves? If that works, would be fairly easy for folks to keep their own version of that around to use when building their own cocalc-docker image.

williamstein commented 4 months ago

Possibly the simplest would be to release the Dockerfile for the Sage that's included in CoCalc, so people can build it themselves?

It's all here -- https://github.com/sagemathinc/cocalc-compute-docker

You could do

make sagemath-core

and that builds the Docker image locally. Then you change the line

FROM sagemathinc/sagemath-core${ARCH}:${SAGEMATH_TAG} as sagemath

in the local Dockerfile (here in this repo) so that sagemathinc/sagemath-core${ARCH}:${SAGEMATH_TAG} is replaced by the local image you just built above. Then type

make cocalc-docker

in this repo.

Does that help?
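The Dockerfile edit described above can also be scripted. This is only a sketch in Python: the function name `use_local_sagemath` and the image name `sagemath-core-local:latest` are placeholders, since the thread does not say what name `make sagemath-core` gives the image it builds (check `docker images` for the real one).

```python
import re

# The FROM line quoted above, as it appears in the cocalc-docker Dockerfile.
ORIGINAL_FROM = "FROM sagemathinc/sagemath-core${ARCH}:${SAGEMATH_TAG} as sagemath"

# Matches a FROM line for the build stage named "sagemath".
FROM_SAGEMATH = re.compile(
    r"^FROM[ \t]+\S+([ \t]+as[ \t]+sagemath)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def use_local_sagemath(dockerfile_text, local_image):
    """Point the 'sagemath' build stage at a locally built image."""
    # A function replacement avoids backslash-escape surprises in local_image.
    return FROM_SAGEMATH.sub(
        lambda match: f"FROM {local_image}{match.group(1)}", dockerfile_text
    )


if __name__ == "__main__":
    # "sagemath-core-local:latest" is a hypothetical image name.
    print(use_local_sagemath(ORIGINAL_FROM, "sagemath-core-local:latest"))
```

Only the stage named `sagemath` is touched, so the other FROM lines in the Dockerfile stay as they are.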

jochym commented 4 months ago

I am building the whole stack right now. I will report on the results. OTOH is there a simple way to change the compilation options for the local build? E.g. switch off the fat-binary.

charlesangus commented 4 months ago

As I understand it, the poorly-named fat binary setting is supposed to make a "portable" build that doesn't rely on particular instructions present on the build system, but it is not totally effective. My guess is it hasn't been updated in a while and there are some newer instructions which are not specifically forbidden, and so are slipping in, but that's largely speculation.

If you build on the machine you intend to run on, my understanding is it should only use instructions present on that processor, so it should be irrelevant whether the fat binary setting is on or off. It's mostly just for if you're building on a new processor but want to run on an old one.

williamstein commented 4 months ago

Yes, that's what a "fat binary" is - https://en.wikipedia.org/wiki/Fat_binary

It's always a tradeoff about how fat you make it, and what compilers and vendors support.

Switching off fat binary support may make your binary thinner (smaller), which could be good for you. You could delete this line in your copy of cocalc-compute-docker to turn it off:

https://github.com/sagemathinc/cocalc-compute-docker/blob/main/src/sagemath/scripts/build-sage.sh#L5

jochym commented 4 months ago

Unfortunately I have a problem with the local build. The make sagemath-core step ran successfully and produced a binary that executed fine. The build of cocalc crashed with the following error:

2100.6 pnpm run build # in '/cocalc/src/packages/next'
2100.6 Traceback (most recent call last):
2100.6   File "/cocalc/src/./workspaces.py", line 480, in <module>
2100.6     main()
2100.6   File "/cocalc/src/./workspaces.py", line 476, in main
2100.6     args.func(args)
2100.6   File "/cocalc/src/./workspaces.py", line 280, in build
2100.6     thread_map(f, v, 1)
2100.6   File "/cocalc/src/./workspaces.py", line 96, in thread_map
2100.6     return [callable(x) for x in inputs]
2100.6   File "/cocalc/src/./workspaces.py", line 96, in <listcomp>
2100.6     return [callable(x) for x in inputs]
2100.6   File "/cocalc/src/./workspaces.py", line 271, in f
2100.6     cmd("pnpm run build", package_path)
2100.6   File "/cocalc/src/./workspaces.py", line 72, in cmd
2100.6     raise RuntimeError(msg)
2100.6 RuntimeError: Error executing 'pnpm run build'
2100.7  ELIFECYCLE  Command failed with exit code 1.
2100.7  WARN   Local package.json exists, but node_modules missing, did you mean to install?

Is there something missing in my environment? What can I do to solve this?

williamstein commented 4 months ago

Is there any way to see the actual error? It would potentially be pages earlier in the log output.
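A quick way to hunt for the earlier message in a long build log is to list every line that mentions an error, with its line number. The Python sketch below is a generic text filter, not a CoCalc tool. The function name `find_error_lines` and the patterns it searches for (`error` as a whole word, or an `ERR_` prefix such as pnpm's codes) are assumptions about what the real failure looks like.

```python
import re

# Whole word "error" in any case, or an ERR_-style code as printed by pnpm.
ERROR_PATTERN = re.compile(r"\berror\b|ERR_", re.IGNORECASE)


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines that mention an error."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    # Three lines taken from the build output quoted in this thread.
    sample = "\n".join([
        "2100.6 pnpm run build # in '/cocalc/src/packages/next'",
        "2100.6 RuntimeError: Error executing 'pnpm run build'",
        "2100.7  ELIFECYCLE  Command failed with exit code 1.",
    ])
    for number, line in find_error_lines(sample):
        print(number, line)
```

On a full log, the earliest hit is usually the one worth reading; the final traceback only reports that the build command failed.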

jochym commented 4 months ago

It is a bit difficult to spot, so I am attaching the log of the build from the last successful stage to the final error message. Any help will be appreciated, since it seems we are close to making it work. (Attachment: builder.log)

williamstein commented 4 months ago

Yes, that is nearly the end of building cocalc. There's no info I can see in that log except "it didn't work". Maybe it ran out of RAM? How much memory is available?
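To answer the memory question on a Linux build machine, the totals can be read from `/proc/meminfo`. The Python sketch below parses that file's `Key: value kB` layout. The helper names `parse_meminfo` and `read_meminfo` are made up for illustration, and the sample numbers in the usage lines are invented, not taken from anyone's machine in this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


if __name__ == "__main__":
    # Invented sample values, shown only to illustrate the format.
    sample = "MemTotal:       20971520 kB\nSwapTotal:             0 kB\n"
    info = parse_meminfo(sample)
    print(info["MemTotal"] // (1024 * 1024), "GiB RAM,", info["SwapTotal"], "kB swap")
```

Running `read_meminfo()` during the build and watching `MemAvailable` would show whether memory is actually close to exhausted when the failure happens.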

jochym commented 4 months ago

Maybe. It was 20GB without swap. I have added 32GB of swap; we will see what happens.

williamstein commented 4 months ago

20GB should have been more than sufficient. Sorry that I don't have any other ideas.

What's your cat /proc/cpuinfo like?

Also if you have plenty of disk space and can install a vanilla Ubuntu 22.04 VM on your computer, you could easily connect it to cocalc.com as a "compute server". Then I could try building cocalc-docker on it. Basically that VM connects to cocalc.com via a websocket, and cocalc provides collaborative terminal on the VM. If I can build cocalc-docker there, then you could use it, and we could also push it to dockerhub as a fallback older image. This only makes sense if you have plenty of disk space sitting around and don't mind installing a virtual machine (using virtualbox or whatever).

jochym commented 4 months ago

You are right, it barely touched the swap. This is KVM running on debian 12 with 8 cpus:

processor   : 7
vendor_id   : AuthenticAMD
cpu family  : 15
model       : 6
model name  : Common KVM processor
stepping    : 1
microcode   : 0x1000065
cpu MHz     : 2200.088
cache size  : 512 KB
physical id : 1
siblings    : 4
core id     : 3
cpu cores   : 4
apicid      : 7
initial apicid  : 7
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl cpuid extd_apicid tsc_known_freq pni cx16 x2apic hypervisor cmp_legacy 3dnowprefetch vmmcall
bugs        : fxsave_leak sysret_ss_attrs null_seg swapgs_fence spectre_v1 spectre_v2
bogomips    : 4400.17
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Can I enable some more debugging for the build? What sort of stage the error is happening at? The system is not accessible from the outside, unfortunately.

williamstein commented 4 months ago

Can I enable some more debugging for the build?

I can't think of anything further

What sort of stage the error is happening at?

It's pnpm running nodejs to install npm packages then doing a production build of the nextjs frontend webapp. If anything goes wrong with installing a package or versions or whatever, then the end of the error looks like your log file, and a page or two up there is always a message printed with the actual error. I've of course seen a million such failures before, and if you click on any failed CI run here you'll see one: https://github.com/sagemathinc/cocalc/actions

I've never ever seen one with no error, and that's what you've got. To me that means that somehow the OS just killed some subprocess preventing an error from being reported.

Sorry that I can't help more.