sagemathinc / cocalc-docker

DEPRECATED (was -- Docker setup for running CoCalc as downloadable software on your own computer)
https://cocalc.com

when doing a fresh install sometimes (often?) it doesn't initially work and you have to stop and start. #111

Closed williamstein closed 3 years ago

williamstein commented 3 years ago

Steps:

  1. Do the standard install step as listed in the README.md
  2. It's not working; the log shows: psql: error: could not connect to server: FATAL: database "smc" does not exist
  3. You do docker stop cocalc, then docker start cocalc, and it works fine

This is of course the sort of problem that could massively reduce usage of cocalc-docker, since it stops people at exactly the moment they first try it out. So this is important to fix.

williamstein commented 3 years ago

Here's a log during startup. This did actually work, but it points to an issue that could potentially have caused problems, and should get fixed:

psql: error: FATAL:  database "smc" does not exist
ERROR:  relation "projects" does not exist
LINE 1: update projects set state='{"state":"opened"}';
               ^
2020-12-24T16:36:09.130Z - debug: ComputeServerClient.constructor: {"dev":false,"single":false,"kubernetes":false,"base_url":""}

2020-12-24T16:36:41.399Z - debug: PostgreSQL.constructor: {"host":"/projects/postgres/data/socket","database":"smc","user":"smc","debug":true,"connect":true,"pool":1,"cache_expiry":5…":300,"concurrent_warn":500,"concurrent_heavily_loaded":70,"ensure_exists":true,"timeout_ms":60000,"timeout_delay_ms":240000}
2020-12-24T16:36:41.400Z - debug: NO PASSWORD FILE!
2020-12-24T16:36:41.400Z - debug: PostgreSQL.connect: "will try to connect"
2020-12-24T16:36:41.400Z - debug: PostgreSQL.connect: "until successful"
2020-12-24T16:36:41.401Z - debug: PostgreSQL.connect: "retry_until_success() -- try 1"
2020-12-24T16:36:41.401Z - debug: PostgreSQL._do_connect: "connect to /projects/postgres/data/socket"
2020-12-24T16:36:41.401Z - debug: PostgreSQL._do_connect: "first make sure db exists"
2020-12-24T16:36:41.401Z - debug: PostgreSQL._ensure_database_exists: "ensure database 'smc' exists"

Basically, we run a query to update the state of all projects on startup... before the database has been created. That leads to an error, which might randomly cause trouble. At least we should put code in to make it clear that this may fail and we're properly catching the error. It's of course fine to have that query fail if the database doesn't exist.
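
One way to make that intent explicit is to classify the failures that are expected on a fresh install and log them as such, instead of letting raw psql errors show up in the startup log. The following is a minimal sketch, not the actual run.py code: the names `run_psql`, `classify_psql_error`, and the injectable `run_query` and `log` parameters are hypothetical, and the error strings are copied from the log above.

```python
import subprocess

RESET_QUERY = "update projects set state='{\"state\":\"opened\"}';"

# Substrings of psql errors that are harmless on a fresh install,
# copied from the startup log in this thread.
EXPECTED_BEFORE_INIT = (
    'database "smc" does not exist',
    'relation "projects" does not exist',
)


def run_psql(query):
    """Pipe a query into psql and return (exit_code, stderr).

    ON_ERROR_STOP=1 makes psql exit with a nonzero status when the
    SQL read from stdin fails, so the caller can detect the error.
    """
    proc = subprocess.run(
        ["psql", "-t", "-v", "ON_ERROR_STOP=1"],
        input=query,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def classify_psql_error(stderr):
    """Return 'expected' for known fresh-install errors, else 'unexpected'."""
    if any(msg in stderr for msg in EXPECTED_BEFORE_INIT):
        return "expected"
    return "unexpected"


def reset_project_state(run_query=run_psql, log=print):
    """Mark all projects as opened, tolerating a not-yet-created database.

    Returns True if the update ran, False if it was skipped because the
    database or table does not exist yet. Any other failure raises.
    """
    code, stderr = run_query(RESET_QUERY)
    if code == 0:
        return True
    if classify_psql_error(stderr) == "expected":
        log("reset_project_state: database not initialized yet; nothing to reset")
        return False
    raise RuntimeError(f"reset_project_state failed: {stderr.strip()}")
```

With this shape, a missing database produces a single explanatory log line, while any other psql failure still surfaces as an error.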

williamstein commented 3 years ago

With the new version (Makefile-personal) that I'm working on, I've not seen this problem once despite doing tons of testing. Probably a false alarm.

bearlike commented 2 years ago

This error still seems to exist. But unlike before, restarting the container is not resolving the issue. Here's the log during the startup.

LOG: start_postgres: starting the server
LOG: start_hub
LOG: run pkill -f cocalc-hub-server
pkill -f cocalc-hub-server
LOG: run mkdir -p /var/log/hub && cd /cocalc/src/packages/hub && npm run hub-docker-prod > /var/log/hub/out 2>/var/log/hub/err &
mkdir -p /var/log/hub && cd /cocalc/src/packages/hub && npm run hub-docker-prod > /var/log/hub/out 2>/var/log/hub/err &
LOG: reset_project_state: ensuring all projects are set as opened (not running) in the database
LOG: run echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
psql: error: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/projects/postgres/data/socket/.s.PGSQL.5432"?
echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
LOG: run echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
psql: error: FATAL:  database "smc" does not exist
echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
LOG: run echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
psql: error: FATAL:  database "smc" does not exist
echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
LOG: run echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
ERROR:  relation "projects" does not exist
LINE 1: update projects set state='{"state":"opened"}';
               ^
echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
LOG: waiting for all subprocesses to complete...
LOG: waiting for all subprocesses to complete...
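
The first psql attempt in this log fails because the Unix domain socket does not exist yet, which suggests the query is issued before the server has finished starting. A minimal sketch of waiting for the socket first is shown below. It is an illustration only, not code from run.py: `wait_for_socket` and its parameters are hypothetical, and only the socket path comes from the log. Note that the socket appearing does not imply that the "smc" database has been created.

```python
import os
import time

# Socket path as reported by psql in the log above.
SOCKET_PATH = "/projects/postgres/data/socket/.s.PGSQL.5432"


def wait_for_socket(path=SOCKET_PATH, timeout_s=60.0, interval_s=0.5,
                    exists=os.path.exists, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll until `path` exists.

    Returns True as soon as the path is present, or False once
    `timeout_s` seconds have elapsed without it appearing.
    """
    deadline = clock() + timeout_s
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A startup script could call `wait_for_socket()` before the first psql invocation and skip or abort the reset step if it returns False.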

ufulu commented 2 years ago

Any news on this? I also started experiencing this with a fresh install and no restarts seem to fix it.

szethh commented 1 year ago

I am having the same issue. I rebuilt the image on a tensorflow/tensorflow:latest-gpu base and I consistently get stuck on:

echo "update projects set state='{\"state\":\"opened\"}';" | psql -t
LOG: waiting for all subprocesses to complete...
LOG: waiting for all subprocesses to complete...

szethh commented 1 year ago

Upon more digging, I've found that this command in run.py fails: cd /cocalc/src/packages/hub && npm run hub-docker-prod

The log (at /var/log/hub/err) shows:

npm ERR! could not determine executable to run

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2022-11-10T22_16_32_102Z-debug-0.log

cat /root/.npm/_logs/2022-11-10T22_16_32_102Z-debug-0.log:

0 verbose cli /usr/bin/node /usr/lib/node_modules/npm/bin/npm-cli.js
1 info using npm@9.1.1
2 info using node@v16.18.1
3 timing npm:load:whichnode Completed in 0ms
4 timing config:load:defaults Completed in 2ms
5 timing config:load:file:/usr/lib/node_modules/npm/npmrc Completed in 0ms
6 timing config:load:builtin Completed in 0ms
7 timing config:load:cli Completed in 1ms
8 timing config:load:env Completed in 1ms
9 timing config:load:file:/cocalc/src/packages/hub/.npmrc Completed in 1ms
10 timing config:load:project Completed in 3ms
11 timing config:load:file:/root/.npmrc Completed in 0ms
12 timing config:load:user Completed in 0ms
13 timing config:load:file:/usr/etc/npmrc Completed in 0ms
14 timing config:load:global Completed in 0ms
15 timing config:load:setEnvs Completed in 1ms
16 timing config:load Completed in 8ms
17 timing npm:load:configload Completed in 8ms
18 timing npm:load:mkdirpcache Completed in 0ms
19 timing npm:load:mkdirplogs Completed in 0ms
20 verbose title npm exec cocalc-hub-server --mode=multi-user --all --hostname=0.0.0.0 --https-key=/projects/conf/cert/key.pem --https-cert=/projects/conf/cert/cert.pem
21 verbose argv "exec" "--" "cocalc-hub-server" "--mode=multi-user" "--all" "--hostname=0.0.0.0" "--https-key=/projects/conf/cert/key.pem" "--https-cert=/projects/conf/cert/cert.pem"
22 timing npm:load:setTitle Completed in 1ms
23 timing config:load:flatten Completed in 2ms
24 timing npm:load:display Completed in 3ms
25 verbose logfile logs-max:10 dir:/root/.npm/_logs/2022-11-10T22_16_32_102Z-
26 verbose logfile /root/.npm/_logs/2022-11-10T22_16_32_102Z-debug-0.log
27 timing npm:load:logFile Completed in 3ms
28 timing npm:load:timers Completed in 0ms
29 timing npm:load:configScope Completed in 0ms
30 timing npm:load Completed in 15ms
31 silly logfile start cleaning logs, removing 1 files
32 timing config:load:flatten Completed in 0ms
33 silly logfile done cleaning log files
34 timing arborist:ctor Completed in 0ms
35 verbose shrinkwrap failed to load node_modules/.package-lock.json missing from lockfile: node_modules/abab
36 timing command:exec Completed in 3061ms
37 verbose stack Error: could not determine executable to run
37 verbose stack     at getBinFromManifest (/usr/lib/node_modules/npm/node_modules/libnpmexec/lib/get-bin-from-manifest.js:17:23)
37 verbose stack     at exec (/usr/lib/node_modules/npm/node_modules/libnpmexec/lib/index.js:185:15)
37 verbose stack     at async module.exports (/usr/lib/node_modules/npm/lib/cli.js:133:5)
38 verbose pkgid @cocalc/hub@1.108.4
39 verbose cwd /cocalc/src/packages/hub
40 verbose Linux 5.15.0-43-generic
41 verbose node v16.18.1
42 verbose npm  v9.1.1
43 error could not determine executable to run
44 verbose exit 1
45 timing npm Completed in 3091ms
46 verbose code 1
47 error A complete log of this run can be found in:
47 error     /root/.npm/_logs/2022-11-10T22_16_32_102Z-debug-0.log

So it seems to be a Node-related issue, more specifically these lines, though I do not know how node/npm work:

35 verbose shrinkwrap failed to load node_modules/.package-lock.json missing from lockfile: node_modules/abab
36 timing command:exec Completed in 3061ms
37 verbose stack Error: could not determine executable to run
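
The argv line in the debug log shows npm exec being asked to run cocalc-hub-server, and the stack trace comes from get-bin-from-manifest.js. One plausible reading, which is an assumption and not confirmed anywhere in this thread, is that npm could not find a cocalc-hub-server executable in the local install. The sketch below checks for one. `find_local_bin` is a hypothetical helper; it relies only on the npm convention that package executables are linked into node_modules/.bin, and searching parent directories is an extra assumption to cover hoisted installs.

```python
from pathlib import Path


def find_local_bin(package_dir, bin_name):
    """Look for `bin_name` in node_modules/.bin at or above `package_dir`.

    Returns the path of the first match, or None if there is none.
    """
    start = Path(package_dir).resolve()
    for directory in (start, *start.parents):
        candidate = directory / "node_modules" / ".bin" / bin_name
        if candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    # Directory and executable name as they appear in the npm debug log.
    found = find_local_bin("/cocalc/src/packages/hub", "cocalc-hub-server")
    print(found if found else "cocalc-hub-server not found in any node_modules/.bin")
```

If nothing is found, the dependencies of the hub package were probably not installed or built in the rebuilt image, which would be the next thing to check.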

bitsnaps commented 11 months ago

I'm getting the same error using Gitpod.