Sticcia opened this issue 1 year ago
@Sticcia - sorry to hear this is happening.
Have you deployed the server via Kubernetes or Docker Compose?
What version of the Docker Compose or Kubernetes Helm Chart have you deployed, and what version is the Speckle Server image that is running?
How frequently is the 500 error occurring, is it at a regular frequency, or only when certain actions are performed?
Have you tried running the server with the debug logging level? (Either set the LOG_LEVEL environment variable in the Docker Compose file, or set the .server.logLevel value in the Helm Chart to debug.)
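For example, roughly (assuming the server service is named speckle-server in your compose file):
  # docker-compose.yml (sketch)
  speckle-server:
    environment:
      LOG_LEVEL: 'debug'
  # Helm values.yaml (sketch)
  server:
    logLevel: debug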
We deployed the server using Docker Compose. Docker Compose version: 1.25.0; Speckle Server version: 2.13.3.
It happens once a week on average, our previous deployment was on Digital Ocean and the issue persisted after the migration.
I don't see any relation to actions taken by us, but we integrated the viewer in our web application, which receives requests from external users (you can see requests through app.stykka.com and dev-velux.web.app in the log).
I have enabled debug logging now, will come back with an updated log file next time this happens.
Thanks for the support.
Here is the error (GraphQLError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?) which was probably triggered the last time the server container was hanging. The full log is attached. Thanks. log.txt
Hi @saschabul
Thanks for sending the logs.
Knex is the module used to connect to the Postgres database. This error may indicate a number of things; it may be that it temporarily lost connection to the Postgres database, or that more connections were requested than there were available. It may also indicate other issues, such as the server not freeing up connections quickly enough.
Do you know if this failure occurred during any particular operation, for example sending a particular model from a Speckle connector?
Do you have a monitoring system in place to display the prometheus-format monitoring data that Speckle server generates? You may be able to see how the number of database connections change over time, and whether this correlates with higher demand on Speckle server.
Hi @iainsproat
No, we don't see any relation with actions taken by users. No, we don't have a monitoring system, all we have is an up-time check that will notify us if the Speckle server is not responding. We will consider setting that up, thanks.
@Sticcia @saschabul
I've created a Pull Request with a change which will include a health check for Speckle Server when run via Docker Compose. This health check will repeatedly poll Speckle Server, and if it stops responding for approximately 30 seconds it will automatically be restarted.
Could you try out the change in your environment and let me know if this works for you? https://github.com/specklesystems/speckle-server/pull/1651/files
Unfortunately this doesn't address the root cause, but it will help reduce downtime and the need for manual intervention to restart the server.
@iainsproat
I updated the docker compose configuration following your changes. (I was not able to start the container with a healthcheck without updating the compose file version to 2.4.) This is our current docker-compose.yml:
version: '2.4'

services:
  ####
  # Speckle Server dependencies
  #######
  postgres:
    image: 'postgres:14.5-alpine'
    restart: always
    environment:
      POSTGRES_DB: speckle
      POSTGRES_USER: speckle
      POSTGRES_PASSWORD: speckle
    volumes:
      - ./postgres-data:/var/lib/postgresql/data/
    ports:
      - '127.0.0.1:5432:5432'

  redis:
    image: 'redis:7.0-alpine'
    restart: always
    volumes:
      - ./redis-data:/data
    ports:
      - '127.0.0.1:6379:6379'

  minio:
    image: 'minio/minio'
    command: server /data --console-address ":9001"
    restart: always
    volumes:
      - ./minio-data:/data
    ports:
      - '127.0.0.1:9000:9000'
      - '127.0.0.1:9001:9001'

  ####
  # Speckle Server
  #######
  speckle-frontend:
    image: speckle/speckle-frontend:2.13.3
    restart: always
    ports:
      - '127.0.0.1:8080:8080'
    environment:
      FILE_SIZE_LIMIT_MB: 100

  speckle-server:
    image: speckle/speckle-server:2.13.3
    restart: always
    healthcheck:
      test:
        - CMD
        - node
        - -e
        - "require('node:http').request({headers: {'Content-Type': 'application/json'}, port:3000, hostname:'127.0.0.1', path:'/graphql?query={serverInfo{version}}', method: 'GET' }, (res) => { body = ''; res.on('data', (chunk) => {body += chunk;}); res.on('end', () => {process.exit(res.statusCode != 200 || body.toLowerCase().includes('error'));}); }).end();"
      interval: 10s
      timeout: 10s
      retries: 3
      start_period: 5s
    ports:
      - '127.0.0.1:3000:3000'
    command: ['bash', '-c', '/wait && node bin/www']
    environment:
      CANONICAL_URL: "https://geo.stykka.com"
      SESSION_SECRET: "***"
      STRATEGY_LOCAL: 'true'
      LOG_LEVEL: 'debug'
      POSTGRES_URL: 'postgres'
      POSTGRES_USER: 'speckle'
      POSTGRES_PASSWORD: 'speckle'
      POSTGRES_DB: 'speckle'
      REDIS_URL: 'redis://redis'
      WAIT_HOSTS: 'postgres:5432, redis:6379, minio:9000'
      EMAIL: "true"
      EMAIL_HOST: "smtp.sendgrid.net"
      EMAIL_PORT: "465"
      EMAIL_USERNAME: "apikey"
      EMAIL_PASSWORD: "***"
      EMAIL_FROM: "***"
      EMAIL_SECURE: 'true'
      S3_ENDPOINT: 'http://minio:9000'
      S3_ACCESS_KEY: 'minioadmin'
      S3_SECRET_KEY: 'minioadmin'
      S3_BUCKET: 'speckle-server'
      S3_CREATE_BUCKET: 'true'
      S3_REGION: '' # optional, defaults to 'us-east-1'
      FILE_SIZE_LIMIT_MB: 500
      SENTRY_DSN: "***"

  speckle-preview-service:
    image: speckle/speckle-preview-service:2.13.3
    restart: always
    mem_limit: '1000m'
    memswap_limit: '1000m'
    command: ['bash', '-c', '/wait && node bin/www']
    environment:
      LOG_LEVEL: 'info'
      PG_CONNECTION_STRING: 'postgres://speckle:speckle@postgres/speckle'
      WAIT_HOSTS: 'postgres:5432'

  speckle-webhook-service:
    image: speckle/speckle-webhook-service:2.13.3
    restart: always
    command: ['bash', '-c', '/wait && node main.js']
    environment:
      LOG_LEVEL: 'info'
      PG_CONNECTION_STRING: 'postgres://speckle:speckle@postgres/speckle'
      WAIT_HOSTS: 'postgres:5432'

  fileimport-service:
    image: speckle/speckle-fileimport-service:2.13.3
    restart: always
    command: ['bash', '-c', '/wait && node src/daemon.js']
    environment:
      LOG_LEVEL: 'info'
      PG_CONNECTION_STRING: 'postgres://speckle:speckle@postgres/speckle'
      WAIT_HOSTS: 'postgres:5432'
      SPECKLE_SERVER_URL: 'http://speckle-server:3000'
@Sticcia - thanks for looking at it. After you updated the docker compose file version, did the server start and work as normal?
@iainsproat - yes, the server is running as normal with the configuration above (I just now did a small edit). We will let you know if the server crashes again. Thanks for the support.
@iainsproat - we had a crash again Friday with the same error pasted by @saschabul above. We set up an up-time check with the same settings you shared in the yaml configuration. The check notified us of the failure but the container did not automatically restart.
Also, I tried to look into the logs with docker-compose logs -f --tail=50 speckle-server, but there is nothing logged at the time of the failure (we restarted the service just after the crash, hence the Attaching to speckle_speckle-server_1 line):
[36mspeckle-server_1 |[0m {"level":"info","time":"2023-06-23T08:41:13.272Z","req":{"id":"504f4d9e-a8c6-429b-84b5-3c2e74665a71","method":"POST","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","content-length":"213","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36","accept":"*/*","referer":"https://geo.stykka.com/streams/f5ad2b244d?token=9d6313e5756d9dbdd253eff4747f90469f4d56ca205bb8483a","accept-encoding":"gzip, deflate, br","accept-language":"da-DK,da;q=0.9,en-US;q=0.8,en;q=0.7","apollographql-client-name":"web","apollographql-client-version":"unknown","content-type":"application/json","origin":"https://geo.stykka.com","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","x-request-id":"504f4d9e-a8c6-429b-84b5-3c2e74665a71","x-speckle-client-ip":"194.182.232.255"}},"authContext":{"auth":false},"req":{"id":"504f4d9e-a8c6-429b-84b5-3c2e74665a71","method":"POST","headers":{"host":"geo.stykka.com","connection":"upgrade","content-length":"213","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36","accept":"*/*","referer":"https://geo.stykka.com/streams/f5ad2b244d?token=9d6313e5756d9dbdd253eff4747f90469f4d56ca205bb8483a","accept-encoding":"gzip, deflate, br","accept-language":"da-DK,da;q=0.9,en-US;q=0.8,en;q=0.7","apollographql-client-name":"web","apollographql-client-version":"unknown","content-type":"application/json","origin":"https://geo.stykka.com","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","x-request-id":"504f4d9e-a8c6-429b-84b5-3c2e74665a71","x-speckle-client-ip":"194.182.232.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"query EmailVerificationBannerState {\n activeUser {\n id\n email\n verified\n hasPendingVerification\n __typename\n }\n}\n","graphql_variables":{},"graphql_operation_value":"GQL query activeUser","grqphql_operation_name":"GQL activeUser","actionName":"query activeUser","msg":"graphql response"}
[36mspeckle-server_1 |[0m {"level":"info","time":"2023-06-23T08:41:13.274Z","req":{"id":"504f4d9e-a8c6-429b-84b5-3c2e74665a71","method":"POST","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","content-length":"213","user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36","accept":"*/*","referer":"https://geo.stykka.com/streams/f5ad2b244d?token=9d6313e5756d9dbdd253eff4747f90469f4d56ca205bb8483a","accept-encoding":"gzip, deflate, br","accept-language":"da-DK,da;q=0.9,en-US;q=0.8,en;q=0.7","apollographql-client-name":"web","apollographql-client-version":"unknown","content-type":"application/json","origin":"https://geo.stykka.com","sec-fetch-site":"same-origin","sec-fetch-mode":"cors","sec-fetch-dest":"empty","x-request-id":"504f4d9e-a8c6-429b-84b5-3c2e74665a71","x-speckle-client-ip":"194.182.232.255"}},"res":{"statusCode":200,"headers":{"x-request-id":"504f4d9e-a8c6-429b-84b5-3c2e74665a71","x-ratelimit-remaining":44,"access-control-allow-origin":"*","content-type":"application/json; charset=utf-8","content-length":"29","etag":"W/\"1d-soSuydrEK1Bd58NMwBp5Y4Wv7p8\""}},"responseTime":8,"msg":"request completed"}
Attaching to speckle_speckle-server_1
[36mspeckle-server_1 |[0m {"level":"info","time":"2023-07-03T08:22:24.332Z","req":{"id":"2c4b0b27-2239-4a38-a72e-b4ea083448bb","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"2c4b0b27-2239-4a38-a72e-b4ea083448bb"}},"authContext":{"auth":false},"req":{"id":"2c4b0b27-2239-4a38-a72e-b4ea083448bb","method":"GET","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"2c4b0b27-2239-4a38-a72e-b4ea083448bb"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","msg":"graphql response"}
[36mspeckle-server_1 |[0m {"level":"info","time":"2023-07-03T08:22:24.338Z","req":{"id":"2c4b0b27-2239-4a38-a72e-b4ea083448bb","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"2c4b0b27-2239-4a38-a72e-b4ea083448bb"}},"res":{"statusCode":200,"headers":{"x-request-id":"2c4b0b27-2239-4a38-a72e-b4ea083448bb","x-ratelimit-remaining":499,"access-control-allow-origin":"*","content-type":"application/json; charset=utf-8","content-length":"45","etag":"W/\"2d-lgbNWZvtrzDaO7tQCG8eXovuMp8\""}},"responseTime":10,"msg":"request completed"}
Hi @iainsproat, the issue is happening more often and we cannot trace it to any actions that we take ourselves. The Docker Compose configuration you shared with us does not restart the service automatically. We can set it up ourselves to do this, but it won't solve the issue. Do you have any news or information regarding the issue?
Hi @Sticcia - can you share the version of docker (docker --version) and docker compose (docker compose version) that your server is running on?
Can you also run docker inspect --format='{{json .State.Health}}' <ID of the running server container> for the server container? You may have to run docker ps -a to list the IDs of the running containers and select the one for the server container.
Please also run docker events --since 10m --until 0m (replacing 10m and 0m with the appropriate durations in the past when the incident occurred; the format is that of Go duration strings).
Are health_status events reported? These events should have been created when the healthcheck identifies changes in the status of the container.
Are there any other events reported by Docker around that time that might help debug this issue?
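If it helps, the output can be narrowed down to container health events with something like:
  docker events --since 10m --until 0m --filter 'type=container' --filter 'event=health_status'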
@iainsproat
Docker versions:
Docker version 20.10.5+dfsg1, build 55c4c8
docker-compose version 1.25.0, build unknown
docker-py version: 4.1.0
CPython version: 3.9.2
OpenSSL version: OpenSSL 1.1.1n 15 Mar 2022
Docker inspect:
{"Status":"unhealthy","FailingStreak":12,"Log":[{"Start":"2023-07-07T12:18:58.144622803Z","End":"2023-07-07T12:19:08.144699892Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-07T12:19:18.15169539Z","End":"2023-07-07T12:19:28.151803699Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-07T12:19:38.158376948Z","End":"2023-07-07T12:19:48.158506217Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-07T12:19:58.165514836Z","End":"2023-07-07T12:20:08.165713169Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-07T12:20:18.171790741Z","End":"2023-07-07T12:20:28.171900318Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"}]}
Docker events: docker_events.txt
Thanks for the help
I can see in the events that it is returning exitCode=127 due to the timeout, and in the Docker inspect output it says the exit code is -1.
Docker's documentation states that:
The command’s exit status indicates the health status of the container. The possible values are:
0: success - the container is healthy and ready for use
1: unhealthy - the container is not working correctly
2: reserved - do not use this exit code
So neither -1 nor 127 is valid, and perhaps we should handle the timeout condition better so that it returns a 1.
Despite this, it is reporting the container as unhealthy in your Docker inspect output. As we have correctly set the restart policy to always, I would expect it to have restarted.
The latest Docker version is 24, and the latest docker compose version is 2.19. I do not know what the impact of updating would be, or whether it would help the stability of speckle server in this particular situation.
Hi @Sticcia - I've made a change to the healthcheck to ensure it times out correctly and responds with the exit code expected by docker.
The healthcheck command should now be:
"try { require('node:http').request({headers: {'Content-Type': 'application/json'}, port:3000, hostname:'127.0.0.1', path:'/graphql?query={serverInfo{version}}', method: 'GET', timeout: 2000 }, (res) => { body = ''; res.on('data', (chunk) => {body += chunk;}); res.on('end', () => {process.exit(res.statusCode != 200 || body.toLowerCase().includes('error'));}); }).end(); } catch { process.exit(1); }"
Let me know how you get on with this change, and please provide the docker events output, docker inspect output, and logs if you encounter any further issues.
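In the compose file this is just the final element of the existing healthcheck test array; everything else can stay as it is, e.g.:
  healthcheck:
    test:
      - CMD
      - node
      - -e
      - "<the updated script above>"
    interval: 10s
    timeout: 10s
    retries: 3
    start_period: 5s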
@iainsproat - the server crashed again and didn't restart:
speckle_speckle-server_1 docker-entrypoint.sh bash ... Up (unhealthy) 127.0.0.1:3000->3000/tcp
Here are the logs you mentioned in the last comment.
Docker inspect output:
{"Status":"unhealthy","FailingStreak":14,"Log":[{"Start":"2023-07-12T10:39:18.166560137Z","End":"2023-07-12T10:39:28.166681021Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-12T10:39:38.174308127Z","End":"2023-07-12T10:39:48.174427043Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-12T10:39:58.182222813Z","End":"2023-07-12T10:40:08.182349343Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-12T10:40:18.18899444Z","End":"2023-07-12T10:40:28.189113566Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"},{"Start":"2023-07-12T10:40:38.195697535Z","End":"2023-07-12T10:40:48.195779531Z","ExitCode":-1,"Output":"Health check exceeded timeout (10s)"}]}
Docker events output:
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:26.992Z","req":{"id":"6922d38f-eb83-41d4-91f0-8c16ba949016","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"6922d38f-eb83-41d4-91f0-8c16ba949016","x-speckle-client-ip":"35.233.206.255"}},"authContext":{"auth":false},"req":{"id":"6922d38f-eb83-41d4-91f0-8c16ba949016","method":"GET","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"6922d38f-eb83-41d4-91f0-8c16ba949016","x-speckle-client-ip":"35.233.206.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","msg":"graphql response"}
speckle-server_1 | {"level":"error","time":"2023-07-12T10:42:38.433Z","req":{"id":"9d30bf95-e157-4ce1-a370-8045e535f294","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"9d30bf95-e157-4ce1-a370-8045e535f294"}},"authContext":{"auth":false},"req":{"id":"9d30bf95-e157-4ce1-a370-8045e535f294","method":"GET","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"9d30bf95-e157-4ce1-a370-8045e535f294"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","query-path":"serverInfo","err":{"type":"GraphQLError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n at Client_PG.acquireConnection (/speckle-server/node_modules/knex/lib/client.js:312:26)\n at async Runner.ensureConnection (/speckle-server/node_modules/knex/lib/execution/runner.js:287:28)\n at async Runner.run (/speckle-server/node_modules/knex/lib/execution/runner.js:30:19)\n at async getServerInfo (/speckle-server/packages/server/dist/modules/core/services/generic.js:11:28)\n at async Object.serverInfo (/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js:7:20)","locations":[{"line":1,"column":2}],"path":["serverInfo"]},"msg":"graphql error"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:38.435Z","req":{"id":"9d30bf95-e157-4ce1-a370-8045e535f294","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"9d30bf95-e157-4ce1-a370-8045e535f294"}},"authContext":{"auth":false},"req":{"id":"9d30bf95-e157-4ce1-a370-8045e535f294","method":"GET","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"9d30bf95-e157-4ce1-a370-8045e535f294"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","msg":"graphql response"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:38.444Z","req":{"id":"9d30bf95-e157-4ce1-a370-8045e535f294","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"9d30bf95-e157-4ce1-a370-8045e535f294"}},"res":{"statusCode":200,"headers":{"x-request-id":"9d30bf95-e157-4ce1-a370-8045e535f294","x-ratelimit-remaining":499,"access-control-allow-origin":"*","content-type":"application/json; charset=utf-8","content-length":"290","etag":"W/\"122-/esAIs0eu+FuVa0OMIU0zOWANWc\""}},"responseTime":60019,"msg":"request completed"}
speckle-server_1 | {"level":"error","time":"2023-07-12T10:42:44.836Z","req":{"id":"48e35154-6ad7-49a0-bf35-04655652960f","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"48e35154-6ad7-49a0-bf35-04655652960f","x-speckle-client-ip":"35.205.72.255"}},"authContext":{"auth":false},"req":{"id":"48e35154-6ad7-49a0-bf35-04655652960f","method":"GET","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"48e35154-6ad7-49a0-bf35-04655652960f","x-speckle-client-ip":"35.205.72.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","query-path":"serverInfo","err":{"type":"GraphQLError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n at Client_PG.acquireConnection (/speckle-server/node_modules/knex/lib/client.js:312:26)\n at async Runner.ensureConnection (/speckle-server/node_modules/knex/lib/execution/runner.js:287:28)\n at async Runner.run (/speckle-server/node_modules/knex/lib/execution/runner.js:30:19)\n at async getServerInfo (/speckle-server/packages/server/dist/modules/core/services/generic.js:11:28)\n at async Object.serverInfo (/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js:7:20)","locations":[{"line":1,"column":2}],"path":["serverInfo"]},"msg":"graphql error"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:44.839Z","req":{"id":"48e35154-6ad7-49a0-bf35-04655652960f","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"48e35154-6ad7-49a0-bf35-04655652960f","x-speckle-client-ip":"35.205.72.255"}},"authContext":{"auth":false},"req":{"id":"48e35154-6ad7-49a0-bf35-04655652960f","method":"GET","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"48e35154-6ad7-49a0-bf35-04655652960f","x-speckle-client-ip":"35.205.72.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","msg":"graphql response"}
speckle-server_1 | {"level":"error","time":"2023-07-12T10:42:49.831Z","req":{"id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","x-speckle-client-ip":"35.238.118.255"}},"authContext":{"auth":false},"req":{"id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","method":"GET","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","x-speckle-client-ip":"35.238.118.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","query-path":"serverInfo","err":{"type":"GraphQLError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n at Client_PG.acquireConnection (/speckle-server/node_modules/knex/lib/client.js:312:26)\n at async Runner.ensureConnection (/speckle-server/node_modules/knex/lib/execution/runner.js:287:28)\n at async Runner.run (/speckle-server/node_modules/knex/lib/execution/runner.js:30:19)\n at async getServerInfo (/speckle-server/packages/server/dist/modules/core/services/generic.js:11:28)\n at async Object.serverInfo (/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js:7:20)","locations":[{"line":1,"column":2}],"path":["serverInfo"]},"msg":"graphql error"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:49.833Z","req":{"id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","x-speckle-client-ip":"35.238.118.255"}},"authContext":{"auth":false},"req":{"id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","method":"GET","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"ad35d1b0-bbcd-4e1a-a5c2-38c780194504","x-speckle-client-ip":"35.238.118.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","msg":"graphql response"}
speckle-server_1 | {"level":"error","time":"2023-07-12T10:42:58.351Z","req":{"id":"ca40c1f5-9fb7-4451-a10e-42500ceae487","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"ca40c1f5-9fb7-4451-a10e-42500ceae487"}},"authContext":{"auth":false},"req":{"id":"ca40c1f5-9fb7-4451-a10e-42500ceae487","method":"GET","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"ca40c1f5-9fb7-4451-a10e-42500ceae487"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","query-path":"serverInfo","err":{"type":"GraphQLError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n at Client_PG.acquireConnection (/speckle-server/node_modules/knex/lib/client.js:312:26)\n at async Runner.ensureConnection (/speckle-server/node_modules/knex/lib/execution/runner.js:287:28)\n at async Runner.run (/speckle-server/node_modules/knex/lib/execution/runner.js:30:19)\n at async getServerInfo (/speckle-server/packages/server/dist/modules/core/services/generic.js:11:28)\n at async Object.serverInfo (/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js:7:20)","locations":[{"line":1,"column":2}],"path":["serverInfo"]},"msg":"graphql error"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:58.353Z","req":{"id":"ca40c1f5-9fb7-4451-a10e-42500ceae487","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"ca40c1f5-9fb7-4451-a10e-42500ceae487"}},"authContext":{"auth":false},"req":{"id":"ca40c1f5-9fb7-4451-a10e-42500ceae487","method":"GET","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"ca40c1f5-9fb7-4451-a10e-42500ceae487"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","msg":"graphql response"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:58.368Z","req":{"id":"ca40c1f5-9fb7-4451-a10e-42500ceae487","method":"GET","path":"/graphql","headers":{"content-type":"application/json","host":"127.0.0.1:3000","connection":"close","x-request-id":"ca40c1f5-9fb7-4451-a10e-42500ceae487"}},"res":{"statusCode":200,"headers":{"x-request-id":"ca40c1f5-9fb7-4451-a10e-42500ceae487","x-ratelimit-remaining":499,"access-control-allow-origin":"*","content-type":"application/json; charset=utf-8","content-length":"290","etag":"W/\"122-/esAIs0eu+FuVa0OMIU0zOWANWc\""}},"responseTime":60020,"msg":"request completed"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:42:59.679Z","req":{"id":"5ffeddee-6c6e-4dae-af1a-5f92c7ff10f8","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"5ffeddee-6c6e-4dae-af1a-5f92c7ff10f8","x-speckle-client-ip":"35.238.118.255"}},"res":{"statusCode":null,"headers":{"x-request-id":"5ffeddee-6c6e-4dae-af1a-5f92c7ff10f8","x-ratelimit-remaining":499,"access-control-allow-origin":"*"}},"responseTime":9865,"msg":"request aborted"}
speckle-server_1 | {"level":"error","time":"2023-07-12T10:43:02.830Z","req":{"id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","x-speckle-client-ip":"35.186.159.255"}},"authContext":{"auth":false},"req":{"id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","method":"GET","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","x-speckle-client-ip":"35.186.159.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","query-path":"serverInfo","err":{"type":"GraphQLError","message":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stack":"KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n at Client_PG.acquireConnection (/speckle-server/node_modules/knex/lib/client.js:312:26)\n at async Runner.ensureConnection (/speckle-server/node_modules/knex/lib/execution/runner.js:287:28)\n at async Runner.run (/speckle-server/node_modules/knex/lib/execution/runner.js:30:19)\n at async getServerInfo (/speckle-server/packages/server/dist/modules/core/services/generic.js:11:28)\n at async Object.serverInfo (/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js:7:20)","locations":[{"line":1,"column":2}],"path":["serverInfo"]},"msg":"graphql error"}
speckle-server_1 | {"level":"info","time":"2023-07-12T10:43:02.833Z","req":{"id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","method":"GET","path":"/graphql","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","x-speckle-client-ip":"35.186.159.255"}},"authContext":{"auth":false},"req":{"id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","method":"GET","headers":{"host":"geo.stykka.com","connection":"upgrade","accept":"*/*","accept-encoding":"deflate, gzip","content-type":"application/json","user-agent":"GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)","x-request-id":"98f4c73b-3016-472f-b666-b1b9b1d071bd","x-speckle-client-ip":"35.186.159.255"}},"authContext":{"auth":false},"component":"graphql","graphql_operation_kind":"query","graphql_query":"{serverInfo{version}}","graphql_operation_value":"GQL query serverInfo","grqphql_operation_name":"GQL serverInfo","actionName":"query serverInfo","msg":"graphql response"}
Sentry error:
{"event_id":"13bbb381b042402e9d6b4dd03245d292","project":4505408537427968,"release":null,"dist":null,"platform":"node","message":"","datetime":"2023-07-12T10:47:58+00:00","tags":[["environment","production"],["handled","yes"],["kind","query"],["level","error"],["mechanism","generic"],["runtime","node v18.15.0"],["runtime.name","node"],["user","id:93d69b57eb759c2ef030eba5ee7186b4acdf0629e2762c3fdc36b84222f71807"],["server_name","0f7e8173e604"],["transaction","GET /graphql"],["url","http://127.0.0.1/graphql"]],"_metrics":{"bytes.ingested.event":5619,"bytes.stored.event":14234},"breadcrumbs":{"values":[{"timestamp":1688975303.689,"type":"http","category":"http","level":"info","data":{"method":"HEAD","status_code":200,"url":"http://minio:9000/speckle-server/"}},{"timestamp":1689158878.485,"type":"default","category":"query-path","level":"debug","message":"serverInfo"}]},"contexts":{"runtime":{"name":"node","version":"v18.15.0","type":"runtime"},"trace":{"trace_id":"7fe1728c5e1e4bc09d148704c179edf0","span_id":"bbe45e0d09007e38","op":"GQL query serverInfo","status":"unknown","type":"trace"}},"culprit":"GET /graphql","environment":"production","errors":[{"type":"js_no_source","url":"/speckle-server/node_modules/knex/lib/client.js"},{"type":"js_no_source","url":"/speckle-server/node_modules/knex/lib/execution/runner.js"},{"type":"js_no_source","url":"/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js"},{"type":"js_no_source","url":"/speckle-server/packages/server/dist/modules/core/services/generic.js"}],"exception":{"values":[{"type":"GraphQLError","value":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?","stacktrace":{"frames":[{"function":"Object.serverInfo","module":"server","filename":"/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js","abs_path":"/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js","lineno":7,"colno":20,"pre_context":["'use strict';","const { validateServerRole, validateScopes } = require('@/modules/shared');","const { updateServerInfo, getServerInfo, getPublicScopes, getPublicRoles } = require('../../services/generic');","module.exports = {"," Query: {"," async serverInfo() {"],"context_line":" return await getServerInfo();","post_context":[" }"," },"," ServerInfo: {"," async roles() {"," return await getPublicRoles();"," },"," async scopes() {"],"in_app":true},{"function":"getServerInfo","module":"generic","filename":"/speckle-server/packages/server/dist/modules/core/services/generic.js","abs_path":"/speckle-server/packages/server/dist/modules/core/services/generic.js","lineno":11,"colno":28,"pre_context":["const Scopes = () => knex('scopes');","const Info = () => knex('server_config');","module.exports = {"," /**"," * @returns {Promise<import('@/modules/core/helpers/types').ServerInfo>}"," */"," async getServerInfo() {"],"context_line":" const serverInfo = await Info().select('*').first();","post_context":[" serverInfo.version = process.env.SPECKLE_SERVER_VERSION || 'dev';"," serverInfo.canonicalUrl = process.env.CANONICAL_URL || 'localhost';"," return serverInfo;"," },"," async getAllScopes() {"," return await Scopes().select('*');"," },"],"in_app":true},{"function":"Runner.run","module":"knex.lib.execution:runner","filename":"/speckle-server/node_modules/knex/lib/execution/runner.js","abs_path":"/speckle-server/node_modules/knex/lib/execution/runner.js","lineno":30,"colno":19,"pre_context":["\r"," // \"Run\" the target, calling \"toSQL\" 
on the builder, returning\r"," // an object or array of queries to run, each of which are run on\r"," // a single connection.\r"," async run() {\r"," const runner = this;\r"," try {\r"],"context_line":" const res = await this.ensureConnection(ensureConnectionCallback);\r","post_context":["\r"," // Fire a single \"end\" event on the builder when\r"," // all queries have successfully completed.\r"," runner.builder.emit('end');\r"," return res;\r","\r"," // If there are any \"error\" listeners, we fire an error event\r"],"in_app":false},{"function":"Runner.ensureConnection","module":"knex.lib.execution:runner","filename":"/speckle-server/node_modules/knex/lib/execution/runner.js","abs_path":"/speckle-server/node_modules/knex/lib/execution/runner.js","lineno":287,"colno":28,"pre_context":["\r"," if (this.connection) {\r"," return cb(this, cbParams);\r"," }\r","\r"," let acquiredConnection;\r"," try {\r"],"context_line":" acquiredConnection = await this.client.acquireConnection();\r","post_context":[" } catch (error) {\r"," if (!(error instanceof KnexTimeoutError)) {\r"," return Promise.reject(error);\r"," }\r"," if (this.builder) {\r"," error.sql = this.builder.sql;\r"," error.bindings = this.builder.bindings;\r"],"in_app":false},{"function":"Client_PG.acquireConnection","module":"knex.lib:client","filename":"/speckle-server/node_modules/knex/lib/client.js","abs_path":"/speckle-server/node_modules/knex/lib/client.js","lineno":312,"colno":26,"pre_context":[" try {\r"," const connection = await this.pool.acquire().promise;\r"," debug('acquired connection from pool: %s', connection.__knexUid);\r"," return connection;\r"," } catch (error) {\r"," let convertedError = error;\r"," if (error instanceof TimeoutError) {\r"],"context_line":" convertedError = new KnexTimeoutError(\r","post_context":[" 'Knex: Timeout acquiring a connection. The pool is probably full. 
' +\r"," 'Are you missing a .transacting(trx) call?'\r"," );\r"," }\r"," throw convertedError;\r"," }\r"," }\r"],"in_app":false}]},"raw_stacktrace":{"frames":[{"function":"Object.serverInfo","module":"server","filename":"/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js","abs_path":"/speckle-server/packages/server/dist/modules/core/graph/resolvers/server.js","lineno":7,"colno":20,"pre_context":["'use strict';","const { validateServerRole, validateScopes } = require('@/modules/shared');","const { updateServerInfo, getServerInfo, getPublicScopes, getPublicRoles } = require('../../services/generic');","module.exports = {"," Query: {"," async serverInfo() {"],"context_line":" return await getServerInfo();","post_context":[" }"," },"," ServerInfo: {"," async roles() {"," return await getPublicRoles();"," },"," async scopes() {"],"in_app":true},{"function":"getServerInfo","module":"generic","filename":"/speckle-server/packages/server/dist/modules/core/services/generic.js","abs_path":"/speckle-server/packages/server/dist/modules/core/services/generic.js","lineno":11,"colno":28,"pre_context":["const Scopes = () => knex('scopes');","const Info = () => knex('server_config');","module.exports = {"," /**"," * @returns {Promise<import('@/modules/core/helpers/types').ServerInfo>}"," */"," async getServerInfo() {"],"context_line":" const serverInfo = await Info().select('*').first();","post_context":[" serverInfo.version = process.env.SPECKLE_SERVER_VERSION || 'dev';"," serverInfo.canonicalUrl = process.env.CANONICAL_URL || 'localhost';"," return serverInfo;"," },"," async getAllScopes() {"," return await Scopes().select('*');"," },"],"in_app":true},{"function":"Runner.run","module":"knex.lib.execution:runner","filename":"/speckle-server/node_modules/knex/lib/execution/runner.js","abs_path":"/speckle-server/node_modules/knex/lib/execution/runner.js","lineno":30,"colno":19,"pre_context":["\r"," // \"Run\" the target, calling \"toSQL\" on the builder, returning\r"," // an object or array of queries to run, each of which are run on\r"," // a single connection.\r"," async run() {\r"," const runner = this;\r"," try {\r"],"context_line":" const res = await this.ensureConnection(ensureConnectionCallback);\r","post_context":["\r"," // Fire a single \"end\" event on the builder when\r"," // all queries have successfully completed.\r"," runner.builder.emit('end');\r"," return res;\r","\r"," // If there are any \"error\" listeners, we fire an error event\r"],"in_app":false},{"function":"Runner.ensureConnection","module":"knex.lib.execution:runner","filename":"/speckle-server/node_modules/knex/lib/execution/runner.js","abs_path":"/speckle-server/node_modules/knex/lib/execution/runner.js","lineno":287,"colno":28,"pre_context":["\r"," if (this.connection) {\r"," return cb(this, cbParams);\r"," }\r","\r"," let acquiredConnection;\r"," try {\r"],"context_line":" acquiredConnection = await this.client.acquireConnection();\r","post_context":[" } catch (error) {\r"," if (!(error instanceof KnexTimeoutError)) {\r"," return Promise.reject(error);\r"," }\r"," if (this.builder) {\r"," error.sql = this.builder.sql;\r"," error.bindings = this.builder.bindings;\r"],"in_app":false},{"function":"Client_PG.acquireConnection","module":"knex.lib:client","filename":"/speckle-server/node_modules/knex/lib/client.js","abs_path":"/speckle-server/node_modules/knex/lib/client.js","lineno":312,"colno":26,"pre_context":[" try {\r"," const connection = await this.pool.acquire().promise;\r"," debug('acquired connection 
from pool: %s', connection.__knexUid);\r"," return connection;\r"," } catch (error) {\r"," let convertedError = error;\r"," if (error instanceof TimeoutError) {\r"],"context_line":" convertedError = new KnexTimeoutError(\r","post_context":[" 'Knex: Timeout acquiring a connection. The pool is probably full. ' +\r"," 'Are you missing a .transacting(trx) call?'\r"," );\r"," }\r"," throw convertedError;\r"," }\r"," }\r"],"in_app":false}]},"mechanism":{"type":"generic","handled":true}}]},"extra":{"query":"{serverInfo{version}}","variables":"[undefined]"},"fingerprint":["{{ default }}"],"grouping_config":{"enhancements":"eJybzDRxc15qeXFJZU6qlZGBkbGugaGuoeEEAHJMCAM","id":"newstyle:2023-01-11"},"hashes":["417af1dfb2d7a62282628d53be75280d","398a1bbc2e7bcd6edbe5b994316b8afd"],"ingest_path":[{"version":"23.6.2","public_key":"XE7QiyuNlja9PZ7I9qJlwQotzecWrUIN91BAO7Q5R38"}],"key_id":"3236202","level":"error","location":"/speckle-server/packages/server/dist/modules/core/services/generic.js","logger":"","metadata":{"display_title_with_tree_label":false,"filename":"/speckle-server/packages/server/dist/modules/core/services/generic.js","function":"getServerInfo","type":"GraphQLError","value":"Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?"},"nodestore_insert":1689158879.262104,"processed_by_symbolicator":true,"received":1689158878.589285,"request":{"url":"http://127.0.0.1/graphql","method":"GET","query_string":[["query","{serverInfo{version}}"]],"headers":[["Connection","close"],["Content-Type","application/json"],["Host","127.0.0.1:3000"],["X-Request-Id","2c5a421e-dfce-45ec-9e16-7d5823587d96"]],"inferred_content_type":"application/json"},"sdk":{"name":"sentry.javascript.node","version":"6.19.7","integrations":["InboundFilters","FunctionToString","ContextLines","Console","OnUncaughtException","OnUnhandledRejection","LinkedErrors","Postgres","Http","Express"],"packages":[{"name":"npm:@sentry/node","version":"6.19.7"}]},"timestamp":1689158878.485,"title":"GraphQLError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(t...","transaction":"GET /graphql","type":"error","user":{"id":"93d69b57eb759c2ef030eba5ee7186b4acdf0629e2762c3fdc36b84222f71807"},"version":"7"}
The only difference we noticed is that the Sentry error was triggered many times (until we stopped and started it manually); we believe it was every health check (alternating between the container check on 127.0.0.1 and our monitoring check on the public URL).
Thanks for providing this.
After reading a little more about Docker Compose's behaviour, it seems that it will not restart unhealthy containers 😞. (For security reasons I would avoid the 'autoheal' solution mentioned in that link, as it requires mounting the docker socket to that container, effectively giving it access to your host machine.)
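(If you want an interim workaround that doesn't hand a container the docker socket, a small host-side cron job is one option; a rough sketch, assuming the container name from your earlier output:)
  #!/bin/sh
  # restart-unhealthy.sh - run from cron, e.g. every minute (sketch)
  CONTAINER=speckle_speckle-server_1
  STATUS=$(docker inspect --format='{{.State.Health.Status}}' "$CONTAINER" 2>/dev/null)
  if [ "$STATUS" = "unhealthy" ]; then
    docker restart "$CONTAINER"
  fi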
From your error message we're again seeing the same error: Knex: Timeout acquiring a connection. The pool is probably full. This might be for a number of reasons (as I mentioned above), for example:
If you are able to, connect Prometheus or Grafana to the metrics endpoint (/metrics) of the speckle server. There is a metric, speckle_server_knex_free, which provides data on the number of available connections. You can plot this over time to see if connections are becoming saturated. If connections remain free, it may point to an issue with the network or the postgres server itself.
As a possible solution to saturated connections, you could attempt to increase the number of available connections by adding the POSTGRES_MAX_CONNECTIONS_SERVER environment variable to the server. The default for this environment variable is 4, so you could try increasing it to 8. Please note that there may be limitations on the number of connections that your postgres server can handle, and the larger number of connections may require more cpu & memory resources for both the database and speckle server.
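In the compose file that would look something like this (sketch):
  speckle-server:
    environment:
      POSTGRES_MAX_CONNECTIONS_SERVER: '8'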
You could also provide redundancy by running one or more replicas of the speckle server container. If one container were to fail, or its database connections became temporarily saturated, the other could continue to operate and prevent downtime. Please refer to Docker's documentation on replicas.
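With the compose file format 2.x you are already using, a rough sketch of this is the scale key; note that the fixed host port mapping would then have to be removed and traffic routed to the replicas through a reverse proxy or load balancer instead:
  speckle-server:
    scale: 2
    # remove the '127.0.0.1:3000:3000' ports entry; replicas cannot all bind the same host port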
Hi again @iainsproat, I have set up a Grafana server on a separate VM instance but I am having trouble configuring the Prometheus data source.
You mentioned connecting to the /metrics endpoint; do I need to enable that explicitly in the configuration file?
Does the Grafana server need to run in a container on the same host as the Speckle services or can it be on a different host?
I couldn't find any documentation regarding this. Maybe you have an example of a data source configuration?
Can you connect to your speckle server and query your-domain/metrics or localhost:3000/metrics?
I'm not sure how you've configured ingress; if you're using Speckle's DigitalOcean 1-Click deployment, the ingress will prevent external access to /metrics. If you have deployed Grafana to a separate machine without direct network access to the speckle server, you would want an ingress which adds some sort of authentication to prevent public access to /metrics.
It is probably easiest to run it on the same VM, and deploy it via docker compose on the same docker network as the speckle server.
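As a rough sketch of that setup (the image, paths and ports here are assumptions to adjust):
  # docker-compose.yml (sketch) - added alongside the existing services
  prometheus:
    image: 'prom/prometheus'
    restart: always
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - '127.0.0.1:9090:9090'

  # prometheus.yml (sketch)
  scrape_configs:
    - job_name: 'speckle-server'
      metrics_path: /metrics
      static_configs:
        - targets: ['speckle-server:3000']
Grafana can then point its Prometheus data source at http://prometheus:9090 if it runs on the same docker network.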
Hi @iainsproat, we have set up Grafana and Prometheus monitoring. The server crashed again this weekend: I created a public snapshot (I can send you the JSON file too if needed).
Our health check was triggering restarts too fast; we have fixed that now. You can see in the graph that we stopped the health check at 17:07 and let the server run a full restart, which worked fine.
Also, the server usage was basically zero at the time of the crash (being a Sunday with most of us on vacation), so no relation to user actions was seen.
Thanks @Sticcia
For my understanding, can you confirm that the crash first occurred at approximately 14:30?
And what we're seeing between 14:30 and 17:07 are aggressive restarts until you disabled healthchecks?
To prevent the aggressive restarts, I think we may need to increase the start_period time on the healthcheck. It takes the server a moment to start up, during which it is expected to be unresponsive. If the healthcheck runs during that time it will experience a failure and cause the restart behaviour.
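Something like the following, with only start_period changed (the value is a guess to be tuned):
  healthcheck:
    # ...existing test, interval, timeout and retries unchanged
    start_period: 120s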
@iainsproat - your understanding is correct, but when I mentioned a health check in the previous comment I meant a check made by us, not the Docker Compose health check; I wasn't very clear, sorry. We had already made the same fix as you just did. Does the output from Grafana help find the core issue? Can we find where all these connections came from? Can we increase the number of available Knex connections? If yes, how?
Grafana is helpful, as I think we see a dramatic increase in connections at around 14:30. But it's hard to tell whether this is related, or whether it is due to the restart and is a consequence rather than the cause.
Do you still have the data available in Grafana to plot the metric speckle_server_knex_free? I'd like to see its values between 14:20 and 14:30, and a minute or two either side of that time period.
Do you have logs for the events that occurred around 14:30?
I'd like to establish a more exact timeline; do we see the increase in connections before or after the restart?
@iainsproat
I was mistaken, the server crashed twice this weekend. Once at 10:45, when our uptime check got stuck in a restart loop (this is clearly visible in the graphs too; there are gaps during the restarts).
And once at 14:20, when the health check was disabled and the restart was done manually at 17:00.
Sorry for the confusion. Here is the data you asked for the crash at 14:20 (with no action taken in this time slot).
Speckle Server container logs: logs.txt
Grafana data export: grafana_data_export.csv
The increase of connections is before the restart.
Hi again @iainsproat
We managed to replicate the crash fairly consistently (although not 100% of the time) by lowering the VM type (CPU and memory) and overloading the server with many open connections.
We don't know if this was the same issue that happened sporadically before, but the crash errors are the same.
We will write some tests to load the server in a more controlled way and see if that affects the performance/crash frequency.
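(For what it's worth, a very crude way to open many concurrent connections is something along these lines, assuming the public /graphql endpoint; -g stops curl from treating the braces in the query as a URL glob:)
  #!/bin/sh
  # fire N parallel serverInfo queries at the server (sketch)
  N=200
  URL='https://geo.stykka.com/graphql?query={serverInfo{version}}'
  for i in $(seq 1 "$N"); do
    curl -g -s -o /dev/null -w '%{http_code}\n' -H 'Content-Type: application/json' "$URL" &
  done
  wait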
@Sticcia - that's a great finding, it sounds as if it might be a resource management issue.
You may wish to run the server in inspect mode to gain more diagnostic capabilities. Add the --inspect=PORT flag when running the server (replacing PORT with the port number you wish to expose the inspect interface on). More details about inspect can be found on node's website. This may provide more insight into the resource usage.
@iainsproat - thanks for the quick response. If I understand correctly, I should edit the docker-compose configuration with something like
  speckle-server:
    image: speckle/speckle-server:2.13.3
    ports:
      - '127.0.0.1:3000:3000'
      - '127.0.0.1:9229:9229'
    command: ['bash', '-c', '/wait && node --inspect=9229 bin/www']
and then I can run something like Chrome DevTools on geo.stykka.com:9229 to see the debug output?
@Sticcia - yes, though I think the command can just be --inspect=9229 bin/www, as the entrypoint is already node.
Most importantly, I would not expose the port to the internet due to the security implications of this. Instead I would suggest creating an ssh tunnel to the host machine, or something similar which would allow you to access the port securely and without exposing it publicly.
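For example, something like:
  # on your workstation: forward local port 9229 to the inspector on the server host, without exposing it publicly
  ssh -N -L 9229:127.0.0.1:9229 your-user@your-server-host
  # then point Chrome DevTools (chrome://inspect) at 127.0.0.1:9229
One caveat: node's inspector binds to 127.0.0.1 by default, so to reach it through the container's port mapping you may need --inspect=0.0.0.0:9229 inside the container while keeping the host-side mapping bound to 127.0.0.1.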
Any update on this? We are facing the very same issue as well. We have increased VM sizes to:
EC2 instance type: c7i.8xlarge
RDS: db.t4g.xlarge
and still face random crashes.
up @iainsproat
Hi @Sticcia - did you manage to find time to investigate this further? Does it still occur if you use the latest version of speckle's server?
Hi @iainsproat, we do not have the resources to investigate this further. We scaled up the host configuration which, together with an auto-heal service, reduced the downtime to only a few seconds a month. We are still running a 2.18.x version; I will monitor it more after we upgrade and let you know if the problem persists.
Prerequisites
What package are you referring to?
Server (backend)
Describe the bug
Our Speckle Server is unexpectedly going into an unresponsive state. Restarting the speckle-server Docker container solves the issue, but we can't understand what is causing it. We have investigated the logs and only found some error messages from /graphql API requests (part of the log file is attached below). Following the unexpected crash, all requests to the Speckle server return:
Error: failed with status code 500
Expected behavior
Server running smoothly and all requests handled correctly
System Info
Google Cloud VM Instance:
Failure Logs
Last 500 lines of the speckle-server container logs
Affected Projects
Speckle Frontend UI and Javascript viewer are not loading, .NET requests are failing.