williamstein opened this issue 2 years ago
I saw something similar today. While migrating a database (essentially a restart with a delay), the websocket hubs were not able to reconnect properly. The log says the following; the oddest part is that it first states the DB connection is OK and then throws this exception:
2023-03-23T08:59:35.091Z: cocalc:debug:hub:healthcheck process_health_check: no DB connection problems – OK
2023-03-23T08:59:35.648Z: cocalc:error:hub BUG ****************************************************************************
2023-03-23T08:59:35.648Z: cocalc:error:hub Uncaught exception: Error: SASL: SCRAM-SERVER-FIRST-MESSAGE: client password must be a string
Error: SASL: SCRAM-SERVER-FIRST-MESSAGE: client password must be a string
at Object.continueSession (/cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/sasl.js:24:11)
at Client._handleAuthSASLContinue (/cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/client.js:257:10)
at Connection.emit (node:events:513:28)
at /cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/connection.js:114:12
at Parser.parse (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/parser.ts:104:9)
at Socket.<anonymous> (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/index.ts:7:48)
at Socket.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:315:12)
at readableAddChunk (node:internal/streams/readable:289:9)
at Socket.Readable.push (node:internal/streams/readable:228:10)
2023-03-23T08:59:35.655Z: cocalc:error:hub Error: SASL: SCRAM-SERVER-FIRST-MESSAGE: client password must be a string
at Object.continueSession (/cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/sasl.js:24:11)
at Client._handleAuthSASLContinue (/cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/client.js:257:10)
at Connection.emit (node:events:513:28)
at /cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/connection.js:114:12
at Parser.parse (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/parser.ts:104:9)
at Socket.<anonymous> (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/index.ts:7:48)
at Socket.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:315:12)
at readableAddChunk (node:internal/streams/readable:289:9)
at Socket.Readable.push (node:internal/streams/readable:228:10)
2023-03-23T08:59:35.655Z: cocalc:error:hub BUG ****************************************************************************
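For context on the traceback above: pg's SASL handshake raises "client password must be a string" whenever the `password` field in the client config is anything other than a string, e.g. `undefined` because an env var or secret was missing. A minimal sketch of failing fast on this at startup (the helper name is mine, not CoCalc's):

```javascript
// Sketch (hypothetical helper, not CoCalc code): pg rejects any
// non-string password with exactly the SASL error seen above, so an
// unset env var (undefined) or a numeric value only blows up later,
// deep inside the handshake. Checking the type up front turns that
// into an immediate, understandable startup failure instead.
function assertPasswordIsString(config) {
  if (typeof config.password !== "string") {
    throw new Error(
      `invalid PostgreSQL config: password is ${typeof config.password}, ` +
        "expected a string"
    );
  }
  return config;
}

// Example: a missing environment variable reads as `undefined`.
try {
  assertPasswordIsString({ user: "smc", password: process.env.NO_SUCH_VAR });
} catch (err) {
  console.error(err.message);
}
```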
Similarly, with a slightly different exception:
2023-03-23T08:59:25.092Z: cocalc:debug:hub:healthcheck process_health_check: no DB connection problems – OK
2023-03-23T08:59:25.747Z: cocalc:debug:primus Client(H9k5Yb4CbnDVwMk1gThhZ).push_to_client: [0 mesg_time_ms] [0 mesg_avg_ms] -- mesg.id=817c8bf6-1e74-4a3b-8cb2-35c5eac0b51d
ALL clients failed [
Error: Connection terminated unexpectedly
at Connection.<anonymous> (/cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/client.js:132:73)
at Object.onceWrapper (node:events:627:28)
at Connection.emit (node:events:513:28)
at Socket.<anonymous> (/cocalc/src/packages/node_modules/.pnpm/pg@8.8.0/node_modules/pg/lib/connection.js:107:12)
at Socket.emit (node:events:525:35)
at endReadableNT (node:internal/streams/readable:1358:12)
at processTicksAndRejections (node:internal/process/task_queues:83:21)
]
Failed to connect to database -- ALL clients failed to connect
2023-03-23T08:59:29.888Z: cocalc:debug:hub:healthcheck process_health_check: concurrent 0 < NaN – OK
2023-03-23T08:59:29.888Z: cocalc:debug:hub:healthcheck uptime 1m37s – draining in 5h30m41s – terminating in 6h0m41s
2023-03-23T08:59:29.888Z: cocalc:debug:hub:healthcheck process_health_check: uptime 1m37s – draining in 5h30m41s – terminating in 6h0m41s – OK
2023-03-23T08:59:29.888Z: cocalc:debug:hub:healthcheck process_health_check: no DB connection problems – OK
Today I saw this:
ALL clients failed [
/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/parser.ts:369
name === 'notice' ? new NoticeMessage(length, messageValue) : new DatabaseError(messageValue, length, name)
^
error: password authentication failed for user "smc"
at Parser.parseErrorMessage (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/parser.ts:369:69)
at Parser.handlePacket (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/parser.ts:188:21)
at Parser.parse (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/parser.ts:103:30)
at Socket.<anonymous> (/cocalc/src/packages/node_modules/.pnpm/pg-protocol@1.5.0/node_modules/pg-protocol/src/index.ts:7:48)
at Socket.emit (node:events:513:28)
at addChunk (node:internal/streams/readable:315:12)
at readableAddChunk (node:internal/streams/readable:289:9)
at Socket.Readable.push (node:internal/streams/readable:228:10)
at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
length: 99,
severity: 'FATAL',
code: '28P01',
detail: undefined,
hint: undefined,
position: undefined,
internalPosition: undefined,
internalQuery: undefined,
where: undefined,
schema: undefined,
table: undefined,
column: undefined,
dataType: undefined,
constraint: undefined,
file: 'auth.c',
line: '331',
routine: 'auth_failed'
}
]
Failed to connect to database -- ALL clients failed to connect
Rather than retrying, when PostgreSQL's response is that the password is wrong, the only sane course of action is to shut down. This is not a temporary error like a network problem. The current design, which keeps the process running in this situation, leads to CoCalc servers not being properly detected as bad by Kubernetes.
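A sketch of the distinction I mean, using PostgreSQL's SQLSTATE codes (the handler and its names are hypothetical, not the actual hub code):

```javascript
// Sketch: distinguish PostgreSQL's permanent authentication failures
// (SQLSTATE 28P01 "invalid_password", as in the log above, and 28000
// "invalid_authorization_specification") from transient connection
// errors, and terminate instead of retrying on the former.
const FATAL_PG_CODES = new Set(["28P01", "28000"]);

function isFatalAuthError(err) {
  return err != null && FATAL_PG_CODES.has(err.code);
}

function handleConnectError(err) {
  if (isFatalAuthError(err)) {
    // Permanent misconfiguration: exit so the Kubernetes health checks
    // see a dead process and the pod gets restarted.
    console.error("fatal auth error, shutting down:", err.message);
    process.exit(1);
  }
  // Otherwise assume a transient problem (network, DB restart) and retry.
  return "retry";
}
```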
A deeper mystery is how this situation can come up at all, and I don't understand that. It might be that PostgreSQL has some security mechanism that denies even correct passwords in certain cases. Alternatively, maybe in the Kubernetes context our servers sometimes fall back to a bad password (e.g., a blank one) in some edge case, in which case the right solution would be to kill and restart such pods. The health checks will only accomplish that if the server terminates.