Closed jrgriffiniii closed 5 days ago
This is no longer a high priority, as the DataSpace production deployment was restored. However, I am still uncertain as to what triggered the restoration of PostgreSQL connections.
This error arose once again following the server maintenance procedures undertaken on a weekly basis. For this incident, there are reportedly difficulties related to the Google Cloud Platform service provider, and this has been blocking me from accessing the production PostgreSQL server which is required by the production deployment of DataSpace.
I could not connect to the database server from the production DataSpace server, however, Alicia has identified the server on the GCP with: https://console.cloud.google.com/logs/query;query=resource.type%3D%22cloudsql_database%22%0Aresource.labels.database_id%3D%22pul-gcdc:pul-prod-db%22%0Alog_name%3D%22projects%2Fpul-gcdc%2Flogs%2Fcloudsql.googleapis.com%252Fpostgres.log%22;cursorTimestamp=2024-06-26T14:43:03.433424Z;duration=PT1H?q=search&referrer=search&project=pul-gcdc
The following is being logged from the DataSpace prod host:
{
"textPayload": "2024-06-26 14:59:28.453 UTC [812056]: [1-1] db=dataspace_prod_db,user=dataspace_prod_db_user FATAL: password authentication failed for user \"dataspace_prod_db_user\"",
"insertId": "s=00ecd4b5b5f14233aeaab022178f6666;i=23a817;b=2eedae0a03d44e5c81bede0663c852aa;m=e2174a7a63;t=61bcc42cfac6f;x=b70bf5e216e07c98-0@a3",
"resource": {
"type": "cloudsql_database",
"labels": {
"database_id": "pul-gcdc:pul-prod-db",
"project_id": "pul-gcdc",
"region": "us-east4"
}
},
"timestamp": "2024-06-26T14:59:28.453584Z",
"severity": "ALERT",
"labels": {
"INSTANCE_UID": "11-c6b91f46-57e1-4f05-bece-999102cc27a8",
"LOG_BUCKET_NUM": "56",
"SOURCE_ID": "30373532306631343535316233303537e3b0c442"
},
"logName": "projects/pul-gcdc/logs/cloudsql.googleapis.com%2Fpostgres.log",
"receiveTimestamp": "2024-06-26T14:59:46.102111106Z"
}
A consistent password update needed to be applied to connect to the GCP database server.
Clearing the idle connections over psql
was necessary to restore service for DataSpace.
Following tomorrow morning it shall be determined whether or not this should be reopened.
Following a meeting this morning led by @kayiwa and @VickieKarasic, I am going to be added to the list of users who receive Monit alerts in order to try and troubleshoot the Postgres sessions which remain idle in the connection pool.
Adding this to the upcoming sprint in the event that it comes up again during the migration and needs to be worked.
This once again occurred during the maintenance window for this morning. I am addressing this right now.
Service was restored by restarting DSpace with and invocation of dsrestart
, followed by the termination of several idle PostgreSQL connections:
dataspace_prod_db=> select pid,state,datname from pg_stat_activity where datname = 'dataspace_prod_db';
pid | state | datname
---------+--------+-------------------
1356835 | idle | dataspace_prod_db
1357105 | active | dataspace_prod_db
1356836 | idle | dataspace_prod_db
1356837 | idle | dataspace_prod_db
1357145 | idle | dataspace_prod_db
1357188 | idle | dataspace_prod_db
1357112 | active | dataspace_prod_db
(7 rows)
dataspace_prod_db=> select pg_terminate_backend(1357188);
pg_terminate_backend
----------------------
t
(1 row)
The following is logged when restarting or rebuilding the DataSpace production deployment: