Open rawaa123 opened 2 weeks ago
Container Image updates cause downtime, which is generally bad™.
Stateful services are being updated during working hours.
Unrelated problems:
coursemapper-kg-concept-map: Asynchronous worker, temporary outages should not cause downtime; stateless, but no rolling updates due to memory requirements
coursemapper-kg-recommendation: Asynchronous worker, should not cause downtime; stateless, but no rolling updates due to memory requirements
coursemapper-kg-wp-pg: Database, stateful, HA would require running a cluster
mongodb: Database, stateful, should actually be webserver-mongodb 👀
webapp: User-facing web service, stateless, rolling updates, should not cause any downtime. Increased replicas in preview environment edge from 1 to 2, just in case. Production runs 3 replicas.
webserver-neo4j: Database, stateful, HA would require running a cluster
webserver-redis: Database, stateless, in-memory – this might be a problem. Updating essentially drops all the contents in memory, which might be fine for e.g. caches, but not for job queues.
webserver-web: User-facing web service, stateful – this is a problem. Can't perform upgrades without interruption due to required volume attachments. Should be refactored to use e.g. S3-compatible storage for file uploads.

@ralf-berger Thanks for your information.
1- I would like to mention that whenever we push changes to Edge, they are not released automatically, and it takes considerable time for the changes to be reflected. Previously, when we pushed changes, it took only a few minutes for them to be released on Edge.
2- Also, in Argo, I can see that the application is OutOfSync. What could be causing this?
3- Is it possible to trigger synchronization by clicking on the 'SYNCHRONIZE' button as shown in the attachment? Or will this result in an error?
@ralf-berger We still have some issues with the connection to wikipedia_service in coursemapper-kg/concept-map, as shown in the attachment.
This is very urgent, as we are planning to test the system tomorrow and finalize it. Your help is highly appreciated.
@ralf-berger So sorry for the insistence, but we are running out of time: we need to test with the students at 12 pm today, if possible, as we are planning to deploy to live next week. We are still experiencing issues with the connection to the wikipedia_service, as shown in the attachment:
The connection to server at "10.43.171.86", port 5432 failed: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
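Since the asynchronous workers are meant to tolerate temporary outages, one possible mitigation for errors like the one above is to retry the database call with backoff when the connection drops during a restart. The following is only a minimal sketch: `call_with_retry` and its parameters are hypothetical names, not part of the coursemapper code, and which exception to treat as retriable depends on the database driver actually in use.

```python
import random
import time


def call_with_retry(operation, retriable=(ConnectionError,), attempts=5,
                    base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Run operation(); on a retriable error, wait and try again.

    Hypothetical helper for illustration. `retriable` is a tuple of
    exception types that signal a dropped or refused connection.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Exponential backoff, capped at max_delay, with jitter so
            # several workers do not reconnect at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

If the worker uses psycopg2 (an assumption; the thread does not say which driver is used), the retriable type would likely be `psycopg2.OperationalError`, and `operation` should open a fresh connection on each attempt rather than reuse the closed one. This only covers short restarts; it does not help if the database stays down longer than the total backoff time.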
Could you please stop the automatic security updates from Dependabot so we can handle them manually as needed? These updates occasionally cause the server to be down for a few seconds, which disrupts operations.
Additionally, during our testing today, we encountered two errors:
Argo Server was not accessible for a few seconds (see attached image).
Afterward, we received the following error message: "Consuming input failed: terminating connection due to administrator command. The server closed the connection unexpectedly. This likely indicates that the server terminated abnormally before or during the request processing." (see attached image).
Could you please advise or suggest solutions regarding these issues? Your help is highly appreciated.