ministryofjustice / nvvs-devops

Documentation for the NVVS DevOps Team
https://ministryofjustice.github.io/nvvs-devops
MIT License
4 stars 0 forks source link

Simulate MYSQL Errors from backend application #577

Closed darey-io closed 9 months ago

darey-io commented 9 months ago

Following the SPIKE on https://app.zenhub.com/workspaces/nvvs-devops-622a0b371800e400133bb924/issues/gh/ministryofjustice/nvvs-devops/485

This ticket is to develop a simple application that will use the mysqli library to simulate the errors. This will help to determine the best decision to fix the issues.

darey-io commented 9 months ago

A php application has been created and currently testing with mysqli library to simulate the "command out of sync" error

darey-io commented 9 months ago

Error still not being reproduced. Transferred code to @smjmoj perhaps he can give it a try.

The good thing is I am able to trace the error with much deeper insights and recommendations.

I will put all the information together tomorow and update this ticket.

darey-io commented 9 months ago

Below is insights into the SQL error following the discoveries from spike.

Definition of the "Command out of sync" error as documented by MySQL can be found here - https://dev.mysql.com/doc/refman/8.0/en/commands-out-of-sync.html. It occurs when in the client code (In this case KEA Server), is calling functions in the wrong order.

Breaking down the flow leading to the error....

  1. staff-device-dns-dhcp-admin interacts with staff-device-dhcp-server over http and calls the update_dhcp_config function which configures database connection and handles required changes to the DHCP infrastructure. reference --> https://github.com/ministryofjustice/staff-device-dns-dhcp-admin/blob/main/app/controllers/subnets_controller.rb#L39

  2. Part of the DHCP Server configuration has "multi-threading" enabled. The "Commands out of sync" error typically occurs in MySQL when commands are executed out of sequence. This is likely to happen when multithreading is enabled with a high thread pool size, and different threads are executing commands on the same connection, or when result sets from previous commands are not properly handled before new commands are executed. The specific setting that can cause this to happen is the "Thread Pool and Queue Size" parameters as configured here https://github.com/ministryofjustice/staff-device-dhcp-server/blob/main/dhcp-service/config_api.json#L65

  3. The "thread-pool-size" and "packet-queue-size" parameters control the number of threads and the size of the packet queue, respectively. A larger number of threads can increase the likelihood of concurrent database access problems if not managed correctly.

Recommendations:

  1. Ideally we would need to configure the client application (KEA Server) to handle the SQL querry more efficiently. However, this may not be a feasible option for us as the particular querry causing this error as reported here

https://app.zenhub.com/workspaces/nvvs-devops-622a0b371800e400133bb924/issues/gh/ministryofjustice/nvvs-devops/485 [22a912f-0e4f-40fe-877e-b1196541d8a7] Gateways::KeaControlAgent::InternalError (unable to execute for <SELECT address, hwaddr, client_id, valid_lifetime, expire, subnet_id, fqdn_fwd, fqdn_rev, hostname, state, user_context FROM lease4 WHERE subnet_id = ?>, reason: Commands out of sync; you can't run this command now (error code 2014)):

is found in the KEA source code here https://github.com/isc-projects/kea/blob/c48734c20e47355cc4b81d9316a582566eaa608b/src/lib/dhcpsrv/mysql_lease_mgr.cc#L141

  1. Adjusting Multi-Threading/thread pool Settings: As a diagnostic step, when the error occurs again, reducing the "thread-pool-size" and observing the experience if the issue persists. This can help determine if the issue is related to concurrent database access.

  2. Before the error occurs again, we can Implement monitoring and setup a dashboard to track specific behaviours, like when different threads from the KEA Server execute commands on the same connection or when result sets from previous commands are not properly handled. This can be quite complex to do. However, it is feasible with a combination of MySQL's internal logging mechanisms, integration with Prometheus and Grafana and possibly some custom scripting. If we would engage in an activity like this, it would involve steps like...