Open johnml1135 opened 1 week ago
So, can we make the Engine Server do only ClearML health checking, and the Job Server only queue and report on jobs?
Although unlikely, it is possible that some services are having difficulty connecting to ClearML while others are not. The point of a service's health check is to determine whether that service is healthy, not whether the whole system is healthy.
The Engine and Job servers both appear to be doing the same tasks (for instance, both monitor ClearML health and both run a full job server). This duplication is likely unneeded; there should be a clearer separation between what the Engine Server does and what the Job Server does.
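
To make the proposed split concrete, here is a rough sketch of what the separation could look like. It is only an illustration in Python, not the project's actual code: the class names `EngineServer` and `JobServer`, the `ping_clearml` probe, and the status strings are all hypothetical, and the probe stands in for whatever real ClearML connectivity call the service would make.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EngineServer:
    """Hypothetical engine server: owns the ClearML health check and nothing else."""

    # Injected probe (assumption): returns True when this service can reach ClearML.
    ping_clearml: Callable[[], bool]

    def health(self) -> str:
        # Reports only this service's own view of ClearML, not the whole system's.
        try:
            return "healthy" if self.ping_clearml() else "unhealthy"
        except Exception:
            # A failed probe means this service cannot reach ClearML.
            return "unhealthy"


@dataclass
class JobServer:
    """Hypothetical job server: queues jobs and reports on them, with no health monitoring."""

    queue: List[str] = field(default_factory=list)
    status: Dict[str, str] = field(default_factory=dict)

    def enqueue(self, job_id: str) -> None:
        self.queue.append(job_id)
        self.status[job_id] = "queued"

    def report(self, job_id: str) -> str:
        return self.status.get(job_id, "unknown")


def failing_probe() -> bool:
    # Simulates a service that cannot connect to ClearML.
    raise ConnectionError("cannot reach ClearML")


if __name__ == "__main__":
    engine = EngineServer(ping_clearml=lambda: True)
    jobs = JobServer()
    jobs.enqueue("build-1")
    print(engine.health(), jobs.report("build-1"))
```

With this shape, one service failing its own ClearML probe does not change what any other service reports, and the job server carries no health-monitoring logic at all.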