Closed cmgrote closed 2 years ago
Thank you Chris @cmgrote for opening this issue. This will be useful information to know the status of cohort/s for a local server. The way it is proposed, it will be connector agnostic.
In addition to providing a health check, we should also revisit the polling logic. Currently, Egeria tries to connect to the topic server in what appears to be a tight loop, which fills up the logs. This has been an issue whenever Kafka or the server is not reachable: the logs roll over. We should poll at intervals rather than in a loop, and also suppress repeated log entries if we can...
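As a rough illustration of "poll at intervals and suppress logs", here is a minimal sketch of a capped exponential backoff with thinned-out logging. It is not Egeria's current reconnect code: `BackoffPolicy`, `IntervalReconnector` and the `tryConnect` callback are all hypothetical names, and the choice to log only attempts 1, 2, 4, 8, ... is just one possible suppression rule.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: none of these classes exist in Egeria.
final class BackoffPolicy {
    private final long initialDelayMs;
    private final long maxDelayMs;

    BackoffPolicy(long initialDelayMs, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Delay to wait after the given failed attempt (attempts start at 1).
    // The delay doubles on each attempt and is capped at maxDelayMs.
    long delayForAttempt(int attempt) {
        long delay = initialDelayMs;
        for (int i = 1; i < attempt && delay < maxDelayMs; i++) {
            delay = Math.min(maxDelayMs, delay * 2);
        }
        return Math.min(delay, maxDelayMs);
    }

    // Only log attempts that are a power of two, so the log volume
    // grows logarithmically with the number of failures.
    static boolean shouldLog(int attempt) {
        return Integer.bitCount(attempt) == 1;
    }
}

final class IntervalReconnector {
    // Returns the attempt number that succeeded, or -1 if every attempt
    // failed or the thread was interrupted while waiting.
    static int connectWithBackoff(BooleanSupplier tryConnect, BackoffPolicy policy, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return attempt;
            }
            if (BackoffPolicy.shouldLog(attempt)) {
                System.err.println("Topic server not reachable (attempt " + attempt + ")");
            }
            try {
                Thread.sleep(policy.delayForAttempt(attempt));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            }
        }
        return -1;
    }
}
```

With an initial delay of 100 ms and a cap of 1000 ms, the waits would be 100, 200, 400, 800 and then 1000 ms for every later attempt.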
@guptaneeru there are probably multiple points here: a) Whether an audit event is generated within each connection attempt. I'd err on probably, but haven't looked in enough detail at how tight that loop is. b) The behaviour of the default audit log providers. For example, they could handle repeated events better ("last event occurred 10 times"), or a wrapping logger could be provided. c) More than b), the fact that the audit log framework is pluggable, so a new logger could be written to better suit your needs (including log cycling etc).
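To make point b) concrete, here is a minimal sketch of a wrapping logger that collapses consecutive identical events into a single summary line. The `AuditLogSink` interface is a hypothetical stand-in with a single `log(String)` method; it is not the Egeria audit log connector API, which a real implementation would need to follow instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-method sink, used only for this illustration.
interface AuditLogSink {
    void log(String message);
}

// Wraps another sink and suppresses consecutive duplicates of the same message.
final class RepeatCollapsingSink implements AuditLogSink {
    private final AuditLogSink delegate;
    private String lastMessage;
    private int repeats;

    RepeatCollapsingSink(AuditLogSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void log(String message) {
        if (message.equals(lastMessage)) {
            repeats++; // same event again: count it, but do not write it
            return;
        }
        flush(); // a different event arrived: report how often the previous one repeated
        lastMessage = message;
        delegate.log(message);
    }

    // Writes the pending repeat summary, if any. Call this on shutdown as well.
    public synchronized void flush() {
        if (repeats > 0) {
            delegate.log("(previous event repeated " + repeats + " more time(s))");
            repeats = 0;
        }
    }
}

final class RepeatCollapsingDemo {
    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        RepeatCollapsingSink sink = new RepeatCollapsingSink(written::add);
        for (int i = 0; i < 3; i++) {
            sink.log("Unable to connect to topic server");
        }
        sink.log("Connected");
        written.forEach(System.out::println);
    }
}
```

A time-based flush (for example, emitting the summary every minute) would be a natural extension, so that a long run of identical failures still leaves a periodic trace.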
Thank you @Nigel Nigle. How can I add our own audit logger?
@guptaneeru: there are implementations of an audit logger at https://github.com/odpi/egeria/tree/master/open-metadata-implementation/adapters/open-connectors/repository-services-connectors/audit-log-connectors
In terms of the polling of Kafka specifically, I've taken a look at the code. Issue odpi/egeria#5681 touches on this, but is very specifically about the state of the server during the initialisation period. odpi/egeria-docs#447 is to clarify and document the startup behaviour. This issue is exploring specific approaches to understand more broadly the health of the system's connectors, which may go some way to address that issue. I think what's important is that the status can be understood at various levels (platform, server, connector(s)), depending on the needs of the caller, and matching the appropriate APIs that act upon the config. The need for this becomes more acute when we have replicas of a server, since we want to direct requests to the set of working replicas, not the bad ones...
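One way to picture status at several levels is a simple "worst of" roll-up: a server is as healthy as its least healthy connector, and a platform as healthy as its least healthy server. The sketch below assumes a three-value `Health` enum and a `HealthRollup` helper, both hypothetical, and treats an empty set of parts as `UP`, which is itself a design choice that may not suit every caller.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical health levels, declared from best to worst so that
// enum ordering can be used for comparison.
enum Health { UP, DEGRADED, DOWN }

final class HealthRollup {
    // Health of a parent (server or platform) is the worst health among its parts.
    // Assumption: a parent with no parts is reported as UP.
    static Health worstOf(Collection<Health> parts) {
        Health worst = Health.UP;
        for (Health h : parts) {
            if (h.compareTo(worst) > 0) {
                worst = h;
            }
        }
        return worst;
    }

    // Names of the replicas that requests could still be directed to,
    // in the iteration order of the supplied map.
    static List<String> usableReplicas(Map<String, Health> replicaHealth) {
        List<String> usable = new ArrayList<>();
        for (Map.Entry<String, Health> entry : replicaHealth.entrySet()) {
            if (entry.getValue() != Health.DOWN) {
                usable.add(entry.getKey());
            }
        }
        return usable;
    }
}
```

Whether a `DEGRADED` replica should still receive requests is a policy decision; the sketch keeps it in rotation and only excludes replicas that are `DOWN`.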
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
The objective of this issue is to discuss options and a proposed approach for exposing basic status / health check information on the connectivity of a given server to its cohort(s).
Understanding of current interactions:

- `OMRSTopicConnector` instances appear to define the interactions with a cohort
- These are managed within the `OMRSCohortManager` class
- Instances of the `OMRSCohortManager` class (one per cohort?) are then available within the `OMRSMetadataHighwayManager` class, which manages the connectivity to each cohort that a local server is a member of
- The `OMRSMetadataHighwayManager` class is then in turn exposed via APIs whose logic is implemented through the `OMRSMetadataHighwayRESTServices` class and ultimately bound to APIs (using Spring) in `MetadataHighwayServicesResource`
- Connectivity for an `OMRSTopicConnector` appears to be through a list of `OpenMetadataTopicConnector` instances
- Implementations appear to extend either the `OMRSTopicConnector` class (which itself implements a number of interfaces, notably `OMRSTopic` and `OpenMetadataTopicListener`, as well as extending the base class `ConnectorBase`) or the `OpenMetadataTopicConnector` class (which also extends `ConnectorBase` and implements a different interface: `OpenMetadataTopic`)

Suggestions for providing status / health check information:
- Looking at a concrete implementation (e.g. `KafkaOpenMetadataTopicConnector`, which extends `OpenMetadataTopicListener`), we can see that this ultimately extends the `ConnectorBase` abstract class
- Connectors are already `start`ed and `disconnect`ed; to be able to communicate basic status / health-check information, perhaps it would make sense to extend the underlying `Connector` abstract class with such a status retrieval method (?)
- There is already an `isActive()` method defined at the `ConnectorBase` level from which everything extends, but this is very binary and currently based purely on whether the connector has been `start`ed or `disconnect`ed

So as a proposed approach:
- Add a new method to the `Connector` abstract class to retrieve a connector status object (to be defined, but likely including at least an enumerated status (specific values to be defined) and some more "free-form" informational field (string)?)
- Provide a default implementation in `ConnectorBase` that simply re-uses the `isActive()` method also defined there to translate the binary (boolean) of `isActive()` into a basic status object
- Allow any implementation of `Connector` to override this new method with a more granular detection of various statuses (non-binary)
- Expose this status from the `OpenMetadataTopicConnector`, etc classes upwards to the APIs

This would therefore not change any of the existing interfaces of a Connector, while providing a default implementation of the logic that is based on already-existing and self-contained methods in the top-level abstract implementation (`ConnectorBase`), so I believe it would also be backwards-compatible (?)
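The proposed approach could be sketched roughly as follows. This is not the real OCF code: the `Sketch*` classes are simplified stand-ins for `Connector`, `ConnectorBase` and a topic connector, and `getStatus()`, `SketchStatusCode`, `SketchConnectorStatus` and `setBrokerReachable()` are hypothetical names chosen for illustration, since the status values are explicitly still "to be defined".

```java
// Hypothetical sketch: the status types and getStatus() do not exist in Egeria.

enum SketchStatusCode { ACTIVE, INACTIVE, UNREACHABLE }

// Status object: an enumerated code plus a free-form informational string.
final class SketchConnectorStatus {
    private final SketchStatusCode code;
    private final String information;

    SketchConnectorStatus(SketchStatusCode code, String information) {
        this.code = code;
        this.information = information;
    }

    SketchStatusCode getCode() { return code; }
    String getInformation() { return information; }
}

// Stand-in for the Connector abstract class, with the proposed new method.
abstract class SketchConnector {
    public abstract void start();
    public abstract void disconnect();
    public abstract SketchConnectorStatus getStatus();
}

// Stand-in for ConnectorBase: the default status is derived purely from isActive().
abstract class SketchConnectorBase extends SketchConnector {
    private volatile boolean active = false;

    @Override public void start() { active = true; }
    @Override public void disconnect() { active = false; }
    public boolean isActive() { return active; }

    @Override
    public SketchConnectorStatus getStatus() {
        return isActive()
                ? new SketchConnectorStatus(SketchStatusCode.ACTIVE, "Connector has been started")
                : new SketchConnectorStatus(SketchStatusCode.INACTIVE, "Connector is not started or has been disconnected");
    }
}

// A connector that overrides the default with a more granular check.
class SketchTopicConnector extends SketchConnectorBase {
    // Stand-in for a real connectivity probe against the topic server.
    private volatile boolean brokerReachable = true;

    public void setBrokerReachable(boolean reachable) { this.brokerReachable = reachable; }

    @Override
    public SketchConnectorStatus getStatus() {
        if (isActive() && !brokerReachable) {
            return new SketchConnectorStatus(SketchStatusCode.UNREACHABLE, "Started, but the topic server cannot be reached");
        }
        return super.getStatus();
    }
}
```

Because the default lives in the base class, existing connectors that never override `getStatus()` would keep compiling and simply report the binary started / not-started view.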