odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0
786 stars 258 forks source link

Rethink ConnectorConfigurationFactory #1607

Closed mandy-chessell closed 2 years ago

mandy-chessell commented 4 years ago

The ConnectorConfigurationFactory is used by the admin services to create default Connection objects when configurating OMAG Servers.

The first problem with it is that it is located in the adapters module which is confusing. The adapters module should contain implementations of connectors for specific technologies and so the connector configuration factory does not belong there. It should be moved under the admin services module.

The second problem is that this class pulls the connector implementations used in our default set up into the OMAG server platform. These remain even when that technology is not being used. Most are file-based connectors which are pure java and do not pull in client libraries. However we do pull in the Kafka event bus provider. Is this a problem for an organization that does not use Kafka? Should we change this to a hardcoded definition using strings?

There is a similar problem for the goverance daemons whose purpose is to connect to lots of different technologies. The hard-coding approach quickly becomes untenable. We cannot have all of these connectors loaded into the OMAG Server Platform.

Should we have an admin client that is responsible for creating the Connection objects using the connector implementations and then sending the connection objects only to the admin services to store?

mandy-chessell commented 4 years ago

Another approach, is to maintain the connection definitions as an open metadata archive so they are built as part of the build based in the actual implementation without creating dependencies in the server.

This would make it easy to load them into an open metadata repository. It would require some thought to enable the platform/admin services to be able to load them as archive loading is typically done by the OMRS at server startup.

mandy-chessell commented 4 years ago

A final approach is to manage them in the build, as with the archive approach, but serialize the connection objects directly to JSON files. The admin services are already able to work with serialized connection objects because they are used in the configuration documents.

planetf1 commented 4 years ago

See also comment on #1606 about provided scope - but this causes it's own set of issues... In a pluggable environment, the organization should be able to chose what to deploy, and if they don't want to use our kafka connector, there should be no remnants of kafka on their system in an idea world.. I do think loading the default set always is not appropriate, we should load whatever is configured for that deployment

mandy-chessell commented 4 years ago

I would ideally like to get to a clean architecture in the code if possible rather than rely on build options - just to keep it simple to understand. The connector broker dynamically loads classes as long as they are in the class path. If we had a common place where the jars for connectors could be assembled during the build, and we removed the compile dependencies then as long as the directory where the connectors are located is in the class path for the platform then the platform would start lean and only load the connectors requested by the servers running on the platform.

planetf1 commented 4 years ago

I completely agree - my suggestion doesn't replace or negate that, it's about the connectors themselves, which are necessarily dependent on more external libraries. It is an optional consideration for the way we build them.

mandy-chessell commented 4 years ago

ahhh :)

cmgrote commented 4 years ago

I think the challenge we'll face with putting together the connection objects at build time rather than run-time are typical run-time differences that shouldn't impact the build -- in particular when it comes to testing across the SDLC. That is, we ideally want to have a consistent artefact that we can test across various lifecycle environments, without needing to build a different artefact for each one to cater for differences like credentials or hostnames (endpoints).

So if we went the serialization approach, I think we'd need to have some way of embedding variables / parameters that could be injected at run-time (?)

mandy-chessell commented 4 years ago

@cmgrote if I am understanding the comment correctly, it is a little off topic.

All server Connection parameters that need to be injected at runtime should be in the configuration document for the server. These properties can be changes bu updating the config document to accomodate any changes during SDLC.

The Connection objects for the Platform are passed in over REST after the platform has started.

The ConnectorConfigurationFactory is an admin component that builds the Connection objects for default components. It was being misused by various governance servers which were not storing Connection objects in their part of the ConfigurationDocument. I think that is all cleared up.

Its current implementation causes the classes for the default connectors to be loaded by the class loader because the classes are referenced explicitly in order to extract the class name. If the class names were hardcoded strings then the class loader is not involved and the classes for the default components are not loaded.

This approach creates 2 challenges:

It is the second part that needs work.

cmgrote commented 4 years ago

Makes sense -- I think I've confused myself with the earlier comments referring to "connection definitions" and "connection objects"... Based on the statement above, I think what we're saying is:

When we were talking about "connections" I was thinking about the connection parameters we typically send in via a request that get pulled through a bean into the endpoint, credentials, etc that are ultimately used to configure the connector...

For the first part it might even make sense to leverage a "resources" file that puts these string values into a .properties or something along those lines, so that someone can easily override even the default ConnectorProviders they want to use across their environments? (I'm thinking the likes of swapping between the simple plain-text file-based server config store and the encrypted one, for example.) That would also de-couple any changes from any need to re-compile the code itself (?)

mandy-chessell commented 4 years ago

When I refer to connections I am refering to the object that is passed into the connector broker to configure the connector. This is only the information that is passed to a connector to configure it.

We do not want any other back door configuration

mandy-chessell commented 4 years ago

There is no need to recompile the code when we switch connectors - we just change the configuration document and a different connector is used.

The only time the names of the default connector provider classes change will be if we refactor the code - in which case we need to recomplie anyway

cmgrote commented 4 years ago

When I refer to connections I am refering to the object that is passed into the connector broker to configure the connector.

I'm back to my original question, then... Isn't this object that's passed to the connector broker where we would typically put information like endpoint, credentials, etc (?) If we serialize these, and our connector relies on endpoint and credential information, wouldn't we be locking them in as part of the serialization (?)

(As explained in https://egeria.odpi.org/open-metadata-implementation/frameworks/open-connector-framework/docs/concepts/connection.html / in the code under org.odpi.openmetadata.frameworks.connectors.properties.beans.Connection)

There is no need to recompile the code when we switch connectors - we just change the configuration document and a different connector is used.

Right, once we have the platform up and can change the configuration document -- but I thought part of the challenge was for scenarios where someone might want something other than the default config to bring up the platform in the first place (a sort of bootstrapping question) (?) But maybe we're suggesting that the platform always starts up the first time with the default config, and only if then overridden by config document changes do we replace with something else. (Which is probably fair: since without any config document setup, we probably don't really load much of anything anyway?)

Thinking we probably need a whiteboard and to walkthrough some scenarios step-by-step -- maybe something for our next session 🙂

mandy-chessell commented 4 years ago

You are right that the connection information that the administrator adds to the connection object is serialized into the configuration document for the server and is used each time the server comes up - this is desired behaviour.

I think you are getting confused between platform connectors and server connectors.

The platform comes up with its default connectors and these are overridden by REST calls. A configuration document does not control these connectors.

The connectors I am concerned about are the server ones covering, for example, the Kafka event bus. These are controlled by the configuration documents - one configuration document per server.

mandy-chessell commented 4 years ago

In 1.3/1.4 there is now a lib directory in the server assembly that is holding the connector implementations. This is to get them onto the class path so Egeria can load the connector implementation.

planetf1 commented 4 years ago

2375 will aim to move to using (and proving) the assemblies in our containers and fix #2215 , and #2378 will document best practice for developers using an IDE

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

planetf1 commented 4 years ago

3101 identifies ensuring all of our connectors are added to the asasembly. If that is done, we can then change the approach and only load connectors as needed

cmgrote commented 4 years ago

Given the security focus for release v2.0, issue #3035, and related rework already mentioned in #3101 also targeted for v2.0, I think we should target this refactoring for v2.0 as well?

Without this refactoring, it seems we'd be left with hard-coded settings that I don't think we have a way of overriding without someone rewriting and recompiling the ConnectorConfigurationFactory (?) For example, getServerConfigConnection method is hard-coded to the FileBasedServerConfigStoreProvider connector type.

planetf1 commented 3 years ago

In 2.7 I think we can start to move away from fat connectors & chassis & slim down our jars. That aggregation will instead be done in the assembly. We should then be able to confirm if this is now addressed and/or complete the remaining work?

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.