odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0

Repackaging of egeria distribution & dependency streamlining #4667

Open planetf1 opened 3 years ago

planetf1 commented 3 years ago

We have made a number of changes to reduce unnecessary dependency chains and to ensure Egeria can more easily be run in the flexible ways the architecture supports.

Very recent changes include ensuring the connection factory loads connectors dynamically rather than through hard Maven dependencies, and offering an optional profile for building the server chassis without all the OMASs etc. in place, depending instead on runtime scanning of classes. We also have a growing number of connectors being developed externally, which need to be easy to build and deploy.
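As an illustration of the dynamic loading described above, here is a minimal sketch of instantiating a connector provider from its class name, so the caller needs the implementation only on the runtime classpath and not as a compile-time dependency. The names `DemoConnectorProvider`, `DemoFileConnectorProvider` and `DynamicConnectorLoader` are hypothetical and are not Egeria's actual classes.

```java
/** Hypothetical stand-in for a connector provider interface; not Egeria's actual API. */
interface DemoConnectorProvider {
    String connectorName();
}

/** Example implementation; in practice this would live in a separate connector jar. */
class DemoFileConnectorProvider implements DemoConnectorProvider {
    @Override
    public String connectorName() {
        return "demo-file-connector";
    }
}

class DynamicConnectorLoader {
    /**
     * Instantiate a provider from its fully qualified class name.
     * The class only has to be present on the classpath at runtime.
     */
    static DemoConnectorProvider load(String className) {
        try {
            Class<?> raw = Class.forName(className);
            // Fails with ClassCastException if the class is not a provider.
            Class<? extends DemoConnectorProvider> typed =
                    raw.asSubclass(DemoConnectorProvider.class);
            return typed.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                    "Connector class not on classpath: " + className, e);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                    "Cannot instantiate connector provider: " + className, e);
        }
    }

    public static void main(String[] args) {
        // In real use the class name would come from configuration;
        // it is taken from the class literal here only to keep the demo self-contained.
        String className = DemoFileConnectorProvider.class.getName();
        DemoConnectorProvider provider = load(className);
        System.out.println(provider.connectorName());
    }
}
```

A missing jar then shows up as a runtime error naming the class, rather than as a build-time dependency on every connector.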

Currently the main egeria distribution

Current list of potential tasks:

Open related issues:

1607 Rethink ConnectorConfigurationFactory < much of this is now likely addressed

4455 Investigate use/recommendation of 'provided' scope for connectors < a technical aid in areas where a compile-time dependency is needed, i.e. for a connector, but where we know the runtime will be present

4280 Distribution open connector archives appears in both samples & utilities < just a cleanup

3370 Gradle Build prototype < will continue this work. Currently need to get FVTs working

2671 Cassandra Dependency < not tightly dependent, but somewhat related

cc: @mandy-chessell @lpalashevski @bogdan-sava

planetf1 commented 3 years ago

@lpalashevski Are you currently working on any aspects in this area? Or do you have any additional input or related issues to link to?

lpalashevski commented 3 years ago

Not at the moment. I remember Bogdan was still testing some aspects of the Maven dependencies and the slim build for the server chassis; I believe it's related to 4455. Let's align on the call today and plan how we continue and divide the work on specific areas.

planetf1 commented 3 years ago

@bogdan-sava assigning this to you as I believe you were running with this.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

bogdan-sava commented 2 years ago

The list of egeria connectors:

    • audit-log-console-connector
    • audit-log-event-topic-connector
    • audit-log-file-connector
    • audit-log-slf4j-connector
    • avro-file-connector
    • basic-file-connector
    • cassandra-metadata-extractor-connector
    • cohort-registry-file-store-connector
    • configuration-encrypted-file-store-connector
    • configuration-file-store-connector
    • csv-file-connector
    • data-engine-proxy-connector
    • data-folder-connector
    • elasticsearch-integration-connector
    • graph-repository-connector
    • inmemory-open-metadata-topic-connector
    • inmemory-repository-connector
    • janus-connector
    • kafka-integration-connector
    • kafka-open-metadata-topic-connector
    • omrs-rest-repository-connector
    • open-lineage-janus-connector
    • open-metadata-archive-file-connector
    • openapi-integration-connector
    • readonly-repository-connector
    • security-manager-tag-connector
    • security-officer-tag-connector
    • spring-rest-client-connector

As a first step I'll remove the dependencies on elasticsearch-integration-connector and graph-repository-connector, build each of them as a jar-with-dependencies, and place those jars in ./lib of the distribution.

planetf1 commented 2 years ago

Agree on making small incremental steps. I would suggest we don't add anything into 3.3 at this point, but target 3.4 after we branch next week. It will give time to ensure adequate review and informal testing.

The graph repo connector's dependencies are pretty much all non-Egeria, though given the way the assembly works we will be duplicating them all into lib. This also needs doing in the Gradle assembly. It should be harmless except in size. I don't see that it would cause issues with demos etc., but the docker image/helm charts should be tested as part of the change to make sure (specifically on the graph repo).

In the case of the elasticsearch connector we'll pull some libraries in too. This would benefit from the 'what is the chassis' / provided aspect described above, as that would remove the duplicated libraries.

We also need to agree whether we'll go with copying into 'lib' of the distro or with jar-with-dependencies in general. Either should work when running the distribution, but there are other pros/cons. We don't need to decide this right now and can go with the latter for now.
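For the 'lib' option, a rough sketch of how a runtime could pick up whatever jars have been copied into a directory is shown below. This is only an illustration under assumptions: `LibDirectoryLoader` is a hypothetical name, and this is not a description of how the Egeria server chassis actually loads its lib directory.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LibDirectoryLoader {
    /** Collect URLs for all *.jar files directly under libDir, in sorted order. */
    static List<URL> jarUrls(Path libDir) throws IOException {
        List<Path> entries;
        try (Stream<Path> stream = Files.list(libDir)) {
            entries = stream.sorted().collect(Collectors.toList());
        }
        List<URL> urls = new ArrayList<>();
        for (Path entry : entries) {
            boolean isJar = entry.getFileName().toString().endsWith(".jar");
            if (Files.isRegularFile(entry) && isJar) {
                urls.add(entry.toUri().toURL());
            }
        }
        return urls;
    }

    /** Build a class loader over those jars, delegating first to the application loader. */
    static URLClassLoader classLoaderFor(Path libDir) throws IOException {
        List<URL> urls = jarUrls(libDir);
        return new URLClassLoader(urls.toArray(new URL[0]),
                LibDirectoryLoader.class.getClassLoader());
    }

    public static void main(String[] args) throws IOException {
        // "lib" is an assumed default; pass another directory as the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        if (!Files.isDirectory(libDir)) {
            System.out.println("No such directory: " + libDir);
            return;
        }
        for (URL url : jarUrls(libDir)) {
            System.out.println(url);
        }
    }
}
```

With this approach each connector jar stays small and shared libraries appear once in the directory, whereas a jar-with-dependencies carries its own copies.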

planetf1 commented 2 years ago

A simplification - we don't need to over-analyze what the server chassis consists of for now, instead:

First, connectors (and other modules that run on the server chassis) can be built as jar-with-dependencies (as in the example you gave of the janus connector), and I would probably do this across all connectors.

At this point our build will grow substantially, and there may be some confusion when debugging, though it should be functional. This is probably a good idea for the beginning of a release cycle.

Then we can incrementally

As we do this the build will shrink again.

Since we always use 'server-chassis' as provided, its transitive dependency chain is always a true reflection of what is really provided for that release. So if something is removed that another module depends on, we'll automatically get a build error (symbol not found etc.) and be able to correct it, i.e. it's adaptable.

In future we can apply the same technique to OMASs, and indeed to any modules that are not in the server chassis itself but depend on that runtime and can assume it is there.

(We may need to add server-chassis-spring to the exclusion list for the dependency checker, or remove provided scope)