planetf1 closed this issue 4 years ago.
A quick summary from the call
I will continue the work from #3316 - update & remove clients as a next step
Feel free to add any further comments. cc: @davidradl @CDaRip2U @mandy-chessell
PR #3316 was already nearly aligned with the discussion. I've now updated it and it is ready for review/merge.
The above PR is now merged, so I'm closing this issue. Please reopen it or open a new issue for any follow-on comments.
Egeria Packaging
We have encountered a number of challenges with egeria assemblies, including a variety of issues documented in github: https://github.com/odpi/egeria/issues?q=is%3Aopen+is%3Aissue+label%3Aassemblies+
These include
This issue is open to point out some of the challenges, look at what we're creating in egeria, and start a discussion on what we do moving forward
Initial discussion: ODPI call Thu 6 Aug 1400 UTC+1
Artifact types
We have various types of artifacts generated for Egeria. I'm not speaking in terms of the implementation poms here, but in principle:
Web Applications
These are full-fledged application containers. They run as a process (often in a JVM).
Both are Spring Boot based, and currently these are built as executable jars with embedded Tomcat. This makes them easy to launch, to run within a container, etc.
Example of server chassis:
These would typically be 'run' as packaged, though there could also be a desire to run them in an existing web container. In the former case they need to be fully self-contained; in the latter everything except the web server runtime is needed. In either case the Egeria code needs all its dependencies present.
Most developers launch this directly from the output of the chassis build, a few use the assemblies, and those just learning about Egeria generally use the docker containers.
Client libraries
Third-party applications (consumers) may use our Java client libraries. For example, we have an 'asset owner' OMAS which allows the caller to create and update assets. Part of this OMAS is the client code, which masks the complexity of building the REST request body sent to the server chassis and of unmarshalling the response.
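To make the 'masking' concrete, here is a minimal sketch of what such a client wrapper does, assuming a plain JSON-over-HTTP call built with the JDK's `java.net.http` API. The class name, URL path and JSON field names are hypothetical and are not the real Asset Owner OMAS API; the point is only that the caller passes simple values and never touches the request body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client wrapper: the URL path and JSON field names are illustrative
// assumptions, NOT Egeria's actual Asset Owner OMAS REST API.
class AssetClientSketch {
    private final String platformUrl;
    private final String serverName;

    AssetClientSketch(String platformUrl, String serverName) {
        this.platformUrl = platformUrl;
        this.serverName = serverName;
    }

    // Escapes backslashes and quotes so the value is a valid JSON string literal
    // (a minimal escape for the sketch; control characters are not handled).
    static String jsonString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds, but does not send, the POST request a caller would otherwise assemble by hand.
    HttpRequest buildCreateAssetRequest(String userId, String qualifiedName, String displayName) {
        String body = "{\"qualifiedName\":" + jsonString(qualifiedName)
                + ",\"displayName\":" + jsonString(displayName) + "}";
        URI uri = URI.create(platformUrl + "/servers/" + serverName
                + "/hypothetical/asset-owner/users/" + userId + "/assets");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Even this tiny wrapper pulls in its own set of classes; a real client also depends on shared Egeria beans, FFDC and JSON libraries, which is exactly why its consumers need the dependency list.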
These applications may wish to
Users of these libraries need to know what dependencies they have, and ideally resolving these should be easy.
Maven and Gradle automate this management for the user, which is very much what these tools are intended for in the Java space.
If they choose to use jars directly, they may find it useful to have the dependencies documented or provided. If we wish to include clients within a distribution we need to consider how they are consumed. Just supplying the fine-grained client jar is not sufficient, as there will be many dependencies.
Either we need to build an uber-jar/shaded jar with dependencies, or we need to include the necessary dependencies by analysing each client's dependency tree (usually done with Maven or Gradle plugins).
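The 'analysis' option boils down to walking the dependency graph transitively from the client module. The sketch below assumes the graph is already available as a map from module to its direct dependencies (in practice Maven or Gradle would supply it); the module names used with it are placeholders, and versions and scopes are ignored.

```java
import java.util.*;

// Sketch: compute every library a client module needs by walking its
// dependency graph transitively. The graph contents are hypothetical inputs.
class DependencyClosure {
    static Set<String> transitiveDependencies(Map<String, List<String>> graph, String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(graph.getOrDefault(root, List.of()));
        while (!toVisit.isEmpty()) {
            String dep = toVisit.pop();
            // Skip the root itself (in case of a cycle) and anything already visited.
            if (!dep.equals(root) && seen.add(dep)) {
                toVisit.addAll(graph.getOrDefault(dep, List.of()));
            }
        }
        return seen;
    }
}
```

The resulting set is what would have to be copied alongside the client jar if we don't shade; an uber-jar simply bakes the same set into one file.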
Questions
Connectors
We have a few different kinds of connectors:
These get used in a variety of ways:
using connectors in our servers
These extend the capability of the egeria server platform.
For example, connectors that let the audit log framework log to disk.
Many of the dependencies needed by a connector may already be present in the server chassis, but not all, in particular technology-specific libraries. Resolving what is missing needs to be automated, or at least simple.
One option is to define the 'bom' (bill of materials) for the chassis, and assume this is always 'provided'. Then when packaging either the connector, or building the assembly, we 'subtract' these from the list of dependencies we package.
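A minimal sketch of the 'subtract' step, assuming both the connector's resolved dependencies and the chassis bom are available as `groupId:artifactId` to version maps. It also flags libraries the chassis provides at a different version, since silently dropping those is where this approach could bite. The coordinates and versions in any real run would come from the build; nothing here reflects Egeria's actual bom.

```java
import java.util.*;

// Sketch of subtracting a chassis 'bom' from a connector's resolved dependencies.
// Keys are "groupId:artifactId", values are versions; all inputs are hypothetical.
class ProvidedSubtraction {
    // Dependencies the chassis does not provide, so they must ship with the connector.
    static SortedMap<String, String> toPackage(Map<String, String> connectorDeps,
                                               Map<String, String> chassisBom) {
        SortedMap<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : connectorDeps.entrySet()) {
            if (!chassisBom.containsKey(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Dependencies the chassis provides at a different version: these need a decision.
    static SortedSet<String> versionConflicts(Map<String, String> connectorDeps,
                                              Map<String, String> chassisBom) {
        SortedSet<String> conflicts = new TreeSet<>();
        for (Map.Entry<String, String> e : connectorDeps.entrySet()) {
            String provided = chassisBom.get(e.getKey());
            if (provided != null && !provided.equals(e.getValue())) {
                conflicts.add(e.getKey());
            }
        }
        return conflicts;
    }
}
```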
using connectors in our clients
These are used by clients.
Examples include JDBC, csv, avro files
These are somewhat similar to our client libraries, so can we provide them in the same way? Potentially every dependency is needed.
A few examples
Utilities
Similar to applications, these are generally expected to be run directly, for example to create a json types archive, or import from the Cloud Information Model:
Currently these are built as uber jars, so everything is in one jar, but as the number of utilities grows the overall size will increase rapidly, with much duplication. Is the ease of use worth it? Or do we need a utility 'tree' in the distro where each dependency is included just once?
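The trade-off is simple arithmetic: uber jars pay for every shared library once per utility, while a shared tree pays once per library. The sketch below assumes each utility is described by a map of library name to size (in MB, made-up figures) and that the same name always means the same jar and version.

```java
import java.util.*;

// Sketch comparing disk use of per-utility uber jars with a single shared lib tree.
// Library names and sizes are hypothetical; same name is assumed to mean same jar.
class UberJarSizing {
    // Every utility carries its own copy of everything it embeds.
    static long uberJarTotal(List<Map<String, Long>> utilities) {
        long total = 0;
        for (Map<String, Long> utility : utilities) {
            for (long size : utility.values()) {
                total += size;
            }
        }
        return total;
    }

    // A shared tree stores each distinct library exactly once.
    static long sharedTreeTotal(List<Map<String, Long>> utilities) {
        Map<String, Long> unique = new HashMap<>();
        for (Map<String, Long> utility : utilities) {
            unique.putAll(utility);
        }
        long total = 0;
        for (long size : unique.values()) {
            total += size;
        }
        return total;
    }
}
```

With two utilities that each embed a 5 MB core and 2 MB of Jackson plus 1 MB of their own code, the uber jars total 16 MB against 9 MB for a shared tree, and the gap widens with every utility added.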
Samples
Samples can cover a broad spectrum. They could be a simple hello world app, which may not be that interesting for anyone to just run, but is instead a guide to building an application and picking dependencies. Others, however, may do real work, for example importing the Coco Pharmaceuticals data.
There is some value in having samples ready-to-run, in which case we can almost consider them utilities once built -- their characteristics are the same.
Source code
I see this as everything prior to any build process, basically what we have in git: not just Java source, but sample data files, docs, etc. There's likely little value in building anything much different from an entire git archive, whereas the current source assembly is crafted manually and is therefore error-prone.
Each Maven artifact does need to contain its own source. This is mandated by Maven Central, and it makes debugging easier, as IDEs automatically look for the 'sources' classifier of the Maven packages.
So any source 'assembly' is of limited value.
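For reference, the classifier jars sit next to the main jar in the standard Maven repository layout, which is how IDEs find them without any extra configuration. A small sketch of that naming; the coordinates used with it below are only examples.

```java
// Sketch of the standard Maven repository layout:
// groupId (dots as slashes)/artifactId/version/artifactId-version[-classifier].jar
class MavenLayout {
    static String path(String groupId, String artifactId, String version, String classifier) {
        String file = artifactId + "-" + version
                + (classifier == null ? "" : "-" + classifier) + ".jar";
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/" + file;
    }
}
```

So `path("org.odpi.egeria", "asset-owner-client", "2.2", "sources")` gives `org/odpi/egeria/asset-owner-client/2.2/asset-owner-client-2.2-sources.jar`; the same applies to the `javadoc` classifier discussed under Documentation.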
Documentation
Human-readable, useful information. Mostly present in the source tree in markdown format, but also processed and published on GitHub. More generally, generated documentation (of APIs, test results or scans) may also feature.
This also includes javadoc which, as with source, is included with another classifier in each maven artifact so that IDEs can offer developers help with the API.
Producing javadoc as part of the build is likely useful for developers working on egeria itself, and ideally publishing this for each version could aid more casual users & search, but an assembly per-se doesn't offer a whole lot.
Egeria Modules
In Egeria we have a very fine-grained module structure; we are currently at over 320 modules. For example, the asset owner client is one module, whilst the asset owner server is another. Each may have differing dependencies, which our policy requires to be minimal: a module must not declare dependencies it doesn't use, and everything it uses must be declared (or be available transitively).
This works well for building with Maven but, as we can see above, causes other challenges.
We could create simplified packages with much more standardized prerequisites, for example for clients (i.e. a large client 'bom'), but this also makes validation checks harder.
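The policy is essentially two set differences per module. The sketch assumes we already know, for one module, which artifacts it declares, which it actually uses in code, and which arrive transitively; the artifact names used with it are placeholders. (Maven's own `dependency:analyze` goal reports related 'used undeclared' and 'unused declared' categories, though it does not treat transitive availability the way this sketch does.)

```java
import java.util.*;

// Hypothetical check mirroring the module dependency policy: nothing declared
// but unused, and nothing used that is neither declared nor transitive.
class DependencyPolicy {
    static SortedSet<String> unusedDeclared(Set<String> declared, Set<String> used) {
        SortedSet<String> result = new TreeSet<>(declared);
        result.removeAll(used);
        return result;
    }

    static SortedSet<String> usedButUnavailable(Set<String> used, Set<String> declared,
                                                Set<String> transitive) {
        SortedSet<String> result = new TreeSet<>(used);
        result.removeAll(declared);
        result.removeAll(transitive);
        return result;
    }
}
```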
What do we have currently
Currently we focus on one main assembly, which is 'laid down' on disk as we would expect it to be run. In particular this includes the server connectors, which we add into lib. However, since our current code hard-wires these dependencies, in most cases we don't yet end up dynamically loading much (there is an issue open to improve this).
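Dynamic loading from lib would look roughly like the following: build a class loader over the jars in a directory and instantiate a connector (or connector provider) class by name. This is a generic JDK sketch using `URLClassLoader` and `Class.forName`, not Egeria's actual connector broker; the directory and class names are whatever the caller supplies.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: load connector classes at runtime from the jars found in a lib directory,
// rather than hard-wiring them as compile-time dependencies of the server chassis.
class ConnectorLoader {
    // Creates a class loader over every *.jar in libDir (none if the directory is missing).
    static URLClassLoader loaderFor(Path libDir, ClassLoader parent) throws IOException {
        List<URL> urls = new ArrayList<>();
        if (Files.isDirectory(libDir)) {
            List<Path> jars;
            try (Stream<Path> files = Files.list(libDir)) {
                jars = files.filter(f -> f.getFileName().toString().endsWith(".jar"))
                            .sorted()
                            .collect(Collectors.toList());
            }
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    // Loads a class by its fully qualified name and calls its public no-arg constructor.
    static Object instantiate(ClassLoader loader, String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

With this style, only the connector's technology-specific jars need to be dropped into lib; everything the chassis already has is found through the parent class loader.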
And in server
There are two notable ways of including dependencies in the jars:
Both result in something that is self-contained, but if we have many of these with similar dependent libraries, the total size explodes.
Let's look at those uber-jars in clients (noting that this list is incomplete)
And for our uber jar connectors:
Compose runtime tree
This is how I tried to build the current assembly framework, but unfortunately dependency management in the assembly only appears to work with the project dependencies (an aggregate of everything needed in the entire assembly), not with those of individual modules (I have raised issues and posted to mailing lists about this). As a result it gets to over 400 dependencies:
The principle would be to add dependent libraries alongside our egeria artifacts that need them - for example a client OMAS
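A sketch of what a composed runtime tree could be derived from: keep each artifact's own dependency set, compute the aggregate only to populate a shared lib directory once, and generate a per-artifact classpath from its own set. The module and library names, the `lib` directory and the one-jar-per-name convention are all assumptions for illustration.

```java
import java.util.*;

// Hypothetical composed runtime tree: one shared lib pool plus a per-module classpath,
// instead of a single flat aggregate that every artifact implicitly depends on.
class RuntimeTree {
    // Union of all per-module dependencies: what the shared lib directory must contain.
    static SortedSet<String> aggregate(Map<String, Set<String>> perModule) {
        SortedSet<String> all = new TreeSet<>();
        perModule.values().forEach(all::addAll);
        return all;
    }

    // Per-module classpath entries pointing into the shared lib directory.
    static SortedMap<String, List<String>> classpaths(Map<String, Set<String>> perModule, String libDir) {
        SortedMap<String, List<String>> result = new TreeMap<>();
        perModule.forEach((module, deps) -> {
            List<String> entries = new ArrayList<>();
            for (String dep : new TreeSet<>(deps)) {
                entries.add(libDir + "/" + dep + ".jar");
            }
            result.put(module, entries);
        });
        return result;
    }
}
```

The aggregate still exists, but only as the content of the shared directory; each artifact's classpath stays as small as its own dependencies.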
Simplification of our packaging
Currently we have a 1:1 relationship between our build modules & what gets published to maven central.
We could either
docker
We also build an Egeria docker image. Most of the Egeria content for this comes from the full distribution assembly above, plus a smaller Egeria assembly we create containing our notebooks/tutorials.
The docker image is built towards the end of the Maven build process and published to a Docker registry (Docker Hub).
Helm charts & Kubernetes operators will make use of the published images
presentation server (node)
The new UI (presentation server) is a Node.js app. The dev team are still working on it, and we need to do more work to determine what deploying it means.
We do have an initial process in place which adds the required content into an additional docker image.