trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Improve packaging #22597

Open mosabua opened 2 months ago

mosabua commented 2 months ago

The current binary packages provided by the Trino project suffer from a number of issues.

These issues will get worse in the near future as more connectors are merged and as more native binaries for multiple operating systems and processor architectures are required and included.

This roadmap item collects a number of related work tasks that we want to engage on. Numerous discussions took place prior to filing this issue: on Slack, in smaller conversations, and at the Trino Contributor Calls and Congregation.

Following are a number of sections that detail tasks and ideas. Work on these can be done in parallel.

Pull out RPM

The RPM is rarely used by now, and we agree on removing it from the core Trino repo. However, since it is built from the tarball, it is possible to pull the RPM packaging aspects out of the core repo into a separate repository that users can use to build an RPM for any Trino version.

The repository https://github.com/simpligility/trino-packages is a first PoC implementation of this approach. The naming is generic since it can also be used for other package creation in separate modules.

Tasks to implement the removal are:

Figure out plugins for different packages

We need to determine what different packages we want to offer and what each package should contain. Following is a first idea

Different tarballs

Once we figure out the desired plugins and archive variants, we should adjust the build and publishing process to publish these, and update the docs as well.

We also then need to add docs on how to download and add additional plugins.
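As a rough illustration of what such docs might describe, here is a minimal sketch of installing one additional plugin into an existing installation. It assumes that a plugin is distributed as a ZIP archive with a single top-level directory of JARs, and that the server loads plugins from a `plugin` directory under the installation root. The function name `install_plugin` and the archive layout are assumptions for illustration, not an existing Trino tool.

```python
import zipfile
from pathlib import Path


def install_plugin(archive: Path, trino_home: Path) -> Path:
    """Unpack a plugin ZIP archive into <trino_home>/plugin/<name>/.

    Hypothetical helper: assumes the archive contains exactly one
    top-level directory holding the plugin JARs.
    """
    plugin_root = trino_home / "plugin"
    plugin_root.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive) as zf:
        # Collect the first path component of every entry in the archive.
        top_levels = {name.split("/", 1)[0] for name in zf.namelist() if name.strip("/")}
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_levels)}")

        dest = plugin_root / top_levels.pop()
        if dest.exists():
            # Refuse to overwrite a plugin that is already installed.
            raise FileExistsError(f"plugin already installed: {dest}")

        zf.extractall(plugin_root)

    return dest
```

A server restart would presumably still be needed after this step, since the sketch only copies files into place.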

Different container images

Once we figure out the desired plugins and archive variants, we should adjust the build and publishing process to publish these, and update the docs as well.

We also then need to add docs on how to download and add additional plugins.

Plugin loading

Over time it might be even better to be able to define a URL or similar pointer to a running system and then load that plugin onto the servers and run it. Of course, security concerns and other aspects need to be figured out. The API for these operations could (or maybe even should) be a SQL command, similar to the dynamic catalog management features.

martint commented 3 weeks ago

cc @nineinchnick

bitsondatadev commented 2 days ago

@mosabua,

Another devex thing we might consider, at least to track in an issue, is building a web app akin to this Spring Boot one, where a nice interface hits a service to download a tar file with a custom list of plugins.

We could even make it a REST endpoint on a GitHub action that takes in a list of plugins and a Trino version and returns the tar file with just those plugins. Then we can just have a minimal package with in-memory (and possibly some local filesystem, if we want to replicate the DuckDB experience).

This isn't quite as dynamic as the first one you suggested, but it is a bit more secure, and if we make the GitHub app a repository, then a company could fork it into their own build system to stay behind a VPN.
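To make the idea concrete, the core of such a service could be a small function that repackages a full distribution with only the requested plugins. The following is a minimal sketch under stated assumptions: an unpacked distribution directory with a `plugin` subdirectory containing one directory per plugin. The name `build_custom_tarball` and its parameters are hypothetical, and the web or GitHub action layer around it is left out.

```python
import tarfile
from pathlib import Path


def build_custom_tarball(dist_dir: Path, plugins: set[str], output: Path) -> None:
    """Pack dist_dir into a gzipped tarball, keeping only the requested plugins.

    Hypothetical helper: assumes dist_dir/plugin/<name>/ holds each plugin,
    and that output is located outside of dist_dir.
    """
    available = {p.name for p in (dist_dir / "plugin").iterdir() if p.is_dir()}
    missing = plugins - available
    if missing:
        raise ValueError(f"unknown plugins: {sorted(missing)}")

    def keep(info: tarfile.TarInfo):
        # Member names look like <root>/plugin/<name>/...; drop any plugin
        # directory that was not requested. Returning None for a directory
        # also skips everything below it.
        parts = Path(info.name).parts
        if len(parts) >= 3 and parts[1] == "plugin" and parts[2] not in plugins:
            return None
        return info

    with tarfile.open(output, "w:gz") as tar:
        tar.add(dist_dir, arcname=dist_dir.name, filter=keep)
```

For example, `build_custom_tarball(Path("trino-server"), {"memory"}, Path("trino-minimal.tar.gz"))` would keep everything outside `plugin` and only the `memory` plugin directory inside it, assuming a plugin directory of that name exists.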

mosabua commented 2 days ago

@bitsondatadev that is definitely also something that's nice to have once we've done the preparatory work of creating a minimal package and the other things proposed in this ticket. I think it's probably within scope here.