trinodb / aws-proxy

Proxy for S3
Apache License 2.0
7 stars 3 forks source link

Enhance plugin mechanism #118

Closed mosiac1 closed 1 month ago

mosiac1 commented 1 month ago

This was missed in the original PR https://github.com/trinodb/aws-proxy/pull/111.

Tests are not directly using TrinoAwsProxyServerModule so we don't test if the right modules are installed.

Randgalt commented 1 month ago

We shouldn't add this generically. It's a special use-case. We can have a test that installs it.

Randgalt commented 1 month ago

I missed this in the original PR. The CredentialsProvider is not being bound correctly. It should use TrinoAwsProxyBinder so that it's bound via the Optional binder.

mosiac1 commented 1 month ago

Updated the module to use TrinoAwsProxyBinder.

On the question of adding it or not - looking at it again i realise adding makes the credentials-provider.credentials-file-path config required, so that is not good.

We do need some of way enabling this simply from configs. I think our opinions diverge a bit on this point, but I would like the trino-aws-proxy to be useable (to some extent) as is; you should be able to build it, setup some configs and run it.

I'm thinking we can add a credentials-provider.type=... config switch and bind different configs/implementations based on that. This could be the file type. Another type we will likely need is http. We can make this default to none which doesn't bind anything, but that would be a bit weird IMHO since it makes the aws-proxy practically a service that just responds with 401.

WDYT?

Randgalt commented 1 month ago

The problem with a config based switch is that it can't deal with types it doesn't know about. What I suggest is alternate Mains. A while back I was playing around with turning TrinoAwsProxyServer into a builder. Another alternative is to add a new module, something like trino-aws-proxy-standalone that binds the FileBasedCredentialsProvider and anything else needed to run it "as is". My only reservation about that is that is it useful for anything other than demoing?

Or maybe just put an alternate Main class in the main module: TrinoAwsProxyServerStandalone - that would be helped by changing TrinoAwsProxyServer into a builder.

mosiac1 commented 1 month ago

Put together a "demo" of how I envision this working.

Plugins bind CredentialsProvider implementations annotated with the type as a string. We then "conditionally" only build the implementation annotated with the type from credentials-provider.type. Each binding brings its own configuration. Plugins should use TrinoAwsProxyBinder#credentialsProviderModule (this currently conflicts with bindCredentialsProvider from the same interface, I need to think what to do with that).

My only reservation about that is that is it useful for anything other than demoing?

We would most likely be using this in the Docker image we deploy. Obvs. the file-based one is for basic dev deployments that we are working on now to test integration with other systems. But an http credentials provider should easily slot in.

The idea of fully configuring the aws-proxy with code is interesting, it reminds me a bit of how Apache Superset does configuration if you've ever worked with that project. Its python so the context is a different, but you give it a superset_config.py file where you set value for magic variables - values can be string, string read from env, objects, classes, anything else in python.

I have to think a bit more about this, but at this moment I'm sure it would incur more work for us to use this approach internally. Once the aws-proxy is past version 1 and we will only be going from release to release it should be more straight-forward.

vagaerg commented 1 month ago

Made a draft on behalf of @mosiac1 , this is currently his proposal for how he envisions this could be implemented

vagaerg commented 1 month ago

Had a chat with @mosiac1 on this topic, I understand not wanting to pollute the codebase by introducing more complex plugin loaders (as we originally discussed in https://github.com/trinodb/aws-proxy/pull/111#discussion_r1682977222 ); but at the same time I think we need to provide a clear way to run the proxy without requiring users to implement their own main methods.

I agree the proxy can have a clean & nice builder which makes building your own main method simple - and in fact think it's a great idea. But IMHO we should not require all users to write their own entrypoints, particularly for the most common use cases.

Another alternative is to add a new module, something like trino-aws-proxy-standalone that binds the FileBasedCredentialsProvider and anything else needed to run it "as is". My only reservation about that is that is it useful for anything other than demoing?

I believe it would be useful for many users. Requiring users to implement their own main functions results in them needing to have a Java build process/CI, requires code review and security scanning to track potential unexpected dependencies coming from this function. It may also become a maintenance burden, particularly for users that may not have well established Java infra.

These users could, if we had a fully config-driven setup, happily pull a prebuilt artifact or even a container image. Those options become harder if we require them to implement their own main method.

(I know we can find solutions to all the problems I listed above, but the question is whether it's worth figuring that out instead of allowing the proxy to run with just config)

I understand this is not an immediate priority, but I think it'd still be a nice improvement to the project - we're happy to work on this of course, but first we'd like to get your / others opinions on the subject.

Randgalt commented 1 month ago

I'll try to come up with a prototype of what I had in mind so at least it can be compared/contrasted.

Randgalt commented 1 month ago

Here's what I was thinking of for the "standalone" server: https://github.com/trinodb/aws-proxy/pull/121 - users don't need to write their own main. I'll work on a separate example of how I thought about using the plugin mechanism to run a server using plugins.

...

Here's the plugin example: https://github.com/Randgalt/plugin-example

Notice it only defines implementations for the interfaces it wants to provide for. Then it adds a plugin instance and the services file. The idea here is that this JAR would be added to the Airlift launcher lib directory of an already built Trino AWS Proxy.

mosiac1 commented 1 month ago

Based on @Randgalt's plugin example I created https://github.com/mosiac1/aws-proxy-plugin-example to show this would work. The project compiles as a JAR that needs to be in the aws-proxy classpath.

Randgalt commented 1 month ago

Don't forget to squash please