trinodb / trino-gateway

https://trinodb.github.io/trino-gateway/
Apache License 2.0

Hide trino url from user #354

Open Chaho12 opened 6 months ago

Chaho12 commented 6 months ago

What do you guys think of hiding the Trino cluster URL from users in the future? We want users to access only the gateway, so that we can control all queries submitted to Trino. As of now, users can find the Trino URL in various places.

xkrogen commented 6 months ago

I don't think we can do this unless the GW UI has full parity with whatever is exposed from the Coordinator UI. This includes all of the debugging/runtime/performance information you can get from the Coordinator's query UI.

I haven't explored the new GW UI in depth, but I don't think it exposes this level of detail -- and I'm not sure it would make sense to try to recreate all of this rich information in a copy of the coordinator UI.

If the goal is just preventing users from submitting queries directly, you should be able to do this via an access control plugin that denies inbound query requests that don't originate from the Gateway?

mosabua commented 6 months ago

This might make sense for the query client usage, and maybe over time even for the UI .. if we figure out a way to forward everything through the Trino Gateway.

Overall the goal to have all traffic go through the Trino Gateway makes sense to me, however it might be a longer road to get there. For starters we could reduce exposure of that info to only where it is necessary, and maybe limit it to admin users or so.

Also as @xkrogen mentions.. just because users can see and find a Trino cluster URL somewhere doesn't mean they can access it. Documenting how users can set this up might be a good idea as well.

Chaho12 commented 6 months ago

@xkrogen What I meant was that the Coordinator's query UI would also be routed through the gateway, so you would expect to see the same info, except that the domain would now be https://gateway-url.com/ui/ instead of https://coordinator-url.com/ui/. I just want to hide the Trino URL from users.

> you should be able to do this via an access control plugin that denies inbound query requests that don't originate from the Gateway?

Can you elaborate more on this? I didn't know that we can configure Trino to deny inbound requests.

> Documenting how users can set this up might be a good idea as well.

I'll do that once I find out more about denying requests :)

xkrogen commented 6 months ago

> can you elaborate more on this? i didn't know that we can set denying inbound requests.

Depends on how your authentication is configured. Let's look at these three authenticators from Plugin: https://github.com/trinodb/trino/blob/22e9539a8a0d1611426cb72ac8b02c137330b836/core/trino-spi/src/main/java/io/trino/spi/Plugin.java#L78-L91

With a certificate authenticator, you could reject certificates that don't have a principal matching your GW instance, to prevent clients from talking to the coordinator directly. Similarly with a header-based auth, you should be able to use the headers e.g. client_ip to determine request origination. Looks like password authenticator doesn't provide any mechanism since you only get username/password.
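To make the two checks above a bit more concrete, here is a minimal Python sketch of the decision logic only. It is not Trino plugin code (a real authenticator would be a Java plugin built against the SPI), and the principal, the subnet, and the `X-Client-IP` header name are all hypothetical placeholders. A header check like this is only meaningful if the header is set by something you trust, since a client can otherwise send any value it likes.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values: the principal on the gateway's client certificate
# and the subnet the gateway instances run in.
GATEWAY_PRINCIPAL = "CN=trino-gateway.example.com"
GATEWAY_NETWORKS = [ip_network("10.0.5.0/24")]


def allowed_by_certificate(principal: str) -> bool:
    """Accept only requests whose certificate principal is the gateway's."""
    return principal == GATEWAY_PRINCIPAL


def allowed_by_header(headers: dict) -> bool:
    """Accept only requests whose client-IP header falls in a gateway subnet."""
    raw = headers.get("X-Client-IP")  # hypothetical header name
    if raw is None:
        return False
    try:
        addr = ip_address(raw.strip())
    except ValueError:
        # Header present but not a valid IP address: reject.
        return False
    return any(addr in net for net in GATEWAY_NETWORKS)
```

With these placeholder values, a request presenting the gateway's principal or carrying `X-Client-IP: 10.0.5.17` would pass, while one from `192.168.1.4` or without the header would be rejected.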

So with current Trino plugin interfaces, YMMV in terms of whether and how you can achieve this based on your internal network environment and auth stack, but my general commentary is that if you want to enforce that clients don't talk directly to the coordinator, that should be done on the coordinator itself, not in the GW.

> Coordinator's query UI would also be routed so you would expect to see the same info, except that now the domain would be https://gateway-url.com/ui/ instead of https://coordinator-url.com/ui/

Sure, if we set up the GW to have the ability to proxy through to the coordinator UI, this could work and would be simpler from the perspective of exposing information to users. If you really want to hide individual coordinators from users, then you want all access proxied through the GW, which would include result streaming (which currently happens directly from coordinators). Note that in your proposal I don't think you can just have gateway-url.com/ui -- you need gateway-url.com/cluster1/ui, gateway-url.com/cluster2/ui, etc. Right?

Chaho12 commented 6 months ago

Thx for explanations. I'll look more into trino plugin.

> Note that in your proposal I don't think you can just have gateway-url.com/ui -- you need gateway-url.com/cluster1/ui, gateway-url.com/cluster2/ui, etc. Right?

Nope. As of now, users can simply use gateway-url.com/ui, which redirects to adhoc (as there isn't any info on which cluster to route to). A URL like gateway-url.com/ui/query.html?20240523_231600_08933_2vgtx, on the other hand, is routed to the appropriate cluster based on the query ID. Unfortunately, this is not working perfectly yet: the Trino web UI's JS resource requests don't carry a query ID, so proxying them is not working as expected.
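Roughly, the routing described here can be sketched like this in Python. This is an illustration only, not the gateway's actual code: the `pick_backend` function, the `query_history` mapping, and the query ID pattern (inferred from the single example ID above) are all assumptions.

```python
import re
from urllib.parse import urlsplit

# Assumed shape of a query ID, inferred from 20240523_231600_08933_2vgtx.
QUERY_ID = re.compile(r"\d{8}_\d{6}_\d{5}_[a-z0-9]{5}")


def pick_backend(url: str, query_history: dict, default: str = "adhoc") -> str:
    """Return the cluster for the query ID found in the URL, else the default."""
    parts = urlsplit(url)
    match = QUERY_ID.search(parts.query) or QUERY_ID.search(parts.path)
    if match is None:
        # No query ID in the request (e.g. a JS or CSS resource request),
        # so there is nothing to route on and the default cluster is used.
        return default
    return query_history.get(match.group(0), default)


# Hypothetical record of which cluster each query was sent to.
history = {"20240523_231600_08933_2vgtx": "cluster1"}
print(pick_backend("https://gateway-url.com/ui/query.html?20240523_231600_08933_2vgtx", history))
print(pick_backend("https://gateway-url.com/ui/assets/app.js", history))
```

The second call shows the problem: the resource request has no query ID, so it falls back to the default cluster even when the page that triggered it came from a different one.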

Chaho12 commented 6 months ago

@xkrogen I looked into the SPI code, especially the HeaderAuthenticator module, and I noticed that if multiple authentication methods are configured, success in any one of them grants the user access. The methods are combined with OR, not AND.

Unfortunately, we already have the Kerberos and password methods enabled, so I don't think using this API to block users from unwanted IPs would work :( Thx for the idea though! I now understand the security/authority part better.
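A toy Python model of why the OR behaviour defeats an IP restriction, under the assumption that the methods are combined as described above. None of this is Trino code: the authenticators, the `client_ip` field, the gateway address, and the password are made up for illustration.

```python
from typing import Callable, Iterable

Request = dict
Authenticator = Callable[[Request], bool]


def authenticate_any(request: Request, authenticators: Iterable[Authenticator]) -> bool:
    """OR semantics: the request is admitted if any one authenticator accepts it."""
    return any(auth(request) for auth in authenticators)


def authenticate_all(request: Request, authenticators: Iterable[Authenticator]) -> bool:
    """AND semantics, shown only for contrast: every authenticator must accept."""
    return all(auth(request) for auth in authenticators)


def header_from_gateway(request: Request) -> bool:
    # Hypothetical header-based check: only the gateway's IP is accepted.
    return request.get("client_ip") == "10.0.5.17"


def password_ok(request: Request) -> bool:
    # Hypothetical password check with a hard-coded credential.
    return request.get("password") == "secret"


# A user connecting directly to the coordinator with a valid password.
direct = {"client_ip": "192.168.1.4", "password": "secret"}
print(authenticate_any(direct, [header_from_gateway, password_ok]))  # admitted
print(authenticate_all(direct, [header_from_gateway, password_ok]))  # rejected
```

Under OR semantics the header check never gets a chance to block the direct request, because the password method alone is enough to admit it.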