trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.25k stars 2.95k forks source link

MongoDB connector to have configurable database for _schema collection #8887

Open academy-codex opened 3 years ago

academy-codex commented 3 years ago

In production systems, it is not always viable to provide R/W access to a DB to a system user. Hence, we need to have a way to configure a different DB through catalog properties where the trino mongodb connector can R/W the _schema collection for the different collections in different databases in a mongo cluster. This DB may have R/W access because it does not contain any of the user data.

Note: Although, mongodb does provide collection level access control, it is a common practice in large enterprises to have a different higher level authorisation layer and often this layer has limitations to provide access at collection level. Mostly, access is given at database level. Otherwise, as well, this looks like a good to have feature on the books.

academy-codex commented 3 years ago

@ebyhr @hashhar @findepi In our organisation, we need to implement this. Does this look like something we can give back too ?

sumannewton commented 3 years ago

New database dedicated to Trino for maintaining metadata is a valid and useful point. Because it’s acceptable to provide write permissions to Trino on database that is dedicated for Trino itself than having write access on the entire cluster or all databases.

hashhar commented 3 years ago

@academy-codex This sounds like a useful feature to have due to the reasons you've mentioned about permissions management.

There are a few questions we should decide on before implementing this:

academy-codex commented 3 years ago

@hashhar The answer to the second question, I would recommend that the user creates the database and not trino. We enforce a check like the name should start with trino_ or something and let the user take care of provisioning the database. Because its not just a matter of creating a database, it also means whether or not to create index on table name, etc. which can be said as admin operations. Creating the database i would let the user handle it.

The answer to the first question, I feel we can either let the connector recreate schemas on the fly as it does, or since the user is in charge of creating the new database, he can copy over the data from old table to new table. Again, this can be said as a admin task (usually a one time task) and ideally i wouldn't handle this programatically but let the database admin take care of it.

What do you think ?

hashhar commented 3 years ago

For the first question I agree with your solution - let the user be responsible for copying the data from old database to new if they change the config.

For the second - does it matter if we allow Trino to try to create the database/indexes as long it has the permissions? If the user forgot to create the database (or Trino doesn't have permissions) it'll fail like it already does today. If the user did create the database (or Trino has the permissions) it'll work as it does today.

I can't see the difference - only the error messages may be different (lack of permissions vs something not existing). Also, let's not enforce any naming conventions.

Agreed with the rest of the points.

academy-codex commented 3 years ago

Yes i agree with you. My point is basically centered around compliance heavy firms, basically would we ever let a system user have such permissions in production systems ? I may be wrong, but ideally i would not let my system user have anything other than basic Read Only. But yes, from a generality point of view doesnt matter. We should go with general approach, i see your point.

Let me take this ticket up.

hashhar commented 3 years ago

@academy-codex Yes, in locked down environments Trino would fail with an error. We should make sure the error message is actionable (i.e. it should say something like "create this schema or add permissions"). The user can then decide what to do depending on their security posture.

This way I think both the general and the security-sensitive use-cases are addressed. WDYT?

Thanks for picking this up.

academy-codex commented 3 years ago

Agreed :)

ebyhr commented 1 year ago

Let me close this issue as #8956 was closed due to inactive status and I suppose the demand of this feature is very limited. Please feel free to reopen if needed.

Gunnnn commented 10 months ago

Upvote for this feature! My PoC for Trino copying MongoDB failed cause PROD system isn't accessible for writing.