os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation
Apache License 2.0

link github team updates to trino configurations #89

Open erikerlandson opened 3 years ago

erikerlandson commented 3 years ago

@MichaelTiemannOSC asked if changes to team memberships could automatically trigger workflows. The use case was keeping team memberships and trino access control synchronized.

This is the list of categories of "event" that can trigger a github workflow: https://docs.github.com/en/actions/learn-github-actions/events-that-trigger-workflows

I don't see any trigger that maps directly to "change to team members". One could probably contrive an issue-based system instead: opening an issue triggers a workflow that both updates the team membership and creates a pull request with the corresponding change to a trino policy config.

It might be advantageous to focus on how this could be accomplished in a world where we are using Ranger.

See also trino GroupProvider plugin concept, below.

erikerlandson commented 2 years ago

Another promising approach is to write a trino GroupProvider plugin that does what we want. xref: https://github.com/trinodb/trino/issues/9835

An example of such a plugin implementation, for LDAP, is here: https://github.com/arghya18/trino-group-provider-ldap-ad

An example of calling the github rest api from a trino plugin (this one exposes github as a trino catalog, it is not a group provider): https://github.com/nineinchnick/trino-rest/blob/master/trino-rest-github/src/main/java/pl/net/was/rest/github/GithubRest.java#L1172

Based on these two examples, a similar plugin that uses the github api instead of the ldap api looks simple enough to be feasible.
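To make the github side a bit more concrete, here is a rough sketch of the two REST requests such a plugin would probably need: list the teams of an org, then list the members of each team. It only builds the requests and does not send them. The class name GitHubTeamRequests, the token parameter and the per_page value are my own placeholders, not something taken from the plugins linked above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not part of any existing plugin): builds, but does
// not send, the GitHub REST requests a group provider would likely need.
final class GitHubTeamRequests {
    private static final String API = "https://api.github.com";

    // GET /orgs/{org}/teams : all teams in the organization
    static HttpRequest listTeams(String org, String token) {
        return authorized(API + "/orgs/" + org + "/teams?per_page=100", token);
    }

    // GET /orgs/{org}/teams/{team_slug}/members : members of one team
    static HttpRequest listTeamMembers(String org, String teamSlug, String token) {
        return authorized(
                API + "/orgs/" + org + "/teams/" + teamSlug + "/members?per_page=100",
                token);
    }

    // Adds the headers GitHub expects for token-authenticated REST calls.
    private static HttpRequest authorized(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/vnd.github+json")
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }
}
```

Sending these with java.net.http.HttpClient and parsing the JSON responses (plus following pagination for large orgs) is left out here; the token would need permission to read org team membership.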

erikerlandson commented 2 years ago

Trino's doc page, for completeness: https://trino.io/docs/current/develop/group-provider.html

erikerlandson commented 2 years ago

cc @caldeirav @rimolive @HumairAK

caldeirav commented 2 years ago

With this approach, the file system access control implementation will rely primarily on group-based catalog rules (for contributors). We can then use role-based access for, e.g., internal operations / admin users once this is available: https://github.com/trinodb/trino/issues/9839

See my comments on the above issue, where the multiple GroupProvider approach could be used for federation purposes when multiple data providers leverage the platform to share data.

erikerlandson commented 2 years ago

I have created a skeleton repo for a GitHubGroupProvider: https://github.com/os-climate/trino-github-group-provider

It builds with mvn clean package, but for now it only returns an empty Set<String>. It still needs the biz logic for actually talking to a github org and identifying team memberships to use as trino group names.
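As a rough idea of what that missing logic could look like, here is a minimal sketch that deliberately avoids the trino SPI and any network calls so it runs standalone. getGroups has the same shape as GroupProvider.getGroups(String user) from the trino docs. TeamDirectory and GitHubTeamGroups are hypothetical names of mine, and the case-insensitive match on logins is an assumption about how trino user names would line up with github logins.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical abstraction over the GitHub calls: team name -> member logins.
// A real implementation would fetch this from the GitHub REST API and cache it.
interface TeamDirectory {
    Map<String, List<String>> membersByTeam();
}

// Hypothetical lookup class; a real plugin would wrap this in a class that
// implements io.trino.spi.security.GroupProvider.
final class GitHubTeamGroups {
    private final TeamDirectory directory;

    GitHubTeamGroups(TeamDirectory directory) {
        this.directory = directory;
    }

    // Same shape as GroupProvider.getGroups(String user): returns the names
    // of all teams that list the user as a member, or an empty set.
    Set<String> getGroups(String user) {
        Set<String> groups = new TreeSet<>();
        for (Map.Entry<String, List<String>> team : directory.membersByTeam().entrySet()) {
            for (String login : team.getValue()) {
                // Assumption: logins are compared case-insensitively.
                if (login.equalsIgnoreCase(user)) {
                    groups.add(team.getKey());
                    break;
                }
            }
        }
        return groups;
    }
}
```

With a directory that maps a team "contributors" to the logins "Alice" and "bob", getGroups("alice") would return a set containing "contributors", and an unknown user gets the same empty set the skeleton returns today. Since trino may call the group provider often, the directory should probably cache the github responses rather than hit the API on every call.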