stackabletech / airflow-operator

Stackable Operator for Apache Airflow

Airflow OPA Authorization #446

Open adwk67 opened 6 months ago

adwk67 commented 6 months ago

Issue checklist

Possible duplicate and/or overlapping issue.

As an administrator, I'd like to centrally authorize the actions my users take, ideally using OpenPolicyAgent. However, as documented in this ticket, Airflow, like Superset, is built on Flask and as such offers its own user/role authorization or Flask-related mechanisms.

Airflow does not support Open Policy Agent, which is what we use wherever possible. Instead, it delegates access control of the webserver UI to Flask directly and offers the following authentication types:

Airflow ships with a number of default roles and it is advised to leave these unaltered. LDAP offers authorization (via group membership) as well as authentication and is probably the most suitable way of implementing Airflow authorization, where appropriate, via Flask. It should be verified that the Flask search filters enable recursive mapping through group memberships.

siegfriedweber commented 1 week ago

Approach

Authorization with OPA can be implemented for UI users:

Background

The security model of Airflow involves different types of users (see Airflow Security Model):

This ticket covers only the "Authenticated UI users".

Airflow uses the Flask App Builder (FAB), but the security model is decoupled from it, see AIP-56 Extensible user management. Airflow Core only calls the AirflowSecurityManagerV2 and, in turn, the abstract BaseAuthManager. Auth providers derive from these classes. The default auth provider is the FAB provider, which contains the FabAuthManager and the FabAirflowSecurityManagerOverride. The security manager is mostly responsible for authentication and role assignment; authorization is done by the auth manager. FAB synchronizes and stores all users, roles and permissions in the database.

The idea for using OPA with Airflow is to re-use the authentication part of the FAB provider and replace only the authorization part. This has the advantage that the existing authentication methods are not affected. However, the roles and permissions should no longer be read from the database; instead, the authorization request should be delegated to OPA. This means that the is_authorized_* functions in the FabAuthManager must be overridden. The input for the Rego rule would look as follows:

{
    "user": {
        "id": "test-user",
        "roles": ["test-role"]
    },
    "action": "DELETE",
    "resource-type": "DAG",
    "resource-details": {
        "id": "my-dag-id",
        "tags": ["example1", "example2"],
        "dag-folder": "/dags/marketing"
    }
}

The default roles and permissions (see Access control) must then be replicated in Rego. If this is not desired, e.g. because they can change between Airflow versions, these permissions could alternatively be added to the input of the Rego rule, which results in large requests.
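
To make the delegation more concrete, here is a minimal Python sketch of how the input document shown above might be assembled and sent to OPA. It is not the operator's or the FAB provider's implementation. The function names (`build_opa_input`, `post_json`, `is_authorized`), the OPA address and the policy path `airflow/allow` are assumptions made for illustration. The only external interface relied on is OPA's Data API, which accepts a POST to `/v1/data/<path>` with a body of the form `{"input": ...}` and answers with `{"result": ...}`.

```python
import json
from typing import Any, Callable, Dict, Iterable
from urllib import request

# Hypothetical OPA address and policy path; adjust to the actual deployment.
OPA_URL = "http://localhost:8181/v1/data/airflow/allow"


def build_opa_input(
    user_id: str,
    roles: Iterable[str],
    action: str,
    resource_type: str,
    resource_details: Dict[str, Any],
) -> Dict[str, Any]:
    """Assemble the input document in the shape proposed in this ticket."""
    return {
        "user": {"id": user_id, "roles": list(roles)},
        "action": action,
        "resource-type": resource_type,
        "resource-details": dict(resource_details),
    }


def post_json(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON payload and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def is_authorized(
    opa_input: Dict[str, Any],
    send: Callable[[str, Dict[str, Any]], Dict[str, Any]] = post_json,
    url: str = OPA_URL,
) -> bool:
    """Ask OPA for a decision; anything other than an explicit true is a deny."""
    response = send(url, {"input": opa_input})
    # OPA omits "result" when the rule is undefined, so default to deny.
    return response.get("result") is True
```

An overridden is_authorized_* function could then build the input from the current user and the requested resource and return the outcome of `is_authorized`. The `send` parameter is injectable so that the decision logic can be exercised without a running OPA instance.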