karolusz / adls-acl

Manage Azure Data Lake Gen 2 ACLs as Code
MIT License

Azure DataLake Storage (ADLS) - Access Control List (ACL) Manager

A small CLI tool for managing Azure DataLake Storage (ADLS) Access Control Lists (ACLs) for containers and directories.

It allows you to take control of your ADLS account's directory structure and ACLs as Infrastructure as Code through the use of YAML configuration files.


Requirements

Install

pip

$ pip install adls-acl 

Usage

Command line

adls-acl can be run from the command line to create directories and set the desired ACLs in an Azure Storage Account Gen v2, as defined in a user-supplied YAML file.

Containers and directories that are defined in the config file but not present in the storage account are created during an adls-acl run. The ACLs of directories that already exist in the storage account are overwritten with those specified in the input config file. Future releases shall enable alternative behaviors. For that reason, the current version of adls-acl is best suited to greenfield deployments.
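The create/overwrite behavior described above can be pictured as simple set arithmetic over directory paths. The following is a minimal sketch, not the tool's actual implementation; the function name plan_changes and the path strings are hypothetical, and the "untouched" bucket reflects the statement in the Default ACLs section that directories missing from the input file are left alone.

```python
def plan_changes(desired: set[str], existing: set[str]) -> dict[str, list[str]]:
    """Classify directory paths by what a run would do to them (illustrative only)."""
    return {
        # Present in the config but missing from the account: created, then ACLs set.
        "create": sorted(desired - existing),
        # Present in both: existing ACLs are overwritten with the config's ACLs.
        "overwrite_acls": sorted(desired & existing),
        # Present only in the account: not managed by the config, left as they are.
        "untouched": sorted(existing - desired),
    }


if __name__ == "__main__":
    desired = {"testcontainer1/directory_a", "testcontainer1/directory_b"}
    existing = {"testcontainer1/directory_a", "testcontainer1/legacy"}
    for action, paths in plan_changes(desired, existing).items():
        print(action, paths)
```

Because existing ACLs land in the overwrite bucket rather than being merged, reviewing such a plan before a run is a sensible precaution on accounts that already hold data.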

The Azure Identity client library (Python SDK) is used to authenticate to Microsoft Entra ID (formerly Azure AD). By default it uses DefaultAzureCredential (MS DOCS: DefaultCredential), which tries a number of authentication methods in turn. A specific authentication mechanism can be selected with the --auth-method option of each command.

Usage:

Usage: adls-acl [OPTIONS] COMMAND [ARGS]...

Options:
  --debug          Enable debug messages.
  --silent         Suppress logs to stdout.
  --log-file TEXT  Redirect logs to a file.
  --help           Show this message and exit.

Commands:
  get-acl  Read the current fs and acls on dirs.
  set-acl  Read and set direcotry structure and ACLs from a YAML file.

Options:

set-acl command

Usage: adls-acl set-acl [OPTIONS] FILE

  Read and set direcotry structure and ACLs from a YAML file.

Options:
  --auth-method [default|environment|workload|managedid|azurecli|azureps|azuredevcli]
                                  Azure AD Authentication method
  --auth-opt <TEXT TEXT>...       Keyword arguments to pass to Azure SDK
                                  credential constructor
  --help                          Show this message and exit.


To set ACLs from an input file named test.yml, run:

adls-acl set-acl test.yml

get-acl command

Usage: adls-acl get-acl [OPTIONS] ACCOUNT_NAME OUTFILE

  Read the current fs and acls on dirs.

Options:
  --omit-special                  Omit special ACLs when reading the account.
  --auth-method [default|environment|workload|managedid|azurecli|azureps|azuredevcli]
                                  Azure AD Authentication method
  --auth-opt <TEXT TEXT>...       Keyword arguments to pass to Azure SDK
                                  credential constructor
  --help                          Show this message and exit.

This writes the account's current filesystem (directories only, no files) and the directories' ACLs to the file whose path is passed as the OUTFILE argument.

To read the ACLs of an ADLS storage account named testaccount into the file dump.yml:

adls-acl get-acl testaccount dump.yml

Input file

This section is the YAML schema reference for input files. Each input file represents the desired directory structure and ACLs for a single Azure Storage account.

Input File Example

Example of an input file for a fictitious storage account. All elements of the schema are explained in the following sections.

account: testaccount
containers:
  - name: testcontainer1 
    acls:
      - oid: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
        type: "user"
        acl: r-x
      - oid: "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
        type: "user"
        acl: --x
    folders:
      - name: directory_a
        acls:
          - oid: "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
            type: "group"
            acl: rwx
            scope: default
        folders:
          - name: subdir_a 
            acls:
              - oid: "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
                type: "user"
                acl: --x
      - name: directory_b
        acls:
          - oid: "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
            type: "group"
            acl: rwx
            scope: default

The above input would create the following directory structure in the storage account testaccount:

testcontainer1 (storage container)
root/
|
├── directory_a/
|   ├── subdir_a/
|
├── directory_b/

Account - definition

account: string # Required. The name of the Azure storage account.
containers: [ folder ] # list of containers in the account.

account string. Required. Azure Storage Account name as in: https://<account>.dfs.core.windows.net/

containers [ folder ]. A list of objects describing directories and their ACLs. For a container, the object defines the container's name, the ACLs on the container root, and its subdirectories.
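As a small illustration of how the account value relates to the endpoint URL shown above, the sketch below validates a name and builds the URL. The helper account_url is hypothetical; the validation follows Azure's naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only), and whether adls-acl performs any such check itself is not stated here.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")


def account_url(account: str) -> str:
    """Return the ADLS Gen2 (dfs) endpoint URL for a storage account name."""
    if not _ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    return f"https://{account}.dfs.core.windows.net/"


if __name__ == "__main__":
    print(account_url("testaccount"))
```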

Folder - definition

name: string # Required. Directory name.
acls: [ acl ] # A list of ACLs to set on the directory.
folders: [ folder ] # A list of subdirectory objects.

name string. Required. A name of a directory.

acls [ acl ]. A list of ACLs to set on the directory.

folders [ folder ]. A list of objects describing subdirectories.

Acl - definition

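oid: string # Required. Security principal Object ID in Microsoft Entra ID.
type: string # Required. Security principal type.
acl: string # Required. Permissions in short form, e.g. r-x.
scope: string # Optional. Set to default for default ACLs.
recursive: bool # Optional. Apply the ACL recursively.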

oid string. Required. Object ID of the principal (user/group/managed identity/service principal) in Microsoft Entra ID (formerly Azure Active Directory).

type string. Required. Type of the security principal. Allowed values: user (for users, service principals, and managed identities), group (for Entra ID groups), other (for the ACL that applies to all other users), mask (for setting masks on directories).

acl string. Required. The desired permissions in short form (MS DOCS: ADLS ACLs), e.g. r-- for read-only permissions.

scope string. Optional. If set to default, the specified ACLs are set as default ACLs (MS DOCS: Types of ACLs). If not present, ACLs are set as access ACLs.

recursive bool. Optional. If set to True, the ACL is applied recursively to every subdirectory and file inside the directory it is set on.
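For orientation, the fields above line up closely with the text form that ADLS Gen2 uses for a single ACL entry, [scope:]type:id:permissions. The sketch below converts one acl block (as a dict) into that form and validates the short-form permission string. The function to_acl_spec is hypothetical, and treating mask and other entries as having an empty ID is an assumption based on the ADLS entry format, not on adls-acl's documented behavior.

```python
import re

_PERMS_RE = re.compile(r"[r-][w-][x-]")  # short form, e.g. rwx, r-x, --x
_ALLOWED_TYPES = {"user", "group", "other", "mask"}


def to_acl_spec(entry: dict) -> str:
    """Render one acl block as an ADLS-style entry string (illustrative only)."""
    kind = entry["type"]
    perms = entry["acl"]
    if kind not in _ALLOWED_TYPES:
        raise ValueError(f"unknown principal type: {kind!r}")
    if not _PERMS_RE.fullmatch(perms):
        raise ValueError(f"invalid permission string: {perms!r}")
    # ASSUMPTION: mask and other entries carry no object ID in the rendered string.
    oid = "" if kind in {"mask", "other"} else entry.get("oid", "")
    # scope: default marks a default ACL; otherwise the entry is an access ACL.
    prefix = "default:" if entry.get("scope") == "default" else ""
    return f"{prefix}{kind}:{oid}:{perms}"


if __name__ == "__main__":
    print(to_acl_spec({"oid": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy",
                       "type": "group", "acl": "rwx", "scope": "default"}))
```

Validating the three-character permission string early catches typos such as rw or xwr before anything is sent to the storage account.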

Special ACLs

adls-acl also allows for managing ACLs for owning user, owning group, all other users, as well as setting masks. Examples of how to specify each of the above, in the adls-acl YAML input file (as acl block) are provided below:

Default ACLs

The default ACLs defined or set on the higher level directories are pushed down to subdirectories specified in the input file. They will not be set on any files that had existed in the directories prior to the execution of adls-acl. Moreover, any subdirectories that exist in the account but are not specified in the input file remain untouched by adls-acl.

Future releases will allow for more control over this behaviour (for example, updating default ACLs on all files created prior to the change of ACLs).
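One possible reading of the push-down behavior is sketched below: while walking the folder tree, every entry with scope: default on an ancestor is carried down and combined with each descendant's own entries. This is an assumption-laden illustration, not the tool's code; the helper effective_acls is hypothetical, and the real precedence and merge rules of adls-acl are not documented in this section.

```python
def effective_acls(folder: dict, inherited: tuple = (), prefix: str = "") -> dict:
    """Map each directory path to inherited default entries plus its own entries."""
    path = f"{prefix}/{folder['name']}" if prefix else folder["name"]
    own = folder.get("acls") or []
    # ASSUMPTION: inherited defaults are listed first, the directory's own entries after.
    result = {path: list(inherited) + list(own)}
    # Only default-scope entries are pushed further down the tree.
    pushed = tuple(inherited) + tuple(a for a in own if a.get("scope") == "default")
    for child in folder.get("folders") or []:
        result.update(effective_acls(child, pushed, path))
    return result


if __name__ == "__main__":
    tree = {
        "name": "directory_a",
        "acls": [{"oid": "g1", "type": "group", "acl": "rwx", "scope": "default"}],
        "folders": [
            {"name": "subdir_a",
             "acls": [{"oid": "u1", "type": "user", "acl": "--x"}]},
        ],
    }
    for path, acls in effective_acls(tree).items():
        print(path, acls)
```

Directories that are absent from the input file never appear in the result, which matches the statement that they remain untouched.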

Authentication Methods

By default, authentication through the Azure Python SDK is handled with DefaultAzureCredential.

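In addition, a user can select one of the supported Azure Python SDK credential types with the --auth-method option. The accepted values are default, environment, workload, managedid, azurecli, azureps, and azuredevcli.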

To pass keyword arguments to the credential constructor, use the --auth-opt option. This option can be used multiple times, one instance per keyword argument.

For example, to pass managed_identity_client_id and exclude_cli_credential to DefaultAzureCredential:

--auth-method default --auth-opt managed_identity_client_id xxxx-xxxx-xxxxx --auth-opt exclude_cli_credential False
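The sketch below shows one way repeated --auth-opt KEY VALUE pairs could be collected into keyword arguments for a credential constructor. It uses argparse from the standard library as a stand-in and is not the tool's own CLI code. The mapping from --auth-method values to azure.identity class names and the conversion of "True"/"False" strings to booleans are both assumptions made for illustration.

```python
import argparse

# ASSUMPTION: a plausible mapping from --auth-method values to azure.identity classes.
CREDENTIAL_CLASSES = {
    "default": "DefaultAzureCredential",
    "environment": "EnvironmentCredential",
    "workload": "WorkloadIdentityCredential",
    "managedid": "ManagedIdentityCredential",
    "azurecli": "AzureCliCredential",
    "azureps": "AzurePowerShellCredential",
    "azuredevcli": "AzureDeveloperCliCredential",
}


def coerce(value: str):
    """Turn 'True'/'False' (any case) into booleans; leave other strings alone."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return value


def parse_auth(argv: list[str]) -> tuple[str, dict]:
    """Return the credential class name and the keyword arguments to pass to it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--auth-method", choices=sorted(CREDENTIAL_CLASSES),
                        default="default")
    # Each --auth-opt takes exactly two values and may be repeated.
    parser.add_argument("--auth-opt", nargs=2, action="append", default=[],
                        metavar=("KEY", "VALUE"))
    args = parser.parse_args(argv)
    kwargs = {key: coerce(value) for key, value in args.auth_opt}
    return CREDENTIAL_CLASSES[args.auth_method], kwargs


if __name__ == "__main__":
    print(parse_auth(["--auth-method", "default",
                      "--auth-opt", "managed_identity_client_id", "xxxx-xxxx-xxxxx",
                      "--auth-opt", "exclude_cli_credential", "False"]))
```

With the azure-identity package installed, the returned class name could be looked up with getattr(azure.identity, name) and called with the collected keyword arguments.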