wjohnson / pyapacheatlas

A Python package to help work with the Apache Atlas REST APIs
https://wjohnson.github.io/pyapacheatlas-docs/latest/
MIT License

PyApacheAtlas: A Python SDK for Azure Purview and Apache Atlas


PyApacheAtlas lets you work with the Azure Purview and Apache Atlas APIs in a Pythonic way. It supports bulk loading, custom lineage, custom type definitions, and more, through both an SDK and Excel templates.

The package supports programmatic interaction and an Excel template for low-code uploads.

Using Excel to Accelerate Metadata Uploads

Using the Pythonic SDK for Purview and Atlas

The PyApacheAtlas package itself supports those operations and more for the advanced user:

Quickstart

Install from PyPI

python -m pip install pyapacheatlas

Using Azure-Identity and the Azure CLI to Connect to Purview

To connect to Azure Purview, it is often more convenient to install the azure-identity package, which supports Managed Identity, Environment Credential, and Azure CLI credential.

If you want to use your Azure CLI credential rather than a service principal, install azure-identity by running pip install azure-identity and then run the code below.

from azure.identity import AzureCliCredential

from pyapacheatlas.core import PurviewClient

cred = AzureCliCredential()

# Create a client to connect to your service.
client = PurviewClient(
    account_name = "Your-Purview-Account-Name",
    authentication = cred
)

Create a Purview Client Connection Using Service Principal

If you don't want to install any additional packages, you can use the built-in ServicePrincipalAuthentication class.

from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import PurviewClient

auth = ServicePrincipalAuthentication(
    tenant_id = "", 
    client_id = "", 
    client_secret = ""
)

# Create a client to connect to your service.
client = PurviewClient(
    account_name = "Your-Purview-Account-Name",
    authentication = auth
)
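
Rather than hardcoding the three values, you might read them from environment variables. The sketch below is one way to do that, not part of PyApacheAtlas: the helper names `load_service_principal_settings` and `build_client` are hypothetical, and the variable names `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` are an assumption borrowed from the convention azure-identity uses for its Environment Credential.

```python
import os

# Assumed mapping from ServicePrincipalAuthentication keyword arguments
# to environment variable names (the names are an assumption, see above).
_ENV_NAMES = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def load_service_principal_settings(env=None):
    """Hypothetical helper: collect service principal values from env vars."""
    env = os.environ if env is None else env
    missing = [name for name in _ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for kwarg, name in _ENV_NAMES.items()}


def build_client(account_name, env=None):
    """Hypothetical helper: build a PurviewClient from env var settings."""
    # Imported here so the settings helper works without the package installed.
    from pyapacheatlas.auth import ServicePrincipalAuthentication
    from pyapacheatlas.core import PurviewClient

    auth = ServicePrincipalAuthentication(**load_service_principal_settings(env))
    return PurviewClient(account_name=account_name, authentication=auth)
```

Calling `build_client("Your-Purview-Account-Name")` would then return a client equivalent to the one above, provided the three variables are set. Failing early on a missing variable tends to be easier to diagnose than an authentication error later on.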

Create Entities "By Hand"

You can also create your own entities by hand with the helper AtlasEntity class.

from pyapacheatlas.core import AtlasEntity

# Get All Type Defs
all_type_defs = client.get_all_typedefs()

# Get Specific Entities
list_of_entities = client.get_entity(guid=["abc-123-def","ghi-456-jkl"])

# Create a new entity
# A negative guid is a placeholder that marks the entity as new
ae = AtlasEntity(
    name = "my table",
    typeName = "demo_table",
    qualified_name = "somedb.schema.mytable",
    guid = -1000
)

# Upload that entity with the client
upload_results = client.upload_entities( [ae] )
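
When you create several entities in one upload, each new entity should get its own negative placeholder guid. The sketch below shows one way to generate them with the standard library. The `placeholder_guids` helper and the table names are hypothetical, made up for illustration; the keyword arguments match the `AtlasEntity` call above.

```python
from itertools import count


def placeholder_guids(start=-1000):
    """Hypothetical helper: yield distinct negative guids (-1000, -1001, ...)."""
    return count(start, -1)


guids = placeholder_guids()
table_names = ["customers", "orders", "payments"]  # illustrative names only

# One set of AtlasEntity keyword arguments per table, each with a unique guid.
entity_kwargs = [
    {
        "name": table,
        "typeName": "demo_table",
        "qualified_name": f"somedb.schema.{table}",
        "guid": next(guids),
    }
    for table in table_names
]

# With pyapacheatlas installed and a client created as shown earlier:
# from pyapacheatlas.core import AtlasEntity
# entities = [AtlasEntity(**kwargs) for kwargs in entity_kwargs]
# upload_results = client.upload_entities(entities)
```

Keeping the placeholders distinct means entities in the same batch cannot be confused with one another.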

Create Entities from Excel

Read from a standardized Excel template that supports...

See end-to-end samples for each scenario in the Excel samples.

Learn more about the Excel features and configuration in the wiki.

Additional Resources