wjohnson / pyapacheatlas

A Python package to help work with the Apache Atlas REST APIs
https://wjohnson.github.io/pyapacheatlas-docs/latest/
MIT License

Column Mapping is not created #287

Closed: tsitsimis closed this issue 6 months ago

tsitsimis commented 6 months ago

Describe the bug: The column mapping of a custom process is not shown in the Purview portal.

To Reproduce

import json

from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core.typedef import EntityTypeDef, AtlasAttributeDef
from pyapacheatlas.core import PurviewClient
from pyapacheatlas.core.util import GuidTracker
from pyapacheatlas.core import AtlasEntity, AtlasProcess

# Authenticate
oauth = ServicePrincipalAuthentication(
    tenant_id="",
    client_id="",
    client_secret=""
)
client = PurviewClient(
    account_name="",
    authentication=oauth
)

# Create custom process type
my_test_process = EntityTypeDef(
    name = "my_test_process",
    superTypes = ["Process"],
    attributes = [
        AtlasAttributeDef("columnMapping")
    ]
)
client.upload_typedefs(entityDefs=[my_test_process], force_update=True)

guid_tracker = GuidTracker()

# Create entities
input1 = AtlasEntity(
    name="test_input1",
    typeName="DataSet",
    qualified_name="custom://test_input1",
    guid=guid_tracker.get_guid()
)

input2 = AtlasEntity(
    name="test_input2",
    typeName="DataSet",
    qualified_name="custom://test_input2",
    guid=guid_tracker.get_guid()
)

output1 = AtlasEntity(
    name="test_output1",
    typeName="DataSet",
    qualified_name="custom://test_output1",
    guid=guid_tracker.get_guid()
)

column_mapping = [
    {
        "DatasetMapping": {"Source": input1.qualifiedName, "Sink": output1.qualifiedName},
        "ColumnMapping": [
            {"Source": "in1_address", "Sink": "out1_address"},
            {"Source": "in1_customer", "Sink": "out1_customer"},
        ],
    }
]

process = AtlasProcess(
    name="my_test_process",
    typeName="my_test_process",
    qualified_name="custom://my_test_process",
    inputs=[input1, input2],
    outputs=[output1],
    guid=guid_tracker.get_guid(),
    attributes = {"columnMapping": json.dumps(column_mapping)}
)
client.upload_entities(batch=[input1, input2, output1, process])

When I open the process in the Purview portal, the left menu does not show the column mapping: [image]

Expected behavior: Something similar to the picture in this comment: https://github.com/wjohnson/pyapacheatlas/issues/165#issuecomment-920141829

Thank you!

tsitsimis commented 6 months ago

After some digging, and based on the comment at https://github.com/wjohnson/pyapacheatlas/issues/165#issuecomment-920141829, I found out that the correct parameter name of the EntityTypeDef class is attributeDefs, not attributes. I was confused by the example in this tutorial. This may be an opportunity to update the documentation. Thank you anyway, the package is very useful.
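
For anyone hitting the same problem, here is a minimal sketch of what the corrected type definition looks like as plain Atlas-style JSON, built with the standard library only so it runs without credentials. The helper name build_process_typedef and the attribute options (typeName, isOptional, cardinality) are illustrative assumptions, not output captured from pyapacheatlas or Purview. The point it shows is that the attribute list belongs under attributeDefs.

```python
import json


def build_process_typedef(name, attribute_names):
    """Hypothetical helper: return an Atlas-style entity typedef as a dict."""
    return {
        "category": "ENTITY",
        "name": name,
        "superTypes": ["Process"],
        # The attribute list goes under "attributeDefs". In pyapacheatlas the
        # matching fix is EntityTypeDef(..., attributeDefs=[...]) instead of
        # EntityTypeDef(..., attributes=[...]).
        "attributeDefs": [
            {
                "name": attr,
                "typeName": "string",     # assumed: mapping is stored as a JSON string
                "isOptional": True,       # assumed option values for illustration
                "cardinality": "SINGLE",
            }
            for attr in attribute_names
        ],
    }


typedef = build_process_typedef("my_test_process", ["columnMapping"])

# Same mapping structure as in the report, with literal qualified names.
column_mapping = [
    {
        "DatasetMapping": {
            "Source": "custom://test_input1",
            "Sink": "custom://test_output1",
        },
        "ColumnMapping": [
            {"Source": "in1_address", "Sink": "out1_address"},
            {"Source": "in1_customer", "Sink": "out1_customer"},
        ],
    }
]

# The columnMapping attribute value is the mapping serialized to a string.
column_mapping_attr = json.dumps(column_mapping)

print(json.dumps({"entityDefs": [typedef]}, indent=2))
```

With the typedef declared this way, the process entity from the report can stay unchanged: it still passes attributes={"columnMapping": json.dumps(column_mapping)} when constructing the AtlasProcess.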