Open ghosalya opened 10 months ago
Hey @ghosalya!
Firstly, thank you for opening this ticket and for the comprehensive description you've provided!
The issue you're encountering stems from the combination of a dataset and a DEG. In instances like these, the ODD Platform prioritizes the lineage of the DEG. Moreover, during metadata ingestion, ODD Platform doesn’t cross-check against these specific classes and permits the creation of such combinations.
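To make that precedence concrete, here is a toy model of the behavior described above. It is not ODD Platform code: the `Entity` class, its `upstream` and `group_members` attributes, and the `displayed_lineage` function are hypothetical names used only to illustrate "DEG lineage wins when both are present".

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a data entity; not an ODD Platform class."""
    name: str
    # Lineage an entity would have as a plain dataset (e.g. the job writing to it).
    upstream: list = field(default_factory=list)
    # Members an entity would have as a DataEntityGroup (DEG).
    group_members: list = field(default_factory=list)


def displayed_lineage(entity: Entity) -> list:
    """Return what the Lineage tab shows under the described precedence.

    Assumption: when DEG members exist they take priority and the
    dataset lineage is not shown at all.
    """
    if entity.group_members:
        return entity.group_members
    return entity.upstream


table = Entity("WIDGET_TABLE", upstream=["widget_job"])
print(displayed_lineage(table))  # dataset only: dataset lineage is shown

table.group_members = ["WIDGET_TABLE_V1", "WIDGET_TABLE_V2"]
print(displayed_lineage(table))  # dataset + DEG: only DEG members are shown
```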
For us to address this effectively, could you shed some light on the rationale behind designating an entity as both a dataset and a DEG simultaneously? It's essential for us to grasp the underlying intentions so we can determine the best path forward and ensure that creating a DEG and dataset within the same entity is indeed meaningful.
Hi @DementevNikita
> For us to address this effectively, could you shed some light on the rationale behind designating an entity as both a dataset and a DEG simultaneously? It's essential for us to grasp the underlying intentions so we can determine the best path forward and ensure that creating a DEG and dataset within the same entity is indeed meaningful
This is one of the workarounds we are trying with https://github.com/opendatadiscovery/odd-platform/issues/1407
Essentially, we want a DataEntity that is a DataSet (i.e. WIDGET_TABLE), but also has a component that lists the versions of this dataset (WIDGET_TABLE_V1, WIDGET_TABLE_V2). In this case, I would like the lineage of WIDGET_TABLE to derive from its DataSet lineage, since it is first and foremost a table.
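As a rough illustration of the shape being described, the sketch below builds such an entity as a plain dict and prints it as JSON. The ODDRN strings are made up, and the field names (`dataset`, `field_list`, `data_entity_group`, `entities_list`) follow my reading of the ODD specification, so treat them as assumptions rather than the exact payload from the gist.

```python
import json

# Hypothetical ODDRNs, for illustration only.
TABLE_ODDRN = "//example/widget/tables/WIDGET_TABLE"
VERSION_ODDRNS = [f"{TABLE_ODDRN}_V1", f"{TABLE_ODDRN}_V2"]


def widget_table_entity() -> dict:
    """Build one entity that carries both a dataset and a DEG component."""
    return {
        "oddrn": TABLE_ODDRN,
        "name": "WIDGET_TABLE",
        "type": "TABLE",
        # Dataset component: the entity is first and foremost a table.
        "dataset": {"field_list": []},
        # DEG component: lists the versions of this table.
        "data_entity_group": {"entities_list": VERSION_ODDRNS},
    }


entity = widget_table_entity()
print(json.dumps(entity, indent=2))
```

With both components on one entity, the request in this ticket is that lineage keeps coming from the dataset side while the group only lists the versions.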
Describe the bug
A Data Entity that is both a DataSet and a DataEntityGroup loses its dataset lineage information and only has lineage from the DataEntityGroup.
Set up
ODD-Platform v0.15.0 (ghcr.io/opendatadiscovery/odd-platform:0.15.0)
Steps to Reproduce
Code to reproduce this behavior: https://gist.github.com/ghosalya/aa25b2903d3d5bf728a8b8aad9731cec It uses `odd-models-package` to call the Ingestion API to create Data Entities.
Steps to reproduce the behavior:
1. Have `odd-platform` running at http://localhost:8080 (I followed this section of README.md for docker).
2. Go to http://localhost:8080 and create a collector (`Management` -> `Collectors` -> `Add Collector`). Export the token as the env variable `ODD_PLATFORM_TOKEN`.
3. Install `odd-models-package`.
4. Run odd_widget_example.py from the gist. This will create a number of entities, e.g. WIDGET_TABLE.
5. Go to http://localhost:8080, look for the WIDGET_TABLE dataset and check the `Lineage` tab. It should show the `widget_job -> widget_table` lineage.
   ![image](https://github.com/opendatadiscovery/odd-platform/assets/12974269/289e600c-1126-4bd1-b4aa-6469f82e60da)
6. Now run odd_widget_example_deg.py; this will modify WIDGET_TABLE to have a DataEntityGroup component.
7. Go to http://localhost:8080 and look for the WIDGET_TABLE dataset; it should have a DEG component like so:
   ![image](https://github.com/opendatadiscovery/odd-platform/assets/12974269/d460b22f-3831-4938-8437-18ad876a7a46)
8. Go to WIDGET_TABLE's `Lineage` tab.
Expected behavior
The `Lineage` tab should still show `widget_job -> widget_table`.
Current behavior
The `Lineage` tab is overridden by the DEG component and only shows the DEG members, and we lose the original lineage.
Additional context
The code to submit the data entity list uses `odd-models==2.0.31`.