open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Sample Data Ingestion: Cannot ingest tables with complex data types #16983

Open nqvuong1998 opened 4 months ago

nqvuong1998 commented 4 months ago

Is your feature request related to a problem? Please describe.

When ingesting sample data from Hive tables using Trino, we encounter the error "Error trying to ingest sample data for table" for tables that have complex data types.

Describe the solution you'd like

There are 2 possible solutions:

  1. When displaying sample data from Hive tables with complex data types such as struct, map, and array, it should match the schema structure.
  2. Convert complex sample data to a JSON string and display it in one column, i.e. use a JSON representation for each row's complex values.
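
As a rough illustration of option 2, here is a minimal Python sketch. It is not OpenMetadata's actual implementation: the helper names `flatten_complex_cell` and `flatten_sample_rows` are hypothetical, and it assumes the database driver hands struct, map, and array values back as Python `tuple`, `dict`, and `list` objects with JSON-compatible map keys.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any, List, Sequence


def _json_default(value: Any) -> Any:
    """Fallback for nested values that json.dumps cannot serialise natively."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).hex()
    # Last resort: keep a readable representation instead of failing the row.
    return repr(value)


def flatten_complex_cell(value: Any) -> Any:
    """Hypothetical helper: turn a struct/map/array cell into one JSON string.

    Scalars are returned unchanged so simple columns are unaffected.
    """
    if isinstance(value, (dict, list, tuple)):
        return json.dumps(value, default=_json_default)
    return value


def flatten_sample_rows(rows: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Apply flatten_complex_cell to every cell of every sample row."""
    return [[flatten_complex_cell(cell) for cell in row] for row in rows]


if __name__ == "__main__":
    sample = [[1, {"tags": ["a", "b"]}, ("x", Decimal("1.5"))]]
    for row in flatten_sample_rows(sample):
        print(row)
```

With this approach each complex column stays a single column in the sample data view, at the cost of losing the nested schema structure that option 1 would preserve.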
nqvuong1998 commented 4 months ago

cc @ayush-shah

chuqbach commented 4 months ago

Same issue here. It seems complex data types are not processed in OpenMetadata. The Hue/Impala connector doesn't process complex data types at all, while Trino processes them for the schema but not for sample data or the Data Profiler.

sushi30 commented 3 months ago

Probably related to https://github.com/open-metadata/OpenMetadata/issues/15627

nqvuong1998 commented 2 months ago

Hi @TeddyCr @ayush-shah @harshach, any update on this issue?

ayush-shah commented 2 months ago

Hello @nqvuong1998, we will discuss internally and see which release it can be a part of. Until then, it would be great if you could provide us with the DDL of the table. Also, as it's open source, we encourage people to contribute. Let us know if you want to contribute and we will help wherever needed. Thanks 🙏

TeddyCr commented 1 month ago

@nqvuong1998 can you share the OpenMetadata version you are on and any logs you have, as well as the table DDL? We could not reproduce it on our end, and JSON/STRUCT fields for sample data are ingested as expected.

nqvuong1998 commented 1 month ago

Hi @TeddyCr ,

TeddyCr commented 1 month ago

Can you share the full log files? (If you can run it with Debug, that would be helpful.) Feel free to DM them to me in our Slack channel. I see 3 errors in there and would be interested to see what they are.

nqvuong1998 commented 1 month ago

Hi @TeddyCr @ayush-shah ,