This PR adds support for quoted object field names in Iceberg table ingestion. It includes the following changes:
Preserve original column name in schema: Decode column names after schema creation to avoid sub-column name mismatches. This is needed because we use TypeToMessageType to convert the Iceberg schema to a Parquet schema, and that conversion applies Avro column name encoding to every character that is not a letter or digit.
Escape dot character in EP info keys: Escape each dot character inside a column name with a backslash when building the dot path. Without this, two different columns can end up with the same dot path. E.g. in ("a.a" int, a object(a int)), both the top-level column "a.a" and the sub-column a of object a would map to a.a.
Change stats map's key to field id: The old logic built the dot path as the stats map key while validating structured data types, which might cause performance issues. Remove this logic and use fieldId as the key instead to avoid string construction. Keep a map of fieldId -> dotPath in subcolumnFinder for logging purposes.
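
A minimal sketch of the dot escaping and the fieldId-keyed stats map described above. This is not the code in this PR: the class DotPathSketch, the helpers escapeDots and dotPath, the field ids 1 and 3, and the null-count values are all hypothetical and exist only for illustration. The sketch also escapes dots only and does not handle a backslash that already appears in a column name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DotPathSketch {
    // Hypothetical helper: escape literal dots in a single column name.
    static String escapeDots(String name) {
        return name.replace(".", "\\.");
    }

    // Hypothetical helper: join escaped names from root to leaf with an unescaped dot.
    static String dotPath(List<String> names) {
        List<String> escaped = new ArrayList<>();
        for (String name : names) {
            escaped.add(escapeDots(name));
        }
        return String.join(".", escaped);
    }

    public static void main(String[] args) {
        // Columns from the example: ("a.a" int, a object(a int)).
        String flat = dotPath(Arrays.asList("a.a"));      // a\.a
        String nested = dotPath(Arrays.asList("a", "a")); // a.a

        // Dot paths are built once and kept only for logging (ids are assumed).
        Map<Integer, String> fieldIdToDotPath = new HashMap<>();
        fieldIdToDotPath.put(1, flat);
        fieldIdToDotPath.put(3, nested);

        // Stats are keyed by field id, so no string is built per value.
        Map<Integer, Long> nullCountByFieldId = new HashMap<>();
        nullCountByFieldId.merge(1, 1L, Long::sum);
        nullCountByFieldId.merge(3, 2L, Long::sum);

        for (Map.Entry<Integer, Long> e : nullCountByFieldId.entrySet()) {
            System.out.println(fieldIdToDotPath.get(e.getKey()) + " nulls=" + e.getValue());
        }
    }
}
```

With escaping, the two columns get distinct dot paths, and the stats map never depends on those strings for lookups.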