onehouseinc / LakeView

Monitoring and insights on your data lakehouse tables
Apache License 2.0

Table doesn't show table name. It shows the S3 bucket name. #82

Open alberttwong opened 4 months ago

alberttwong commented 4 months ago

https://cloud.onehouse.ai/c3eb3868-6979-41cd-9018-952d29a43337/data/lakes/iceberg/databases/taxis

[Screenshot: 2024-07-16 at 10:13:19 AM]

root@spark:/opt/lakeview# cat iceberg.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1
        accessKey: admin
        accessSecret: password
        endpoint: http://minio:9000

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: delta
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        - lake: iceberg
          databases:
            - name: taxis
              basePaths: ["s3://warehouse/taxis"]
        # Add additional lakes and databases as needed
root@spark:/opt/lakeview# java -jar LakeView-1.0-SNAPSHOT-all.jar -p '/opt/lakeview/iceberg.yaml'
17:11:51.495 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
17:11:51.644 [main] INFO  com.onehouse.RuntimeModule - Spinning up 70 threads
17:11:51.985 [main] INFO  com.onehouse.metrics.MetricsServer - Starting metrics server
17:11:51.996 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Running metadata-extractor one time
17:11:51.996 [main] INFO  c.o.m.TableDiscoveryService - Starting table discover service, excluding []
17:11:51.997 [main] INFO  c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/people
17:11:52.071 [main] INFO  c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/taxis
17:11:52.187 [metadata-extractor-3] INFO  c.o.m.TableMetadataUploaderService - Uploading metadata of following tables: [Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=null)]
17:11:52.189 [metadata-extractor-3] INFO  c.o.m.TableMetadataUploaderService - Fetching checkpoint for tables: [Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d)]
17:11:52.471 [metadata-extractor-3] INFO  c.o.m.TableMetadataUploaderService - Initializing following tables [Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d)]
17:11:52.784 [metadata-extractor-4] INFO  c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:11:52.803 [metadata-extractor-2] INFO  c.o.m.TimelineCommitInstantsUploader - Processing 1 instants in table Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d) timeline COMMIT_TIMELINE_TYPE_ARCHIVED sequentially in 1 batches
17:11:52.803 [metadata-extractor-2] INFO  c.o.m.TimelineCommitInstantsUploader - uploading batch 1 for table Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:11:53.185 [metadata-extractor-2] INFO  c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:11:53.203 [metadata-extractor-3] INFO  c.o.m.TimelineCommitInstantsUploader - Processing 3 instants in table Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d) timeline COMMIT_TIMELINE_TYPE_ACTIVE sequentially in 1 batches
17:11:53.204 [metadata-extractor-2] INFO  c.o.m.TimelineCommitInstantsUploader - uploading batch 2 for table Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:11:53.571 [metadata-extractor-2] INFO  c.o.m.TimelineCommitInstantsUploader - Reached end of instants in COMMIT_TIMELINE_TYPE_ACTIVE for table Table(absoluteTableUri=s3://warehouse/taxis, databaseName=taxis, lakeName=iceberg, tableId=1fbfcc05-408b-3e43-b95e-af8bb335e97d)
17:11:53.573 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Run Completed
17:11:53.574 [main] INFO  com.onehouse.metrics.MetricsServer - Shutting down metrics server
alberttwong commented 4 months ago

When you click on the S3 bucket name, it goes to an incorrect URL.

alberttwong commented 4 months ago
albert@Alberts-MBP Downloads % cat hoodie.properties
#Updated at 2024-07-16T17:09:04.443893Z
#Tue Jul 16 17:09:04 UTC 2024
hoodie.table.type=COPY_ON_WRITE
hoodie.table.metadata.partitions=column_stats,files
hoodie.table.partition.fields=vendor_id
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=1914023381
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.timeline.timezone=UTC
hoodie.table.recordkey.fields=
hoodie.table.name=s3a\://warehouse/taxis
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.metadata.partitions.inflight=
hoodie.populate.meta.fields=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.version=6
alberttwong commented 4 months ago

The Hudi table was generated by XTable (the source was an Iceberg table).

export AWS_SECRET_ACCESS_KEY=password
export AWS_ACCESS_KEY_ID=admin
export ENDPOINT=http://minio:9000/
export AWS_REGION=us-east-1
cd /opt/xtable/jars/; java -jar xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig xtable_iceberg.yaml -p core-site.xml
root@spark:/opt/xtable/jars# cat xtable_iceberg.yaml 
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
sourceFormat: ICEBERG
targetFormats:
  - HUDI
  - DELTA
datasets:
  -
    tableBasePath: s3a://warehouse/taxis
    tableName: taxis
    partitionSpec: vendor_id:VALUE
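
The LakeView config earlier in this thread uses `s3://warehouse/taxis` as the base path, while this XTable config uses `s3a://warehouse/taxis`. As a minimal sketch for confirming that two such URIs refer to the same bucket and prefix, something like the following could be used. The helper name `same_s3_location` and the set of schemes treated as equivalent are assumptions made for illustration; neither is part of LakeView or XTable.

```python
from urllib.parse import urlsplit

# Schemes treated as interchangeable for this comparison (an assumption).
S3_SCHEMES = {"s3", "s3a", "s3n"}


def same_s3_location(a: str, b: str) -> bool:
    """Return True if both URIs name the same bucket and key prefix."""
    ua, ub = urlsplit(a), urlsplit(b)
    if ua.scheme not in S3_SCHEMES or ub.scheme not in S3_SCHEMES:
        return False
    # Ignore a trailing slash so "taxis" and "taxis/" compare equal.
    return (ua.netloc, ua.path.rstrip("/")) == (ub.netloc, ub.path.rstrip("/"))


if __name__ == "__main__":
    print(same_s3_location("s3://warehouse/taxis", "s3a://warehouse/taxis"))
    print(same_s3_location("s3://warehouse/people", "s3a://warehouse/taxis"))
```

This only compares the strings; it does not contact MinIO or S3.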
alberttwong commented 4 months ago

I changed hoodie.table.name, then deleted and recreated the table.

albert@Alberts-MBP Downloads % cat hoodie.properties
#Updated at 2024-07-16T17:09:04.443893Z
#Tue Jul 16 17:09:04 UTC 2024
hoodie.table.type=COPY_ON_WRITE
hoodie.table.metadata.partitions=column_stats,files
hoodie.table.partition.fields=vendor_id
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=1914023381
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.timeline.timezone=UTC
hoodie.table.recordkey.fields=
hoodie.table.name=taxis
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.metadata.partitions.inflight=
hoodie.populate.meta.fields=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.version=6
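
For anyone hitting the same symptom, here is a rough sketch of a check that flags a `hoodie.table.name` that looks like a URI or path rather than a plain table name, which is the difference between the two `hoodie.properties` dumps in this thread. The function names are hypothetical, and the parser is a simplification: it only handles `key=value` lines and the backslash escaping seen above (`s3a\://`), not the full Java properties format.

```python
import re


def parse_properties(text: str) -> dict:
    """Parse simple key=value properties text (no line continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Undo backslash escapes such as "\:" written by java.util.Properties.
        props[key.strip()] = re.sub(r"\\([:=#! ])", r"\1", value.strip())
    return props


def looks_like_path(name: str) -> bool:
    """True if the name resembles a URI or filesystem path."""
    return "://" in name or "/" in name


def check_table_name(text: str) -> tuple:
    """Return (table name, whether it looks like a path)."""
    name = parse_properties(text).get("hoodie.table.name", "")
    return name, looks_like_path(name)


if __name__ == "__main__":
    before = "hoodie.table.type=COPY_ON_WRITE\nhoodie.table.name=s3a\\://warehouse/taxis\n"
    after = "hoodie.table.type=COPY_ON_WRITE\nhoodie.table.name=taxis\n"
    print(check_table_name(before))  # ('s3a://warehouse/taxis', True)
    print(check_table_name(after))   # ('taxis', False)
```

Whether LakeView actually reads the displayed name from `hoodie.table.name` is not confirmed in this thread; the check only reports what the file contains.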