alberttwong opened 4 months ago
PR submitted: https://github.com/onehouseinc/LakeView/pull/85

Using the build from the new PR:
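For reference, fetching and building the jar from the unmerged PR could look like this; the `shadowJar` task is an assumption, inferred only from the `-all` suffix of the jar name, so check the repository's build docs:

```
# Check out PR #85 locally (pr-85 is just an illustrative local branch name)
git clone https://github.com/onehouseinc/LakeView.git
cd LakeView
git fetch origin pull/85/head:pr-85
git checkout pr-85
# Build the fat jar; assumes the Gradle Shadow plugin produces the -all jar
./gradlew shadowJar
```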
```
root@spark:/opt/LakeView# java -jar LakeView-1.0-SNAPSHOT-all.jar -p '/opt/LakeView/delta.yaml'
17:53:05.956 [main] INFO com.onehouse.Main - Starting LakeView extractor service
17:53:06.083 [main] INFO com.onehouse.RuntimeModule - Spinning up 70 threads
17:53:06.373 [main] INFO com.onehouse.metrics.MetricsServer - Starting metrics server
17:53:06.386 [main] INFO c.o.m.TableDiscoveryAndUploadJob - Running metadata-extractor one time
17:53:06.386 [main] INFO c.o.m.TableDiscoveryService - Starting table discover service, excluding []
17:53:06.387 [main] INFO c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/people
17:53:06.555 [metadata-extractor-2] INFO c.o.m.TableMetadataUploaderService - Uploading metadata of following tables: [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=null)]
17:53:06.557 [metadata-extractor-1] INFO c.o.m.TableMetadataUploaderService - Fetching checkpoint for tables: [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)]
17:53:06.943 [metadata-extractor-1] INFO c.o.m.TableMetadataUploaderService - Initializing following tables [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)]
17:53:07.218 [metadata-extractor-2] INFO c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:53:07.231 [metadata-extractor-1] INFO c.o.m.TimelineCommitInstantsUploader - Processing 1 instants in table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline COMMIT_TIMELINE_TYPE_ARCHIVED sequentially in 1 batches
17:53:07.231 [metadata-extractor-1] INFO c.o.m.TimelineCommitInstantsUploader - uploading batch 1 for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:53:07.618 [metadata-extractor-1] INFO c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:53:07.641 [metadata-extractor-1] INFO c.o.m.TimelineCommitInstantsUploader - Processing 3 instants in table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline COMMIT_TIMELINE_TYPE_ACTIVE sequentially in 1 batches
17:53:07.641 [metadata-extractor-3] INFO c.o.m.TimelineCommitInstantsUploader - uploading batch 2 for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:53:07.993 [metadata-extractor-1] INFO c.o.m.TimelineCommitInstantsUploader - Reached end of instants in COMMIT_TIMELINE_TYPE_ACTIVE for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)
17:53:07.995 [main] INFO c.o.m.TableDiscoveryAndUploadJob - Run Completed
17:53:07.996 [main] INFO com.onehouse.metrics.MetricsServer - Shutting down metrics server
```
```
root@spark:/opt/LakeView# ls
delta.yaml  LakeView-1.0-SNAPSHOT-all.jar
root@spark:/opt/LakeView# cat delta.yaml
version: V1
onehouseClientConfig:
  # can be obtained from the Onehouse console
  projectId: c3eb3868-6979-41cd-9018-952d29a43337
  apiKey: XXXX==
  apiSecret: YYYYYY=
  userId: x2gblCN8xNSurvCsqDaGJ84zy913
fileSystemConfiguration:
  # Provide either s3Config or gcsConfig
  s3Config:
    region: us-east-1
    accessKey: admin
    accessSecret: password
    endpoint: http://minio:9000
    forcePathStyle: true
metadataExtractorConfig:
  jobRunMode: ONCE
  pathExclusionPatterns:
  parserConfig:
    - lake: <lake1>
      databases:
        - name: people
          basePaths: ["s3://warehouse/people"]
    # Add additional lakes and databases as needed
```
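As a sanity check that the s3Config values above actually reach MinIO, something like this (run from inside the compose network, and assuming the AWS CLI is available) should list the Hudi timeline under the configured basePath; the `--endpoint-url` flag and the addressing-style setting mirror `endpoint` and `forcePathStyle`:

```
# Credentials and region mirror delta.yaml
export AWS_ACCESS_KEY_ID=admin
export AWS_SECRET_ACCESS_KEY=password
export AWS_DEFAULT_REGION=us-east-1
# Path-style addressing, equivalent to forcePathStyle: true
aws configure set default.s3.addressing_style path
# A healthy Hudi table shows commit files under .hoodie/
aws --endpoint-url http://minio:9000 s3 ls s3://warehouse/people/.hoodie/
```

The `.hoodie/` folder is what the extractor walks when it logs "Processing N instants", so an empty or missing listing here would point to a connectivity or table-layout problem rather than a LakeView bug.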
Environment: Docker Compose with OpenJDK 11, MinIO, XTable, Spark 3.4, Hive 2.3.10, Hadoop 2.10.2.
Then I get this error:
Originally posted by @alberttwong in https://github.com/onehouseinc/LakeView/issues/78#issuecomment-2228991357