onehouseinc / LakeView

Monitoring and insights on your data lakehouse tables
Apache License 2.0

Need support for AWS endpoint and forcePathStyle to enable MinIO and/or local development #79

Open alberttwong opened 4 months ago

alberttwong commented 4 months ago

Environment: Docker Compose with OpenJDK 11, MinIO, XTable, Spark 3.4, Hive 2.3.10, Hadoop 2.10.2

If you change the setup to not use OS environment variables and instead define them in the YAML:
root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1
        accessKey: admin
        accessSecret: password
        endpoint: http://minio:9000

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed

Then I get this error:

root@spark:/opt/LakeView# java -jar LakeView-release-v0.10.0-all.jar -p '/opt/LakeView/delta.yaml' 
17:05:25.080 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
Exception in thread "main" java.lang.RuntimeException: Failed to load config
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:31)
        at com.onehouse.Main.loadConfig(Main.java:92)
        at com.onehouse.Main.start(Main.java:56)
        at com.onehouse.Main.main(Main.java:41)
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "endpoint" (class com.onehouse.config.models.common.S3Config$S3ConfigBuilder), not marked as ignorable (3 known properties: "accessKey", "region", "accessSecret"])
 at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: com.onehouse.config.models.configv1.ConfigV1$ConfigV1Builder["fileSystemConfiguration"]->com.onehouse.config.models.common.FileSystemConfiguration$FileSystemConfigurationBuilder["s3Config"]->com.onehouse.config.models.common.S3Config$S3ConfigBuilder["endpoint"])
        at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
        at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
        at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2023)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:298)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
        at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2831)
        at com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:3295)
        at com.onehouse.config.ConfigLoader.loadConfigFromJsonNode(ConfigLoader.java:47)
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:29)
        ... 3 more
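
The failure is strict Jackson deserialization: the builder behind S3Config only declares region, accessKey, and accessSecret, so any extra YAML key such as endpoint is rejected outright. As a hypothetical sketch (not the actual LakeView source), a builder-based model that would accept the two new optional fields could look like this:

// Hypothetical sketch, not the actual LakeView source: a builder-based
// config model in which Jackson recognizes the two new optional fields.
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonPOJOBuilder;
import com.fasterxml.jackson.dataformat.yaml.YAMLMapper;

@JsonDeserialize(builder = S3Config.Builder.class)
public final class S3Config {
    private final String region;
    private final String accessKey;
    private final String accessSecret;
    private final String endpoint;        // new: custom endpoint, e.g. http://minio:9000
    private final boolean forcePathStyle; // new: MinIO needs path-style bucket addressing

    private S3Config(Builder b) {
        this.region = b.region;
        this.accessKey = b.accessKey;
        this.accessSecret = b.accessSecret;
        this.endpoint = b.endpoint;
        this.forcePathStyle = b.forcePathStyle;
    }

    public String getEndpoint() { return endpoint; }
    public boolean isForcePathStyle() { return forcePathStyle; }

    // withPrefix = "" makes the builder methods match the YAML keys exactly,
    // mirroring the S3ConfigBuilder named in the stack trace above.
    @JsonPOJOBuilder(withPrefix = "")
    public static final class Builder {
        private String region;
        private String accessKey;
        private String accessSecret;
        private String endpoint;
        private boolean forcePathStyle;

        public Builder region(String v) { region = v; return this; }
        public Builder accessKey(String v) { accessKey = v; return this; }
        public Builder accessSecret(String v) { accessSecret = v; return this; }
        public Builder endpoint(String v) { endpoint = v; return this; }
        public Builder forcePathStyle(boolean v) { forcePathStyle = v; return this; }
        public S3Config build() { return new S3Config(this); }
    }

    public static void main(String[] args) throws Exception {
        String yaml = "region: us-east-1\naccessKey: admin\naccessSecret: password\n"
                + "endpoint: http://minio:9000\nforcePathStyle: true\n";
        S3Config cfg = new YAMLMapper().readValue(yaml, S3Config.class);
        System.out.println(cfg.getEndpoint() + " pathStyle=" + cfg.isForcePathStyle());
    }
}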
For reference, these are the OS environment variables the setup used previously:

export AWS_SECRET_ACCESS_KEY=password
export AWS_ACCESS_KEY_ID=admin
export ENDPOINT=http://minio:9000
export AWS_REGION=us-east-1

Originally posted by @alberttwong in https://github.com/onehouseinc/LakeView/issues/78#issuecomment-2228991357

alberttwong commented 4 months ago

PR submitted. https://github.com/onehouseinc/LakeView/pull/85
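
For context, the runtime side of such a change roughly amounts to passing the two new settings into the S3 client builder. Here is a hedged sketch assuming the AWS SDK for Java v2: endpointOverride and forcePathStyle are real SDK v2 builder methods, while S3ClientFactory and its signature are purely illustrative.

import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.S3AsyncClientBuilder;

// Illustrative helper, not LakeView's actual wiring: builds an S3 client
// that can point at MinIO when an endpoint override is configured.
public final class S3ClientFactory {
    public static S3AsyncClient create(String region, String accessKey,
                                       String accessSecret, String endpoint,
                                       boolean forcePathStyle) {
        S3AsyncClientBuilder builder = S3AsyncClient.builder()
            .region(Region.of(region))
            .credentialsProvider(StaticCredentialsProvider.create(
                AwsBasicCredentials.create(accessKey, accessSecret)));
        if (endpoint != null && !endpoint.isEmpty()) {
            builder = builder
                .endpointOverride(URI.create(endpoint)) // e.g. http://minio:9000
                .forcePathStyle(forcePathStyle);        // MinIO serves buckets as path segments
        }
        return builder.build();
    }
}

Guarding the override behind a null check keeps the SDK's default AWS endpoint resolution intact when no custom endpoint is configured, so the change stays backwards compatible.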

alberttwong commented 4 months ago

Using the jar built from the new PR, the run succeeds:

root@spark:/opt/LakeView# java -jar LakeView-1.0-SNAPSHOT-all.jar -p '/opt/LakeView/delta.yaml' 
17:53:05.956 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
17:53:06.083 [main] INFO  com.onehouse.RuntimeModule - Spinning up 70 threads
17:53:06.373 [main] INFO  com.onehouse.metrics.MetricsServer - Starting metrics server
17:53:06.386 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Running metadata-extractor one time
17:53:06.386 [main] INFO  c.o.m.TableDiscoveryService - Starting table discover service, excluding []
17:53:06.387 [main] INFO  c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/people
17:53:06.555 [metadata-extractor-2] INFO  c.o.m.TableMetadataUploaderService - Uploading metadata of following tables: [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=null)]
17:53:06.557 [metadata-extractor-1] INFO  c.o.m.TableMetadataUploaderService - Fetching checkpoint for tables: [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)]
17:53:06.943 [metadata-extractor-1] INFO  c.o.m.TableMetadataUploaderService - Initializing following tables [Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)]
17:53:07.218 [metadata-extractor-2] INFO  c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:53:07.231 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - Processing 1 instants in table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline COMMIT_TIMELINE_TYPE_ARCHIVED sequentially in 1 batches
17:53:07.231 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - uploading batch 1 for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ARCHIVED
17:53:07.618 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - uploading instants in table: Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:53:07.641 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - Processing 3 instants in table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline COMMIT_TIMELINE_TYPE_ACTIVE sequentially in 1 batches
17:53:07.641 [metadata-extractor-3] INFO  c.o.m.TimelineCommitInstantsUploader - uploading batch 2 for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273) timeline: COMMIT_TIMELINE_TYPE_ACTIVE
17:53:07.993 [metadata-extractor-1] INFO  c.o.m.TimelineCommitInstantsUploader - Reached end of instants in COMMIT_TIMELINE_TYPE_ACTIVE for table Table(absoluteTableUri=s3://warehouse/people, databaseName=people, lakeName=<lake1>, tableId=a49186aa-9b1a-30df-ab01-4a1af50f1273)
17:53:07.995 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Run Completed
17:53:07.996 [main] INFO  com.onehouse.metrics.MetricsServer - Shutting down metrics server
root@spark:/opt/LakeView# ls
delta.yaml  LakeView-1.0-SNAPSHOT-all.jar
root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: XXXX== 
    apiSecret: YYYYYY=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1
        accessKey: admin
        accessSecret: password
        endpoint: http://minio:9000
        forcePathStyle: true

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed
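
As a quick sanity check that the endpoint and path-style settings actually reach MinIO, here is a small hypothetical snippet that lists the table's objects directly (bucket and prefix taken from the basePaths above; S3ClientFactory is the illustrative helper sketched earlier):

import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;

// Illustrative smoke test: lists objects under s3://warehouse/people via
// the MinIO endpoint to confirm the override and path-style flag work.
public final class MinioSmokeTest {
    public static void main(String[] args) {
        try (S3AsyncClient s3 = S3ClientFactory.create(
                "us-east-1", "admin", "password", "http://minio:9000", true)) {
            ListObjectsV2Response resp = s3.listObjectsV2(
                ListObjectsV2Request.builder()
                    .bucket("warehouse") // from basePaths: s3://warehouse/people
                    .prefix("people/")
                    .build())
                .join();
            resp.contents().forEach(o -> System.out.println(o.key()));
        }
    }
}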