onehouseinc / LakeView

Monitoring and insights on your data lakehouse tables
Apache License 2.0

Trying to use AWS env variables but defaults to GCS #78

Open alberttwong opened 1 month ago

alberttwong commented 1 month ago

Environment: Docker Compose with OpenJDK 11, MinIO, XTable, Spark 3.4, Hive 2.3.10, Hadoop 2.10.2

root@spark:/opt/LakeView# java -jar LakeView-release-v0.10.0-all.jar -p '/opt/LakeView/delta.yaml' 
17:01:44.376 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
17:01:44.496 [main] INFO  com.onehouse.RuntimeModule - Spinning up 70 threads
17:01:44.657 [main] INFO  com.onehouse.metrics.MetricsServer - Starting metrics server
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Running metadata-extractor one time
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryService - Starting table discover service, excluding []
17:01:44.674 [main] INFO  c.o.m.TableDiscoveryService - Discovering tables in s3://warehouse/people
17:01:44.864 [metadata-extractor-1] ERROR c.o.m.TableDiscoveryService - Failed to discover tables in path: s3://warehouse/people
17:01:44.865 [metadata-extractor-1] ERROR c.o.m.TableDiscoveryService - com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
java.util.concurrent.CompletionException: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: com.google.cloud.storage.StorageException: Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
        at com.google.cloud.storage.StorageException.translate(StorageException.java:118)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:287)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:430)
        at com.google.cloud.storage.StorageImpl.lambda$listBlobs$11(StorageImpl.java:397)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
        at com.google.cloud.storage.Retrying.run(Retrying.java:54)
        at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:394)
        at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:365)
        at com.onehouse.storage.GCSAsyncStorageClient.lambda$fetchObjectsByPage$1(GCSAsyncStorageClient.java:61)
        ... 7 common frames omitted
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 401 Unauthorized
GET https://storage.googleapis.com/storage/v1/b/warehouse/o?delimiter=/&prefix=people/&projection=full
{
  "code" : 401,
  "errors" : [ {
    "domain" : "global",
    "location" : "Authorization",
    "locationType" : "header",
    "message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).",
    "reason" : "required"
  } ],
  "message" : "Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)."
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:439)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:525)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:466)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:576)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.list(HttpStorageRpc.java:420)
        ... 15 common frames omitted
17:01:44.866 [metadata-extractor-1] INFO  c.o.m.TableMetadataUploaderService - Uploading metadata of following tables: []
17:01:44.867 [main] INFO  c.o.m.TableDiscoveryAndUploadJob - Run Completed
17:01:44.867 [main] INFO  com.onehouse.metrics.MetricsServer - Shutting down metrics server
root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
#        region: us-west-2
#        accessKey: admin
#        accessSecret: password

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed

Environment variables exported in the shell:

export AWS_SECRET_ACCESS_KEY=password
export AWS_ACCESS_KEY_ID=admin
export ENDPOINT=http://minio:9000
export AWS_REGION=us-east-1
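The log shows `com.onehouse.storage.GCSAsyncStorageClient` listing the `s3://warehouse/people` path. In this config, every child of `s3Config:` is commented out, and YAML treats a key with no value as null. One plausible explanation, not confirmed against LakeView's source, is that the backend is chosen by which config block is non-null rather than by the path's scheme. The Java sketch below models both strategies. `StorageBackendSelection`, `FileSystemConfiguration`, `selectByConfigPresence`, and `selectByScheme` are hypothetical names, not LakeView APIs.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical model of how an extractor could pick a storage backend.
// Class and method names are illustrative, not LakeView's actual code.
public class StorageBackendSelection {

    enum Backend { S3, GCS }

    // Stand-in for the "fileSystemConfiguration" block. A YAML key whose children
    // are all commented out (like "s3Config:" in the first config) is null.
    static final class FileSystemConfiguration {
        final Object s3Config;
        final Object gcsConfig;

        FileSystemConfiguration(Object s3Config, Object gcsConfig) {
            this.s3Config = s3Config;
            this.gcsConfig = gcsConfig;
        }
    }

    // Presence-based choice: a null s3Config falls through to GCS,
    // whatever the scheme of the base path.
    static Backend selectByConfigPresence(FileSystemConfiguration fs) {
        return fs.s3Config != null ? Backend.S3 : Backend.GCS;
    }

    // Scheme-based choice: "s3://..." always maps to S3, "gs://..." to GCS.
    static Backend selectByScheme(String basePath) {
        String scheme = URI.create(basePath).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Base path has no scheme: " + basePath);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "s3":
                return Backend.S3;
            case "gs":
                return Backend.GCS;
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    public static void main(String[] args) {
        // Mirrors the first config: s3Config present in YAML but empty, no gcsConfig.
        FileSystemConfiguration fs = new FileSystemConfiguration(null, null);
        String basePath = "s3://warehouse/people";

        System.out.println("by config presence: " + selectByConfigPresence(fs)); // GCS
        System.out.println("by scheme: " + selectByScheme(basePath));           // S3
    }
}
```

In the presence-based model, an empty `s3Config` sends an `s3://` path to the GCS client. This matches the anonymous `storage.googleapis.com` request in the log. Selecting by scheme keeps `s3://` paths on the S3 side even when `s3Config` is empty. Credentials and the endpoint would then still have to come from somewhere else, such as the environment.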
alberttwong commented 1 month ago

If I stop using OS environment variables and define the settings in the YAML instead:

root@spark:/opt/LakeView# cat delta.yaml 
version: V1

onehouseClientConfig:
    # can be obtained from the Onehouse console
    projectId: c3eb3868-6979-41cd-9018-952d29a43337
    apiKey: asU2Pb3XaNAc4JwkkWpNUQ== 
    apiSecret: IBaLVxloIzU36heBooOBsPp5MhD6ijjyIk88zvH2ggs=
    userId: x2gblCN8xNSurvCsqDaGJ84zy913 

fileSystemConfiguration:
    # Provide either s3Config or gcsConfig
    s3Config:
        region: us-east-1
        accessKey: admin
        accessSecret: password
        endpoint: http://minio:9000

metadataExtractorConfig:
    jobRunMode: ONCE
    pathExclusionPatterns: 
    parserConfig:
        - lake: <lake1>
          databases:
            - name: people
              basePaths: ["s3://warehouse/people"]
        # Add additional lakes and databases as needed

Then I get this error:

root@spark:/opt/LakeView# java -jar LakeView-release-v0.10.0-all.jar -p '/opt/LakeView/delta.yaml' 
17:05:25.080 [main] INFO  com.onehouse.Main - Starting LakeView extractor service
Exception in thread "main" java.lang.RuntimeException: Failed to load config
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:31)
        at com.onehouse.Main.loadConfig(Main.java:92)
        at com.onehouse.Main.start(Main.java:56)
        at com.onehouse.Main.main(Main.java:41)
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "endpoint" (class com.onehouse.config.models.common.S3Config$S3ConfigBuilder), not marked as ignorable (3 known properties: "accessKey", "region", "accessSecret"])
 at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: com.onehouse.config.models.configv1.ConfigV1$ConfigV1Builder["fileSystemConfiguration"]->com.onehouse.config.models.common.FileSystemConfiguration$FileSystemConfigurationBuilder["s3Config"]->com.onehouse.config.models.common.S3Config$S3ConfigBuilder["endpoint"])
        at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
        at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
        at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2023)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:298)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeSetAndReturn(MethodProperty.java:158)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.vanillaDeserialize(BuilderBasedDeserializer.java:293)
        at com.fasterxml.jackson.databind.deser.BuilderBasedDeserializer.deserialize(BuilderBasedDeserializer.java:217)
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
        at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2831)
        at com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:3295)
        at com.onehouse.config.ConfigLoader.loadConfigFromJsonNode(ConfigLoader.java:47)
        at com.onehouse.config.ConfigLoader.loadConfigFromConfigFile(ConfigLoader.java:29)
        ... 3 more
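The second run fails earlier, during config loading. In v0.10.0, `S3Config` accepts only `accessKey`, `region`, and `accessSecret`, so there is no field for a custom endpoint such as `http://minio:9000`. The sketch below shows one way S3 settings could be resolved with an optional endpoint. It assumes YAML values take precedence, with a fallback to the environment variables exported in the first report (`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `ENDPOINT`). `S3SettingsResolver`, `S3Settings`, and `resolve` are hypothetical names. YAML values are passed as a plain map to keep the example free of Jackson.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: resolve connection settings for an S3-compatible store
// (e.g. MinIO). Names are illustrative assumptions, not LakeView's schema.
public class S3SettingsResolver {

    static final class S3Settings {
        final String region;
        final String accessKey;
        final String accessSecret;
        final Optional<String> endpoint; // empty => client's default AWS endpoint

        S3Settings(String region, String accessKey, String accessSecret, Optional<String> endpoint) {
            this.region = region;
            this.accessKey = accessKey;
            this.accessSecret = accessSecret;
            this.endpoint = endpoint;
        }
    }

    // Returns the YAML value if set, otherwise the environment value, otherwise null.
    private static String pick(String yamlValue, String envValue) {
        if (yamlValue != null && !yamlValue.isBlank()) {
            return yamlValue;
        }
        if (envValue != null && !envValue.isBlank()) {
            return envValue;
        }
        return null;
    }

    static S3Settings resolve(Map<String, String> yaml, Map<String, String> env) {
        String region = pick(yaml.get("region"), env.get("AWS_REGION"));
        String accessKey = pick(yaml.get("accessKey"), env.get("AWS_ACCESS_KEY_ID"));
        String accessSecret = pick(yaml.get("accessSecret"), env.get("AWS_SECRET_ACCESS_KEY"));
        // "endpoint" is the key v0.10.0 rejects; ENDPOINT is the variable from the report.
        String endpoint = pick(yaml.get("endpoint"), env.get("ENDPOINT"));

        if (region == null || accessKey == null || accessSecret == null) {
            throw new IllegalStateException("Missing S3 region or credentials in both YAML and environment");
        }
        return new S3Settings(region, accessKey, accessSecret, Optional.ofNullable(endpoint));
    }

    public static void main(String[] args) {
        // Simulates the first report: empty s3Config, everything in the environment.
        Map<String, String> yaml = new HashMap<>();
        Map<String, String> env = Map.of(
                "AWS_REGION", "us-east-1",
                "AWS_ACCESS_KEY_ID", "admin",
                "AWS_SECRET_ACCESS_KEY", "password",
                "ENDPOINT", "http://minio:9000");

        S3Settings s = resolve(yaml, env);
        // Prints: us-east-1 admin http://minio:9000
        System.out.println(s.region + " " + s.accessKey + " " + s.endpoint.orElse("<default>"));
    }
}
```

For a real run, `System.getenv()` would be passed as the environment map. The resolved endpoint would still have to be handed to the S3 client. With the AWS SDK for Java v2, that is typically `endpointOverride(URI)` on the client builder, often combined with `forcePathStyle(true)` for MinIO.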