radanalyticsio / spark-operator

Operator for managing the Spark clusters on Kubernetes and OpenShift.
Apache License 2.0

Cannot access ceph nano storage from Spark #347

Open tomkos opened 3 years ago

tomkos commented 3 years ago

Description:

I'm trying to access Ceph storage that runs locally on the OpenShift cluster. I'm using:

spark.hadoop.fs.s3a.path.style.access=true

but when the job runs I get:

org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.amazonaws.AmazonClientException: Unable to execute HTTP request: test.ceph-nano-0: Name or service not known);

test is the bucket name; I try to access it with "LOCATION 's3a://test/import'".
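The hostname in the error, test.ceph-nano-0, is the bucket name prepended to a host, which is what virtual-hosted-style addressing produces. One possible reading (not confirmed in this report) is that the component raising the exception is not picking up the path-style setting. A minimal sketch of the two addressing styles, assuming a hypothetical endpoint of http://ceph-nano-0 inferred from the error message (the real endpoint URL and port are not given here):

```python
from urllib.parse import urlsplit


def s3_request_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the URL an S3 client would request for bucket/key.

    Illustrative helper only, not part of Hadoop or the AWS SDK.
    """
    parts = urlsplit(endpoint)
    if path_style:
        # Path style: the bucket is the first path segment, the host is unchanged.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a DNS label in front of the host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


# Assumed endpoint; only the host name "ceph-nano-0" appears in the error above.
ENDPOINT = "http://ceph-nano-0"

virtual_hosted = s3_request_url(ENDPOINT, "test", "import", path_style=False)
path_based = s3_request_url(ENDPOINT, "test", "import", path_style=True)

print(urlsplit(virtual_hosted).hostname)  # the host that would need to resolve in DNS
print(path_based)
```

With virtual-hosted style the client has to resolve a per-bucket host name, which a single-service Ceph endpoint inside the cluster usually does not provide; with path style only the endpoint host itself has to resolve.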

Steps to reproduce:

  1. Create a Spark cluster with a Thrift server and Ceph as storage.
  2. Try to execute a SparkSQL query that creates an external table stored on the S3 Ceph storage.
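The statement in step 2 can be sketched as follows. Only the LOCATION value comes from this report; the table name, columns, and storage format are hypothetical placeholders chosen for illustration.

```python
def external_table_ddl(table, columns, location, stored_as="PARQUET"):
    """Return a CREATE EXTERNAL TABLE statement in Hive-style Spark SQL syntax.

    Illustrative helper; table, columns and stored_as are assumptions.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"STORED AS {stored_as} LOCATION '{location}'"
    )


# Hypothetical table definition; only the s3a:// location is taken from the issue.
ddl = external_table_ddl(
    "import_data",
    [("id", "INT"), ("name", "STRING")],
    "s3a://test/import",
)
print(ddl)
```

Submitting a statement of this shape through the Thrift server is the point at which the metastore has to reach the bucket, so it is where the exception above would surface.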

Spark cluster details:

spec:
  customImage: 'quay.io/opendatahub/spark-cluster-image:2.4.3-h2.7'
  env:

Thrift server details:

apiVersion: v1
kind: Secret
metadata:
  name: thriftserver-server-conf
stringData:
  thrift-server.conf: |-
    spark.blockManager.port=42100
    spark.cores.max=2
    spark.driver.bindAddress=0.0.0.0
    spark.driver.host=thriftserver.$(namespace).svc
    spark.driver.memory=2G
    spark.driver.port=42000
    spark.executor.memory=2G
    spark.hadoop.datanucleus.rdbms.datastoreAdapterClassName=org.datanucleus.store.rdbms.adapter.PostgreSQLAdapter
    spark.hadoop.datanucleus.schema.autoCreateAll=true
    spark.hadoop.fs.s3a.endpoint=$(s3_endpoint_url)
    spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.EnvironmentVariableCredentialsProvider
    spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.path.style.access=true
    spark.hadoop.javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
    spark.hadoop.javax.jdo.option.ConnectionPassword=$(database_password)
    spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:postgresql://thriftserver-db.$(namespace).svc:5432/$(database_name)
    spark.hadoop.javax.jdo.option.ConnectionUserName=$(database_user)
    spark.sql.adaptive.enabled=true
    spark.sql.thriftServer.incrementalCollect=true
    spark.sql.warehouse.dir=/spark-warehouse
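When debugging a setup like this, it can help to check the rendered thrift-server.conf for the S3A settings before starting the server. The following is a rough sketch, not part of the operator or Open Data Hub: the helper names are hypothetical, and the sample text is a shortened copy of the template above with its $(...) placeholders left unexpanded.

```python
def parse_spark_conf(text):
    """Parse key=value lines (as in thrift-server.conf) into a dict."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as JDBC URLs stay intact.
        key, sep, value = line.partition("=")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def s3a_problems(conf):
    """Return a list of human-readable problems with the S3A settings."""
    problems = []
    if conf.get("spark.hadoop.fs.s3a.path.style.access", "false").lower() != "true":
        problems.append("path-style access is not enabled")
    endpoint = conf.get("spark.hadoop.fs.s3a.endpoint", "")
    if not endpoint:
        problems.append("no S3A endpoint set")
    elif "$(" in endpoint:
        problems.append("endpoint still contains an unexpanded $(...) placeholder")
    return problems


# Shortened copy of the template above; placeholders are deliberately unexpanded.
SAMPLE = """
spark.hadoop.fs.s3a.endpoint=$(s3_endpoint_url)
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.javax.jdo.option.ConnectionURL=jdbc:postgresql://thriftserver-db.$(namespace).svc:5432/$(database_name)
"""

conf = parse_spark_conf(SAMPLE)
print(s3a_problems(conf))
```

Run against the template, the check flags only the unexpanded endpoint placeholder; run against the rendered file inside the pod, an empty list would indicate that the endpoint and path-style setting are at least present in the Spark configuration.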

Spark Operator 1.1.0 used with Open Data Hub