opensearch-project / opensearch-hadoop

Apache License 2.0

EMR Spark job with SigV4 signing #416

Open joyfulwang opened 6 months ago

joyfulwang commented 6 months ago

Hi, my team is adding SigV4 signing to all of the read/write requests that our resources send to Elasticsearch. We've successfully added signing to requests from our backend Java service and from a Lambda function. We're now trying to add signing to our EMR Spark jobs, which use emr-6.6.0, Spark 3.x, Scala 2.12, and opensearch-hadoop (Maven-org-opensearch-client_opensearch-spark-30_2_12). The Elasticsearch cluster is version 7.10.

After reading the opensearch-hadoop User Guide and the Configuration Options for Maven-org-opensearch-client_opensearch-spark, I updated our OpenSearch config to the following:

private val basicOpenSearchConfig = Map(
    "opensearch.nodes" -> <opensearch_endpoint>,
    "opensearch.nodes.wan.only" -> "true",
    "opensearch.port" -> "443",
    "opensearch.net.ssl" -> "true",
    "opensearch.net.ssl.cert.allow.self.signed" -> "true",
    "opensearch.net.ssl.protocol" -> "SSL",
    "opensearch.aws.sigv4.enabled" -> "true",
    "opensearch.aws.sigv4.region" -> "us-east-1")

After enabling the SigV4 signing config, I tested whether the Spark job could read index names from a cluster that has fine-grained access control enabled, and got "Unauthorized" as the response. Here's what I've tried so far:

  1. Copied aws-java-sdk-bundle-1.12.170.jar onto the EMR host during bootstrapping, as recommended by the opensearch-hadoop User Guide. It didn't make a difference, and the absence of this jar on the EMR host hadn't caused any NoClassDefFoundError errors either.
  2. Made sure that the following policy statement is attached to the IAM role that the EMR cluster is using:
    {
      "Effect": "Allow",
      "Action": "es:ESHttp*",
      "Resource": "arn:aws:es:us-east-1:<aws_account_id>:domain/<domain_name>/*"
    }

Any ideas for troubleshooting?

Xtansia commented 6 months ago

@joyfulwang Have you mapped the EMR job's IAM role to an internal user within Elasticsearch/OpenSearch, as described in the documentation here? https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html#fgac-access-control
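
For reference, a mapping of this kind can be sketched as a request body for the security plugin's REST API. This is a minimal sketch under assumptions that are not taken from this thread: the body would be sent with PUT to `_opendistro/_security/api/rolesmapping/<security_role>` (the path used on Elasticsearch 7.10 domains; OpenSearch domains use `_plugins/_security` instead), `<security_role>` stands for whichever security role the job should get, and `<emr_role_name>` is a placeholder for the IAM role the EMR cluster runs as. The request would need to come from a user who can already manage the security plugin, such as the domain's master user.

```json
{
  "backend_roles": [
    "arn:aws:iam::<aws_account_id>:role/<emr_role_name>"
  ]
}
```

A PUT replaces the role's existing mapping, so any users or backend roles already mapped to that role would have to be included in the body as well. The same mapping can also be made from the Security section of Kibana/OpenSearch Dashboards.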