snowplow / stream-collector

Collector for cloud-native web, mobile and event analytics, running on AWS and GCP
http://snowplowanalytics.com

snowplow-collector-scala does not use IAM Role for Service accounts in a container #186

Open brettcave opened 3 years ago

brettcave commented 3 years ago

When using the Snowplow Stream Collector Scala (version 2.4.1) in Kubernetes on AWS, authentication does not work as expected.

I have tried the following steps to get it working:

  1. Configure IRSA - https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html - the OIDC provider is set up, the IAM role + policy are created, and a ServiceAccount is created.
  2. Test that IRSA is working - I swapped out the snowplow/scala-stream-collector-kinesis image for my own container image with the AWS CLI installed, and was able to validate that the role is assumed correctly (aws sts get-caller-identity)
  3. Set up kinesis and add a configmap definition.
  4. Deploy the collector
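
For reference, the IRSA wiring from steps 1-4 looks roughly like this (a sketch; the service account name, namespace, account ID and role name below are placeholders, not values from this deployment):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: snowplow-collector            # hypothetical name
  namespace: snowplow                 # hypothetical namespace
  annotations:
    # IRSA: the role the pod should assume via the web identity token
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-sp-collector-role
---
# In the collector Deployment's pod template, reference the account by name:
# spec:
#   serviceAccountName: snowplow-collector
```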

After deploying the collector, I see the following errors:

com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts::REDACTED:assumed-role/eks-node-group-role/i-INSTANCEID is not authorized to perform: kinesis:DescribeStream on resource: arn:aws:kinesis:<region>:<account_id>:stream/<good_stream> because no identity-based policy allows the kinesis:DescribeStream action (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException;)

So I can see that the role being assumed by the service is the IAM instance profile of the underlying node (which is restricted), and not the IAM role for the service account.

I have tried variations of the aws snippet in the configmap:

          aws {
              accessKey = default
              secretKey = default
          }

and

          aws {
              accessKey = iam
              secretKey = iam
          }

However, based on https://github.com/snowplow/stream-collector/blob/master/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala#L432, I believe the first is the one that should be used, as it triggers the DefaultAWSCredentialsProviderChain(); since this SDK version is one that supports IRSA, it should pick up the web identity token at a higher priority (3rd in the chain) than the EC2 instance profile (6th). https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
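
The web-identity step of that chain can be sanity-checked from inside the pod; a minimal sketch (the role ARN and token path are placeholder values mirroring the env vars EKS injects):

```shell
# The v1 SDK's web identity provider is only eligible when both of these
# environment variables are present (placeholder values shown).
AWS_ROLE_ARN="arn:aws:iam::111122223333:role/my-sp-collector-role"
AWS_WEB_IDENTITY_TOKEN_FILE="/var/run/secrets/eks.amazonaws.com/serviceaccount/token"

if [ -n "$AWS_ROLE_ARN" ] && [ -n "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
  echo "web identity provider eligible"
else
  echo "falling back down the chain (env vars missing)"
fi
```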

But for some reason it does not, and I am not sure why.

I have also tried adding the following to the pod security context, as I have seen issues with accessing tokens before:

    fsGroup: 1000

However, I don't think this is related, because the container I used for testing was able to access the token and assume the IRSA role by default, and not the IAM instance profile of the node.

Edit:

from the scala collector container:

    $ printenv | grep AWS
    ...
    AWS_ROLE_ARN=arn:aws:iam:::role/my-sp-collector-role
    AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
    $ whoami
    daemon
    $ ls -lh /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token  # where the token above softlinks to
    -rw-r----- 1 root daemon 1.1K Nov 10 03:44 /run/secrets/eks.amazonaws.com/serviceaccount/..data/token
    $ cat /run/secrets/eks.amazonaws.com/serviceaccount/token
    <valid token is shown>

I have tried some variations on the security contexts, e.g. to map the token group

podSecurityContext:
  fsGroup: 1
securityContext:
  runAsGroup: 1
  runAsUser: 1

The only thing I can think of is that the volume mount for the token/secret completes after the service starts; I am not sure whether this is possible, or what the best way to validate it would be.
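
If mount timing were the problem, one way to rule it out would be an initContainer that blocks until the token file is populated (a sketch; the image and the projected volume name are assumptions, not taken from this deployment):

```yaml
initContainers:
  - name: wait-for-token              # hypothetical init container
    image: busybox:1.36
    command: ["sh", "-c"]
    # Block until the projected token file exists and is non-empty
    args:
      - until [ -s /var/run/secrets/eks.amazonaws.com/serviceaccount/token ]; do sleep 1; done
    volumeMounts:
      - name: aws-iam-token           # assumed name of the projected token volume
        mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
        readOnly: true
```

If the collector still assumes the node's instance profile even after this gate, the mount-timing hypothesis can be discarded.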

caleb15 commented 2 years ago

Same issue with the snowplow/scala-stream-collector-kinesis:2.5.0 Docker image. I'm getting the error regardless of whether I specify iam or default. Maybe Snowplow is just using an old version of the AWS SDK? IRSA support should be fixed as of version 2.10.11 of the Java SDK v2, but I'm not sure which version they are using.

The only thing i can think of possibly is that the volume mount for the token / secret completes after the service starts

I don't think that's the cause. I tested by deploying the container with an overridden entrypoint:

          command: ["sleep"]
          args: ['3600']

I exec'd into the pod, made sure the env vars were set properly, and manually executed the command, yet I got the same issue.

caleb15 commented 2 years ago

Did some more investigation and found out that this issue should have been fixed already by https://github.com/snowplow/stream-collector/pull/170. I double-checked and confirmed that the sts jar file is in the docker image and it's specified in the classpath too. I also confirmed it has the necessary SDK version (1.12.128). It's really weird that this issue is still happening, @istreeter @mmathias01 any ideas?

Logs:
```
daemon@snowplow-7978d84d7b-gjn7m:/opt/snowplow$ ls lib com.amazonaws.aws-java-sdk-core-1.12.128.jar com.snowplowanalytics.snowplow-badrows_2.12-2.1.1.jar javax.annotation.javax.annotation-api-1.3.2.jar com.amazonaws.aws-java-sdk-kinesis-1.12.128.jar com.snowplowanalytics.snowplow-scala-analytics-sdk_2.12-2.1.0.jar joda-time.joda-time-2.10.13.jar com.amazonaws.aws-java-sdk-sqs-1.12.128.jar com.snowplowanalytics.snowplow-scala-tracker-core_2.12-1.0.0.jar org.apache.httpcomponents.httpclient-4.5.13.jar com.amazonaws.aws-java-sdk-sts-1.12.128.jar com.snowplowanalytics.snowplow-scala-tracker-emitter-id_2.12-1.0.0.jar org.apache.httpcomponents.httpcore-4.4.13.jar com.amazonaws.jmespath-java-1.12.128.jar com.snowplowanalytics.snowplow-stream-collector-core-2.5.0.jar org.apache.thrift.libthrift-0.15.0.jar com.chuusai.shapeless_2.12-2.3.7.jar com.snowplowanalytics.snowplow-stream-collector-kinesis-2.5.0.jar org.reactivestreams.reactive-streams-1.0.3.jar com.fasterxml.jackson.core.jackson-annotations-2.12.3.jar com.typesafe.akka.akka-actor_2.12-2.6.16.jar org.scalaj.scalaj-http_2.12-2.4.2.jar com.fasterxml.jackson.core.jackson-core-2.12.3.jar com.typesafe.akka.akka-http_2.12-10.2.7.jar org.scala-lang.modules.scala-java8-compat_2.12-0.8.0.jar com.fasterxml.jackson.core.jackson-databind-2.12.3.jar com.typesafe.akka.akka-http-core_2.12-10.2.7.jar org.scala-lang.modules.scala-parser-combinators_2.12-1.1.2.jar com.fasterxml.jackson.dataformat.jackson-dataformat-cbor-2.12.3.jar com.typesafe.akka.akka-parsing_2.12-10.2.7.jar org.scala-lang.scala-library-2.12.10.jar com.github.pureconfig.pureconfig_2.12-0.15.0.jar com.typesafe.akka.akka-slf4j_2.12-2.6.16.jar org.slf4j.log4j-over-slf4j-1.7.32.jar com.github.pureconfig.pureconfig-core_2.12-0.15.0.jar com.typesafe.akka.akka-stream_2.12-2.6.16.jar org.slf4j.slf4j-api-1.7.32.jar com.github.pureconfig.pureconfig-generic_2.12-0.15.0.jar com.typesafe.config-1.4.1.jar org.slf4j.slf4j-simple-1.7.32.jar
com.github.pureconfig.pureconfig-generic-base_2.12-0.15.0.jar com.typesafe.ssl-config-core_2.12-0.4.2.jar org.typelevel.cats-core_2.12-2.6.1.jar com.github.scopt.scopt_2.12-4.0.1.jar io.circe.circe-core_2.12-0.14.1.jar org.typelevel.cats-effect_2.12-2.2.0.jar commons-codec.commons-codec-1.15.jar io.circe.circe-generic_2.12-0.14.1.jar org.typelevel.cats-kernel_2.12-2.6.1.jar commons-logging.commons-logging-1.2.jar io.circe.circe-jawn_2.12-0.13.0.jar org.typelevel.jawn-parser_2.12-1.0.0.jar com.snowplowanalytics.collector-payload-1-0.0.0.jar io.circe.circe-numbers_2.12-0.14.1.jar org.typelevel.simulacrum-scalafix-annotations_2.12-0.5.4.jar com.snowplowanalytics.iglu-core_2.12-1.0.0.jar io.circe.circe-parser_2.12-0.13.0.jar software.amazon.ion.ion-java-1.0.2.jar com.snowplowanalytics.iglu-core-circe_2.12-1.0.0.jar io.prometheus.simpleclient-0.9.0.jar com.snowplowanalytics.iglu-scala-client_2.12-1.1.1.jar io.prometheus.simpleclient_common-0.9.0.jar daemon@snowplow-7978d84d7b-gjn7m:/opt/snowplow$ bin/snowplow-stream-collector --config /etc/conf/collector-conf -v # Executing command line: /opt/java/openjdk/bin/java -cp 
/opt/snowplow/lib/com.snowplowanalytics.snowplow-stream-collector-kinesis-2.5.0.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-stream-collector-core-2.5.0.jar:/opt/snowplow/lib/org.scala-lang.scala-library-2.12.10.jar:/opt/snowplow/lib/org.apache.thrift.libthrift-0.15.0.jar:/opt/snowplow/lib/joda-time.joda-time-2.10.13.jar:/opt/snowplow/lib/org.slf4j.slf4j-simple-1.7.32.jar:/opt/snowplow/lib/org.slf4j.log4j-over-slf4j-1.7.32.jar:/opt/snowplow/lib/com.typesafe.config-1.4.1.jar:/opt/snowplow/lib/io.prometheus.simpleclient-0.9.0.jar:/opt/snowplow/lib/io.prometheus.simpleclient_common-0.9.0.jar:/opt/snowplow/lib/com.github.scopt.scopt_2.12-4.0.1.jar:/opt/snowplow/lib/com.typesafe.akka.akka-stream_2.12-2.6.16.jar:/opt/snowplow/lib/com.typesafe.akka.akka-http_2.12-10.2.7.jar:/opt/snowplow/lib/com.typesafe.akka.akka-slf4j_2.12-2.6.16.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-badrows_2.12-2.1.1.jar:/opt/snowplow/lib/com.snowplowanalytics.collector-payload-1-0.0.0.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig_2.12-0.15.0.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-scala-tracker-core_2.12-1.0.0.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-scala-tracker-emitter-id_2.12-1.0.0.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-kinesis-1.12.128.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-sts-1.12.128.jar:/opt/snowplow/lib/com.fasterxml.jackson.dataformat.jackson-dataformat-cbor-2.12.3.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-sqs-1.12.128.jar:/opt/snowplow/lib/org.slf4j.slf4j-api-1.7.32.jar:/opt/snowplow/lib/org.apache.httpcomponents.httpclient-4.5.13.jar:/opt/snowplow/lib/org.apache.httpcomponents.httpcore-4.4.13.jar:/opt/snowplow/lib/javax.annotation.javax.annotation-api-1.3.2.jar:/opt/snowplow/lib/com.typesafe.akka.akka-actor_2.12-2.6.16.jar:/opt/snowplow/lib/org.reactivestreams.reactive-streams-1.0.3.jar:/opt/snowplow/lib/com.typesafe.ssl-config-core_2.12-0.4.2.jar:/opt/snowplow/lib/com.typesafe.akka.akka-http-core_2.12-10.
2.7.jar:/opt/snowplow/lib/org.typelevel.cats-core_2.12-2.6.1.jar:/opt/snowplow/lib/io.circe.circe-generic_2.12-0.14.1.jar:/opt/snowplow/lib/com.snowplowanalytics.iglu-scala-client_2.12-1.1.1.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-scala-analytics-sdk_2.12-2.1.0.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig-core_2.12-0.15.0.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig-generic_2.12-0.15.0.jar:/opt/snowplow/lib/com.snowplowanalytics.iglu-core_2.12-1.0.0.jar:/opt/snowplow/lib/io.circe.circe-parser_2.12-0.13.0.jar:/opt/snowplow/lib/com.snowplowanalytics.iglu-core-circe_2.12-1.0.0.jar:/opt/snowplow/lib/org.typelevel.cats-effect_2.12-2.2.0.jar:/opt/snowplow/lib/org.scalaj.scalaj-http_2.12-2.4.2.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-core-1.12.128.jar:/opt/snowplow/lib/com.amazonaws.jmespath-java-1.12.128.jar:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-databind-2.12.3.jar:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-core-2.12.3.jar:/opt/snowplow/lib/commons-logging.commons-logging-1.2.jar:/opt/snowplow/lib/commons-codec.commons-codec-1.15.jar:/opt/snowplow/lib/org.scala-lang.modules.scala-java8-compat_2.12-0.8.0.jar:/opt/snowplow/lib/org.scala-lang.modules.scala-parser-combinators_2.12-1.1.2.jar:/opt/snowplow/lib/com.typesafe.akka.akka-parsing_2.12-10.2.7.jar:/opt/snowplow/lib/org.typelevel.cats-kernel_2.12-2.6.1.jar:/opt/snowplow/lib/org.typelevel.simulacrum-scalafix-annotations_2.12-0.5.4.jar:/opt/snowplow/lib/io.circe.circe-core_2.12-0.14.1.jar:/opt/snowplow/lib/com.chuusai.shapeless_2.12-2.3.7.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig-generic-base_2.12-0.15.0.jar:/opt/snowplow/lib/io.circe.circe-jawn_2.12-0.13.0.jar:/opt/snowplow/lib/software.amazon.ion.ion-java-1.0.2.jar:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-annotations-2.12.3.jar:/opt/snowplow/lib/io.circe.circe-numbers_2.12-0.14.1.jar:/opt/snowplow/lib/org.typelevel.jawn-parser_2.12-1.0.0.jar 
com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector --config /etc/conf/collector-conf [main] INFO com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Creating thread pool of size 10 WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by com.fasterxml.jackson.databind.util.ClassUtil (file:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-databind-2.12.3.jar) to field java.lang.Throwable.cause WARNING: Please consider reporting this to the maintainers of com.fasterxml.jackson.databind.util.ClassUtil WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release [main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - Error checking if stream good-snow exists com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts:::assumed-role/15FiveEKSNode/i- is not authorized to perform: kinesis:DescribeStream on resource: arn:aws:kinesis:us-east-1::stream/good-snow because no identity-based policy allows the kinesis:DescribeStream action (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException; Request ID: eeed99b9-7575-6dba-b43d-92ac6118feef; Proxy: null) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779) at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539) at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2980) at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2947) at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2936) at com.amazonaws.services.kinesis.AmazonKinesisClient.executeDescribeStream(AmazonKinesisClient.java:898) at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:867) at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:910) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$streamExists$1(KinesisSink.scala:525) at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.streamExists(KinesisSink.scala:524) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.runChecks(KinesisSink.scala:486) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$3(KinesisSink.scala:399) at scala.util.Either.map(Either.scala:353) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$2(KinesisSink.scala:398) at scala.util.Either.flatMap(Either.scala:341) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$1(KinesisSink.scala:397) at scala.util.Either.flatMap(Either.scala:341) at 
com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.createAndInitialize(KinesisSink.scala:396) at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$2(KinesisCollector.scala:49) at scala.util.Either.flatMap(Either.scala:341) at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.main(KinesisCollector.scala:33) at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector.main(KinesisCollector.scala) [main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - SQS buffer is not configured. [main] WARN com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - No SQS buffer for surge protection set up (consider setting a SQS Buffer in config.hocon). [main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - Error checking if stream bad-snow exists com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts:::assumed-role/15FiveEKSNode/i- is not authorized to perform: kinesis:DescribeStream on resource: arn:aws:kinesis:us-east-1::stream/bad-snow because no identity-based policy allows the kinesis:DescribeStream action (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException; Request ID: efe39e37-2344-50e2-b533-952217d662b2; Proxy: null) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779) at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539) at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2980) at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2947) at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2936) at com.amazonaws.services.kinesis.AmazonKinesisClient.executeDescribeStream(AmazonKinesisClient.java:898) at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:867) at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:910) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$streamExists$1(KinesisSink.scala:525) at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.streamExists(KinesisSink.scala:524) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.runChecks(KinesisSink.scala:486) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$3(KinesisSink.scala:399) at scala.util.Either.map(Either.scala:353) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$2(KinesisSink.scala:398) at scala.util.Either.flatMap(Either.scala:341) at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$1(KinesisSink.scala:397) at scala.util.Either.flatMap(Either.scala:341) at 
com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.createAndInitialize(KinesisSink.scala:396) at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$3(KinesisCollector.scala:51) at scala.util.Either.flatMap(Either.scala:341) at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$2(KinesisCollector.scala:43) at scala.util.Either.flatMap(Either.scala:341) at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.main(KinesisCollector.scala:33) at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector.main(KinesisCollector.scala) [main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - SQS buffer is not configured. [main] WARN com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - No SQS buffer for surge protection set up (consider setting a SQS Buffer in config.hocon). [scala-stream-collector-akka.actor.default-dispatcher-5] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started [main] INFO com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService - Telemetry enabled [scala-stream-collector-akka.actor.default-dispatcher-5] INFO com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - REST interface bound to /0.0.0.0:8000 ^C[Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Received shutdown signal [Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Sleeping for 10 seconds [Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Initiating http server termination [Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Server terminated [scala-stream-collector-akka.actor.default-dispatcher-13] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Initiating bad sink shutdown 
[scala-stream-collector-akka.actor.default-dispatcher-12] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Initiating good sink shutdown
[scala-stream-collector-akka.actor.default-dispatcher-12] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Completed good sink shutdown
[scala-stream-collector-akka.actor.default-dispatcher-13] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Completed bad sink shutdown
```
jbeemster commented 2 years ago

Hi @caleb15 I have been testing this out today and with:

aws {
        accessKey = default
        secretKey = default
      }

... and an OIDC IAM role ARN attached to the pod's service account, I could successfully write to the target stream. If you use the iam inputs it indeed does not work, but the DefaultAWSCredentialsProviderChain with v2.5.0 does work as far as I can tell.

Are you certain that the ServiceAccount you have attached to the service is correctly configured and attached?

caleb15 commented 2 years ago

Weird, this time default works. I could've sworn I tested it before and default didn't work :|

Sorry about that, thanks for testing!

caleb15 commented 2 years ago

Just ran into this issue again when recreating the pod, even though I have it set to default. Same issue with Snowplow Kinesis enrichment. I'll try making my own pod with sudo rights and the AWS CLI so I can look into it further.

caleb15 commented 2 years ago

Never mind: it turns out I got the error because the trust relationship in the IAM role referenced the service account's old name. I had renamed the service account, but didn't realize I would also need to update the IAM role, because I thought all you needed was the ARN reference. It turns out you need to make sure both that the ARN annotation in the service account matches the role and that the service account name matches the name in the trust relationship.

Relevant: https://stackoverflow.com/questions/66405794/not-authorized-to-perform-stsassumerolewithwebidentity-403
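
Concretely, the trust relationship's condition must reference the current namespace and service account name; a sketch with placeholder account ID, OIDC provider ID and names:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:<namespace>:<serviceaccount-name>"
        }
      }
    }
  ]
}
```

If the service account is renamed, the `sub` value above goes stale and AssumeRoleWithWebIdentity starts failing, which is exactly the symptom described.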

Some more debugging tips: when your IAM role is working, aws sts get-caller-identity should return something like the following:

{
    "UserId": "<censored>:botocore-session-<censored>",
    "Account": "<censored>",
    "Arn": "arn:aws:sts::<censored>:assumed-role/<role-name>/botocore-session-<censored>"
}

You should also make sure the serviceAccountName in the pod matches the name of the ServiceAccount:

    kubectl get pod/<podname> -o yaml | grep serviceAccount

Note that the container doesn't have root privileges, so you can't install the AWS CLI in it. I made my own container with root privileges and used that instead, but then realized there's a far easier way: just set runAsUser in the securityContext to 0 (root). That way you can install whatever packages you need to debug.
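
That override is just a two-line pod spec fragment (debugging only; revert it afterwards):

```yaml
securityContext:
  runAsUser: 0    # run as root so package managers work inside the container
```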

kalupa commented 1 year ago

Is this still an active issue? I was looking for info about Kubernetes and the Scala collector and stumbled upon this, and it's unclear why the issue is still open.