roadrunner-server / roadrunner

🤯 High-performance PHP application server, process manager written in Go and powered with plugins
https://docs.roadrunner.dev
MIT License

[🐛 BUG]: Region is missing from SQS endpoint URL when RR is running inside EC2 #1833

Closed matteokov closed 9 months ago

matteokov commented 9 months ago

No duplicates 🥲.

What happened?

After I moved my application to EC2, I started getting an error when GetQueueUrl is performed: sqs..amazonaws.com: no such host.

It looks like the region part is missing from the endpoint. If I try to override the endpoint with the global sqs config key, the override is ignored and the same error occurs.

sqs:
  region: eu-central-1
  endpoint: https://sqs.eu-central-1.amazonaws.com
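
The malformed host in the error is what a regional hostname template yields when the region is empty. A minimal Go sketch of that effect, assuming a template of the form sqs.<region>.amazonaws.com; the helper sqsHost is hypothetical and is not the AWS SDK's actual endpoint resolver:

```go
package main

import "fmt"

// sqsHost builds a regional SQS hostname from a region string.
// Hypothetical helper for illustration only; the real AWS SDK
// endpoint resolution is more involved.
func sqsHost(region string) string {
	return fmt.Sprintf("sqs.%s.amazonaws.com", region)
}

func main() {
	// With a region set, the host is well formed.
	fmt.Println(sqsHost("eu-central-1")) // sqs.eu-central-1.amazonaws.com

	// With an empty region, the two dots collapse together,
	// matching the host seen in the error message.
	fmt.Println(sqsHost("")) // sqs..amazonaws.com
}
```

This would explain why the lookup fails at DNS resolution, before any request reaches SQS.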

If I understood correctly, inside the EC2 instance RoadRunner should look up the instance identity by sending a request to http://169.254.169.254/latest/dynamic/instance-identity/. The AWS Go SDK should then fetch credentials from EC2 and use them to perform requests to SQS.

To debug, I sent a request to http://169.254.169.254/latest/dynamic/instance-identity/document from inside the instance and I can see that my region is eu-central-1.
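
For reference, the region can be read from the identity document with a few lines of Go. This is only a sketch: the function name regionFromIdentityDocument and the sample payload are made up for illustration, and the code parses a document that has already been fetched rather than calling the metadata endpoint itself.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// identityDocument holds the only field this sketch needs.
// The real document contains more fields, which are ignored here.
type identityDocument struct {
	Region string `json:"region"`
}

// regionFromIdentityDocument extracts the "region" field from the JSON
// body of an instance identity document. Hypothetical helper.
func regionFromIdentityDocument(body []byte) (string, error) {
	var doc identityDocument
	if err := json.Unmarshal(body, &doc); err != nil {
		return "", fmt.Errorf("decode identity document: %w", err)
	}
	if doc.Region == "" {
		return "", errors.New("identity document has no region")
	}
	return doc.Region, nil
}

func main() {
	// Placeholder payload, not real instance data.
	sample := []byte(`{"region":"eu-central-1","instanceId":"i-0000000000000000"}`)

	region, err := regionFromIdentityDocument(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(region) // eu-central-1
}
```

An empty or missing region is returned as an error so that it cannot silently end up in an endpoint URL.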

I'm using Symfony and baldinof/roadrunner-bundle, but this shouldn't be relevant to reproducing the issue, as it fails before even reaching PHP/Symfony.

Version (rr --version)

Tested on both 2023.3.6 and 2023.3.8

How to reproduce the issue?

RoadRunner config:

server:
  command: "php public/index.php"
  env:
    - APP_RUNTIME: Baldinof\RoadRunnerBundle\Runtime\Runtime

http:
  address: 0.0.0.0:8080
  middleware: [ "static", "gzip" ]
  pool:
    debug: true
  uploads:
    forbid: [ ".php", ".exe", ".bat" ]
  static:
    dir: "public"
    forbid: [ ".php", ".htaccess" ]

logs:
  mode: development
  channels:
    http:
      level: debug
    server:
      level: info
      mode: raw
    metrics:
      level: debug

jobs:
  num_pollers: 2
  timeout: 60
  pipeline_size: 100000

  pool:
    debug: true

  consume: [ "events-sqs-pipeline" ]

  pipelines:
    events-sqs-pipeline:
      driver: sqs
      config:
        skip_queue_declaration: true
        prefetch: 10
        consume_all: true
        visibility_timeout: 30
        wait_time_seconds: 20
        queue: test_queue

Run the application on an EC2 instance with PHP 8.2.

Relevant log output

{"level":"DEBUG","ts":"2024-01-02T21:06:37+0000","logger":"jobs        ","msg":"initializing driver","pipeline":"events-sqs-pipeline","driver":"sqs"}
{"level":"ERROR","ts":"2024-01-02T21:06:37+0000","logger":"jobs        ","msg":"failed to initialize driver","pipeline":"events-sqs-pipeline","driver":"sqs","error":"new_sqs_consumer: operation error SQS: GetQueueUrl, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs..amazonaws.com/\": dial tcp: lookup sqs..amazonaws.com: no such host"}

rustatian commented 9 months ago

Hey @matteokov 👋 Yeah, RR tries to call the well-known local URL to check whether it is running inside an AWS environment. The global sqs key is ignored in that case (which might be a wrong decision, btw). I'll double-check that behavior, because configuration from inside AWS is managed by the AWS Golang package...

matteokov commented 9 months ago

@rustatian thanks for a fast response :)

Regarding the global sqs key, I think a better flow would be to always respect the explicitly provided configuration.

So priority would be:

  1. If there is an sqs key and credentials/configs are provided, use them
  2. Use credentials from the EC2 instance (current flow)

Let me know if you need any additional info regarding the endpoint region issue.
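
The proposed priority could be sketched in Go roughly as follows. Everything here is hypothetical (the sqsSettings type, the resolve function, and the choice of fields); it does not reflect RoadRunner's actual implementation, only the idea that explicitly configured values win over values detected from the EC2 environment.

```go
package main

import "fmt"

// sqsSettings is a hypothetical subset of the SQS configuration.
type sqsSettings struct {
	Region   string
	Endpoint string
}

// resolve starts from the values detected from the environment and
// overrides each field that was set explicitly in the config.
func resolve(explicit, detected sqsSettings) sqsSettings {
	out := detected
	if explicit.Region != "" {
		out.Region = explicit.Region
	}
	if explicit.Endpoint != "" {
		out.Endpoint = explicit.Endpoint
	}
	return out
}

func main() {
	// Values from the global sqs key in the RoadRunner config.
	explicit := sqsSettings{
		Region:   "eu-central-1",
		Endpoint: "https://sqs.eu-central-1.amazonaws.com",
	}
	// Simulates detection that returned no region.
	detected := sqsSettings{}

	got := resolve(explicit, detected)
	fmt.Println(got.Region)   // eu-central-1
	fmt.Println(got.Endpoint) // https://sqs.eu-central-1.amazonaws.com
}
```

Merging field by field means a config that sets only the region still picks up any other detected values.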

rustatian commented 9 months ago

Yeah, I'm currently verifying your info, but in any case I'll update this behavior in 2024.1 (since this is a BC break).

rustatian commented 9 months ago

@matteokov Could you please try setting the AWS_REGION env variable and restarting RR?

matteokov commented 9 months ago

@rustatian This fixed the issue, and RoadRunner started normally.

rustatian commented 9 months ago

It looks like a bug in the AWS Go SDK; I found an issue describing the same problem.

rustatian commented 9 months ago

But I found a way to support both the global sqs config and the data provided by IAM.

rustatian commented 9 months ago

OK, since I can't control the changed behavior in the AWS SDK, the global configuration will be able to override the existing IAM values starting with the next bugfix version (2023.3.9).

rustatian commented 8 months ago

The fix will be released next Thursday.