terralist / terralist

Terraform Private Registry for modules and providers manageable from a REST API
http://www.terralist.io/
Mozilla Public License 2.0

HTTP Error 409 when POSTING a new version of a module #309

Open jsidney opened 1 month ago

jsidney commented 1 month ago

Hi guys - I migrated our Terralist setup to K8s yesterday and I am suddenly seeing some weird errors when trying to post new versions of a module to the registry. Authentication seems to be working, but the server responds with a 409 error. We run all of our services behind an Istio service mesh with STRICT mTLS enabled (I honestly don't know whether that could be the issue). Has anyone seen this error before, and how could we solve it?

terralist version: 0.5.1
backend: s3
db: postgresql

The command I am running is:

curl POST https://terralist.tooling.synthesize.co.za/v1/api/modules/${TERRAFORM_MODULE_NAME}/${TERRAFORM_MODULE_SYSTEM}/${TERRAFORM_MODULE_VERSION}/upload \
    -H "Authorization: Bearer x-api-key:${TERRALIST_TOKEN}" \
    -d "{ \"download_url\": \"${location}\" }" --fail --verbose

Here is the error:


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Could not resolve host: POST
* Closing connection 0
curl: (6) Could not resolve host: POST
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 34.251.170.239:443...
* Connected to terralist.tooling.synthesize.co.za (34.251.170.239) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [15 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2611 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=terralist.tooling.synthesize.co.za
*  start date: Aug  5 08:08:57 2024 GMT
*  expire date: Nov  3 08:08:56 2024 GMT
*  subjectAltName: host "terralist.tooling.synthesize.co.za" matched cert's "terralist.tooling.synthesize.co.za"
*  issuer: C=US; O=Let's Encrypt; CN=R11
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x5605869cc860)
} [5 bytes data]
> POST /v1/api/modules/account-baseline/aws/1.15.0/upload HTTP/2
> Host: terralist.tooling.synthesize.co.za
> user-agent: curl/7.74.0
> accept: */*
> authorization: Bearer x-api-key:$TERRALIST_TOKEN
> content-length: 128
> content-type: application/x-www-form-urlencoded
> 
} [5 bytes data]
* We are completely uploaded and fine
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [230 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [230 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
} [5 bytes data]
* The requested URL returned error: 409 
* stopped the pause stream!
100   128    0     0  100   128      0    280 --:--:-- --:--:-- --:--:--   280
* Connection #1 to host terralist.tooling.synthesize.co.za left intact
curl: (22) The requested URL returned error: 409

Here are the logs from Terralist and Istio.

Terralist:

{"level":"warn","method":"POST","path":"/v1/api/modules/account-baseline/aws/1.15.0/upload","resp_time":156.525153,"status":409,"client_ip":"20.0.3.22","time":"2024-08-06T13:27:56Z"}

istio:

[2024-08-06T13:27:56.661Z] "POST /v1/api/modules/account-baseline/aws/1.15.0/upload HTTP/1.1" 409 - via_upstream - "-" 128 209 157 157 "20.0.3.22" "curl/7.74.0" "c9287306-983e-49b8-9931-bd89274f90e7" "terralist.tooling.synthesize.co.za" "20.0.5.187:5758" inbound|5758|| 127.0.0.6:55543 20.0.5.187:5758 20.0.3.22:0 outbound_.5758_._.terralist-service.terralist.svc.cluster.local default
[2024-08-06T13:27:56.761Z] "- - -" 0 - - - "-" 2367 6752 6082 - "-" "-" "-" "-" "3.5.69.16:443" PassthroughCluster 20.0.5.187:48602 3.5.69.16:443 20.0.5.187:48586 - -
[2024-08-06T13:38:06.434Z] "POST /v1/api/modules/account-baseline/aws/1.15.0/upload HTTP/1.1" 409 - via_upstream - "-" 94 68 30 29 "20.0.3.22" "curl/7.68.0" "e18efe09-2653-400e-bb19-fef963f4f35b" "terralist.tooling.synthesize.co.za" "20.0.5.187:5758" inbound|5758|| 127.0.0.6:55543 20.0.5.187:5758 20.0.3.22:0 outbound_.5758_._.terralist-service.terralist.svc.cluster.local default

valentindeaconu commented 3 weeks ago

Hello, the 409 status code represents an error inside the business logic. It can have multiple causes, so it would be very helpful if you could post the body of the response here. The body contains the exact error.
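
One way to capture that body is to rerun the upload without `--fail`, since that flag makes curl discard the response body on HTTP errors. The sketch below is only an illustration under stated assumptions, not a confirmed fix: the function name `upload_module_version` is hypothetical, the JSON `Content-Type` header is added on the assumption that the endpoint expects a JSON payload (the trace above shows curl defaulting to `application/x-www-form-urlencoded`), and `--request POST` replaces the bare `POST` argument, which curl treats as a second URL (hence `Could not resolve host: POST`).

```shell
# Hypothetical helper: POST a module version and print the response body
# followed by the HTTP status, so the message behind the 409 becomes visible.
#   $1 = full .../upload endpoint URL
#   $2 = Terralist API key
#   $3 = value for download_url
upload_module_version() {
  # No --fail here: with --fail, curl drops the body of error responses.
  # Content-Type is an assumption (JSON payload), not confirmed by the thread.
  curl --silent --show-error \
    --request POST "$1" \
    --header "Authorization: Bearer x-api-key:$2" \
    --header "Content-Type: application/json" \
    --data "{ \"download_url\": \"$3\" }" \
    --write-out '\nHTTP status: %{http_code}\n'
}
```

It could be called with the same variables as the original command, for example `upload_module_version "https://terralist.tooling.synthesize.co.za/v1/api/modules/${TERRAFORM_MODULE_NAME}/${TERRAFORM_MODULE_SYSTEM}/${TERRAFORM_MODULE_VERSION}/upload" "$TERRALIST_TOKEN" "$location"`.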

jsidney commented 3 weeks ago

@valentindeaconu - I do not have easy access to the body of the message (but I will try to spin up a test environment inside a K8s cluster to replicate the situation). At the time, I remember seeing some odd "AccessDenied" messages in AWS CloudTrail (I am using S3 as the storage backend). We are using Service Accounts and OIDC roles for all of the tooling inside our cluster, so I wonder if it has something to do with how the application is assuming the role?

jsidney commented 3 weeks ago

Currently, we are running Terralist inside an ECS/Fargate cluster, and we are simply using the same IAM policy for both the ECS task's role and the K8s IRSA role, so I am pretty confident that it is not specifically the IAM permissions that are incorrect.

valentindeaconu commented 10 hours ago

> We are using Service Accounts and OIDC roles for all of the tooling inside our cluster so I wonder if there is something to do with how the application is assuming the role?

Terralist does not assume any role itself. You can either configure it with a set of static credentials, or configure it to use the default credentials provider chain (which I guess is your case). To make it work with the default chain, all you have to do is not configure any credentials (leave the s3-access-key-id and s3-secret-access-key options unset).
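
If the pod relies on IRSA, a quick sanity check is to confirm that the web identity settings the AWS SDK default chain looks for are actually present in the Terralist container. The sketch below is a hypothetical helper (`check_irsa_env` is not part of Terralist) and assumes the standard variables injected by the EKS pod identity webhook, `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`. It only reports what is missing; it cannot prove that the role's permissions are correct.

```shell
# Hypothetical helper: check that the IRSA-related environment is in place.
# Returns 0 when both variables are set and the token file is readable.
check_irsa_env() {
  rc=0

  if [ -z "${AWS_ROLE_ARN:-}" ]; then
    echo "AWS_ROLE_ARN is not set"
    rc=1
  fi

  if [ -z "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    echo "AWS_WEB_IDENTITY_TOKEN_FILE is not set"
    rc=1
  elif [ ! -r "${AWS_WEB_IDENTITY_TOKEN_FILE}" ]; then
    echo "token file is not readable: ${AWS_WEB_IDENTITY_TOKEN_FILE}"
    rc=1
  fi

  # Static keys in the environment may be picked up before the web identity
  # token, so flag them as a possible source of unexpected credentials.
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "warning: AWS_ACCESS_KEY_ID is set and may take precedence"
  fi

  if [ "$rc" -eq 0 ]; then
    echo "IRSA environment looks complete"
  fi
  return "$rc"
}
```

Running `check_irsa_env` in a shell inside the Terralist container shows which of these settings, if any, are missing. Where the AWS CLI is available in that container, `aws sts get-caller-identity` additionally shows which identity the default chain resolves to.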

As I said before, without some logs or an error message, I'm not able to understand or reproduce the issue. I'm sorry.