nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0

Adjusting the protocol and host when deployed behind a load balancer #1021

Open ipeterov opened 10 months ago

ipeterov commented 10 months ago

What do I want?

It's currently possible to add headers after the response has been generated, and it looks like this:

"action": {
    "share": "/www/static/$uri",
    "response_headers": {
        "Cache-Control": "max-age=60, s-maxage=120"
        "CDN-Cache-Control": "max-age=600"
    }
}

I want to add request headers, such as Host and X-Forwarded-Proto that would be passed to my application. I imagine that could look like this:

"action": {
  "pass": "applications/django",
  "request_headers": {
    "Host": "aiarena-test.net",
    "X-Forwarded-Proto": "https"
  }
}

Why do I want that? What's my actual use-case?

In an open-source project I'm helping maintain, we host our infrastructure behind an AWS load balancer. Our application code runs inside a Docker container, and before the container can receive any traffic it must pass a load-balancer health check.

The problem is that the health checker uses the container's IP address to access the container, and it also uses HTTP instead of HTTPS. So by default I just get a 400 error, because my application is configured to respond only to a whitelist of allowed hosts.

What I want to do is trick the container into thinking it's being called from behind the load balancer, and that HTTPS was handled properly (as it is for all the production traffic it's going to receive).

Possible solutions for my problem without request header support

Here are the options I have considered. While they would solve my problem, I don't think they are optimal.

Allow traffic from any hostname, and disable the HTTPS redirect.

This would allow the healthcheck to proceed, but it sacrifices important security features.

Put Unit behind regular nginx

This would allow me to do something like this:

location ~ ^/health-check/?$ {
    include                 /etc/nginx/uwsgi_params;
    uwsgi_param             HTTP_HOST aiarena-test.net;
    uwsgi_param             HTTP_X_FORWARDED_PROTO https;
    uwsgi_pass              aiarena-uwsgi:8311;
}

This solves the problem without creating security flaws, but it forces us to run multiple containers (one for nginx, one for Unit with the Django app). This is actually the setup I have on my work projects, and it's the one I'm trying to migrate away from.

Modify the healthcheck to expect a 301 or a 400

While it's possible to do in AWS, and technically the healthcheck would pass, it would only check that the ALLOWED_HOSTS setting is working, and skip checking that the application itself is functional.

Summary

It would be very neat to be able to modify the request headers before they're passed to the application; that would allow me to migrate both my hobby project and my work projects to NGINX Unit.

lcrilly commented 9 months ago

Support for setting X-Forwarded-Proto is available today (it's part of the listener configuration):

https://unit.nginx.org/configuration/#ip-protocol-forwarding
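For context, the relevant setting is the forwarded object on a listener. A minimal sketch, assuming an ALB whose addresses fall inside 10.0.0.0/8 (the listener address, application name, and CIDR range here are placeholders, not from this thread):

"listeners": {
    "*:8080": {
        "pass": "applications/django",
        "forwarded": {
            "protocol": "X-Forwarded-Proto",
            "client_ip": "X-Forwarded-For",
            "source": ["10.0.0.0/8"]
        }
    }
}

With this in place, Unit replaces the request's scheme and client address using those headers, but only for requests arriving from the trusted source addresses; it does not add the headers when the proxy omits them.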

tippexs commented 9 months ago

Hi @ipeterov, thanks for reaching out, and thanks @lcrilly for jumping in!

Besides X-Forwarded-Proto and X-Forwarded-For, adding custom request headers is currently not natively supported. BUT you are able to set environment variables in the environment section.

Then, as a next step, you can make use of PHP's auto_prepend_file. In that script you can convert environment variables into HTTP request headers. Does this make sense? I have used this in an older application with great success. The performance impact is almost nothing (of course, it depends on what you are doing in the prepended file).
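For reference, the environment section lives in the application object. A minimal sketch, assuming a Django app (the application path, module, and variable names below are placeholders):

"applications": {
    "django": {
        "type": "python",
        "path": "/www/app",
        "module": "config.wsgi",
        "environment": {
            "EXPECTED_HOST": "aiarena-test.net",
            "EXPECTED_PROTO": "https"
        }
    }
}

The application would then have to read those variables itself (for example in its settings), since Unit does not turn them into request headers.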

txigreman commented 6 months ago

Modify the healthcheck to expect a 301 or a 400

While it's possible to do in AWS, and technically the healthcheck would pass, it would only check that the ALLOWED_HOSTS setting is working, and skip checking that the application itself is functional.

The ALB health check should only check whether the "service" itself is functional, not the whole app. In other words: ALB, with its health check, just wants to know whether the attached target is able to handle requests, not whether your app is running properly.

If the health check fails, ALB will try to kill and redeploy the target. For example, if your DB is down and you reply with a 500 error to the health check, the ALB will enter a kill/redeploy cycle that will only make things worse. That's why the health check should only fail when a redeploy will fix the problem.

It's ok to have a dedicated file or route in your app for the health check that does nothing but reply "I'm alive" with a 200 status code.
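As a side note, a bare health-check endpoint like that can even be answered by Unit's router itself via the return action, so the request never reaches the application. A minimal sketch (the URI and application name are placeholders):

"routes": [
    {
        "match": {
            "uri": "/health-check/"
        },
        "action": {
            "return": 200
        }
    },
    {
        "action": {
            "pass": "applications/django"
        }
    }
]

That only proves the Unit instance is accepting requests, which matches the narrow "is the target able to handle requests" view described above.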

To monitor whether your whole app is working properly, you could use Route53 health checks (or other uptime monitoring services), CloudWatch alarms, or other mechanisms.

ipeterov commented 6 months ago

Sorry @tippexs, I missed the notification. I'm using Python + Django as my backend, so I don't have access to auto_prepend_file. Thanks for the suggestion, I'll look into the alternatives.

ipeterov commented 6 months ago

@txigreman I definitely see your point. But consider this situation - I misconfigured the access credentials to my database in my ECS task definition.

The new containers are failing on every page, but my database is alive and well. When ALB does the health-check, it will get the expected 400 code, because the ALLOWED_HOSTS check happens really early and doesn't try to access the database.

My current setup will be able to catch the issue and just keep the old containers, while the proposed setup would lead to about half an hour of downtime (5 minutes for uptime monitoring to notice, 25 minutes for us to realize what happened and deploy a fix).

It doesn't have to be misconfigured DB credentials; anything that breaks after the ALLOWED_HOSTS check would go unnoticed. I understand that it's probably not going to be a very frequent problem, but over the course of a few years it would prevent a couple of outages.

txigreman commented 6 months ago

Hi @ipeterov,

Errors on new releases should be handled using CloudWatch alarms, in conjunction with blue/green deployments (or the deployment circuit breaker in ECS). This way you can also handle other scenarios, like performance degradation.

The ALB health check is not intended to be used for detecting code/config-related errors on new releases.

ipeterov commented 6 months ago

@txigreman This is probably off-topic for this issue, but I'm really interested in your perspective 😅

In my limited understanding, the things you mentioned don't really solve this problem:

  1. CloudWatch alarms - not entirely sure which ones you meant:
    • RDS load alarms can notify me if the DB is not responsive, but do nothing for a misconfiguration
    • I can set up alarms for ALB metrics such as "HTTP 500s" or "Target Response Time", but those will notify me after the downtime has happened, which is not ideal
  2. Blue/green deployments - I currently have blue-green deployments (with ECS and CodeDeploy), and they are using this exact health-check to figure out if they should switch the traffic over to the new container set.
  3. Uptime monitoring services (Route53, uptime.com, or whatever) will only let me know after the downtime has happened.

To be clear, I'm not saying that this particular health-check should be the only thing I rely on. In production, I have CloudWatch metrics for the DB, application performance monitoring and uptime monitoring in addition to the health-check.

What I am trying to say is that having an ALB health check that checks DB connectivity can prevent some downtime that isn't easily preventable by other means, and that's a good enough reason to want it.

A downside of this approach is that if the DB is actually down, I will get a restart loop because the health check will fail. But from my perspective it's a decent trade-off, because in that case the application is down anyway.

txigreman commented 6 months ago

In my case, the misconfigured db connection would be detected in the pipeline, when trying to execute the DB migrations.

After uploading new Docker images and publishing new task definitions, the pipeline runs a one-off task in the ECS cluster to execute the DB migrations, and then checks the task exit code to decide whether to continue or stop. I have a specific task definition for it, but it would also work to use the main app task definition and override the Docker image command.

(I use aws ecs run-task to run the task, aws ecs wait tasks-stopped to wait until execution has finished, and aws ecs describe-tasks to check the task's exit code and stop reason, but I'm not using AWS CodeDeploy.)

If you have some other static stuff to validate, maybe you should consider launching a similar one-off task in your pipeline to run some validations before updating and redeploying your services.

Another option, if your app logs are stored in CloudWatch, would be to create a log subscription filter and trigger an alarm based on it. As it is not based on metrics, it will be triggered immediately.

My general approach would be to validate static things in the pipeline, and dynamic stuff with alarms.

tippexs commented 6 months ago

Wow! Great to see some off-topic discussion around NGINX Unit as well.

I am at KubeCon this week, BUT adding new headers to the request context seems like a natural fit for Unit. As already mentioned, today this works with environment variables but not with headers. There will be some special cases when adding new request headers to the chain, so we have to write some logic about which headers can't be added if a specific header is already present, and whether we should overwrite an existing header if configured to do so. We cannot overwrite headers by default, as it is totally valid and allowed to have multiple headers of the same name.

As we already have some issues covering this, one wrapper issue would be good for tracking the development of this.