mendersoftware / mender-server

mender-client connection issue #112

Open NTohan opened 4 days ago

NTohan commented 4 days ago

On the main branch at commit 7dfc4d8501f58142cb0d45595ea3d0163908efa6, mender-client is not able to connect to a mender-server deployed in production mode.

The mender-client connection was tested on the target device using a local IP, with mender-server running on the same network:

$ wget -O- https://get.mender.io/ | sudo bash -s -- --demo --force-mender-client4 -- --quiet --device-type "genericx86-64" --demo --server-url https://192.168.x.x/ --server-cert=""

The above command results in no connection, and no new device appears as pending approval on the mender-server running in the same network.

Workaround: with the Mender Server behind a Cloudflare reverse proxy, mender-client is able to connect to mender-server once the https redirect scheme and websecure TLS are disabled:

$ git diff
diff --git a/docker-compose.yml b/docker-compose.yml
index b363769..e597262 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -330,13 +330,13 @@ services:
       - --api.insecure=true
       - --accesslog=true
       - --entrypoints.web.address=:80
-      - --entrypoints.web.http.redirections.entryPoint.scheme=https
+      - --entrypoints.web.http.redirections.entryPoint.scheme=http
       - --entrypoints.web.http.redirections.entryPoint.to=websecure
       - --entrypoints.websecure.address=:443
       - --entrypoints.websecure.transport.respondingTimeouts.idleTimeout=7200
       - --entrypoints.websecure.transport.respondingTimeouts.readTimeout=7200
       - --entrypoints.websecure.transport.respondingTimeouts.writeTimeout=7200
-      - --entrypoints.websecure.http.tls=true
+      - --entrypoints.websecure.http.tls=false
       - --providers.file.directory=/etc/traefik/config

With the https redirect scheme and websecure.http.tls disabled in the mender-server config, we rely on the security layer of a CDN service (Cloudflare) through a secure Cloudflare tunnel (Cloudflare reverse proxy). This setup lets us access mender-server at https://devices.domain.xxx, which points to http://192.168.x.x:443, and install mender-client on the target device using:

$ wget -O- https://get.mender.io/ | sudo bash -s -- --demo --force-mender-client4 -- --quiet --device-type "genericx86-64" --demo --server-url https://devices.domain.xxx/ --server-cert=""

Is this a recommended workflow for using mender-server, or would you recommend relying on mender-server's https and websecure because of security concerns?

Thank you in advance for any inputs.

Cheers, N.T.

alfrunes commented 3 days ago

Hello, thanks again for the feedback. The Docker Compose setup is meant for demo / evaluation purposes only and should not be used for production environments. When going to production, I highly suggest using Kubernetes with our helm chart https://github.com/mendersoftware/mender-helm.

If you still want to use the docker compose setup, https://github.com/mendersoftware/mender-server/pull/110 introduces a (self-signed) demo certificate. You can use that as a basis and issue your own public certificate and mount it (for example by replacing the compose/certs/mender.crt and compose/certs/mender.key with your certificate and key respectively).

alfrunes commented 3 days ago

Another option, if you want to use compose behind a reverse proxy that handles the TLS termination: you can simply disable the websecure entrypoint and the redirection as follows:

diff --git a/docker-compose.yml b/docker-compose.yml
index b363769..e597262 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -330,13 +330,13 @@ services:
       - --api.insecure=true
       - --accesslog=true
       - --entrypoints.web.address=:80
-      - --entrypoints.web.http.redirections.entryPoint.scheme=https
-      - --entrypoints.web.http.redirections.entryPoint.to=websecure
-      - --entrypoints.websecure.address=:443
-      - --entrypoints.websecure.transport.respondingTimeouts.idleTimeout=7200
-      - --entrypoints.websecure.transport.respondingTimeouts.readTimeout=7200
-      - --entrypoints.websecure.transport.respondingTimeouts.writeTimeout=7200
-      - --entrypoints.websecure.http.tls=true

This way, the mender-server is exposed in plain HTTP (no TLS) on port 80.

NTohan commented 3 days ago

Thank you for your quick response and suggestions. I came across the helm suggestion earlier too, but simply due to limited resources on my server, I went with docker compose. Nevertheless, if you are suggesting that docker compose is/will not be maintained actively like helm, I would seriously consider upgrading my server.

Unfortunately, your suggestion of exposing plain HTTP did not work for me. I keep getting "404 page not found" with http://192.168.x.x:80

diff --git a/docker-compose.yml b/docker-compose.yml
index b363769..c9fdd05 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -330,13 +330,13 @@ services:
       - --api.insecure=true
       - --accesslog=true
       - --entrypoints.web.address=:80
-      - --entrypoints.web.http.redirections.entryPoint.scheme=https
-      - --entrypoints.web.http.redirections.entryPoint.to=websecure
-      - --entrypoints.websecure.address=:443
-      - --entrypoints.websecure.transport.respondingTimeouts.idleTimeout=7200
-      - --entrypoints.websecure.transport.respondingTimeouts.readTimeout=7200
-      - --entrypoints.websecure.transport.respondingTimeouts.writeTimeout=7200
-      - --entrypoints.websecure.http.tls=true
+        #- --entrypoints.web.http.redirections.entryPoint.scheme=https
+        #- --entrypoints.web.http.redirections.entryPoint.to=websecure
+        #- --entrypoints.websecure.address=:443
+        #- --entrypoints.websecure.transport.respondingTimeouts.idleTimeout=7200
+        #- --entrypoints.websecure.transport.respondingTimeouts.readTimeout=7200
+        #- --entrypoints.websecure.transport.respondingTimeouts.writeTimeout=7200
+        #- --entrypoints.websecure.http.tls=true
       - --providers.file.directory=/etc/traefik/config
       - --providers.docker=true
       - --providers.docker.exposedByDefault=false

Also, I will try generating certificates for an extra layer of security, but behind a reverse proxy it is more or less redundant.

alfrunes commented 3 days ago

Sorry, I was too quick to reply and did not actually test my suggestion. I identified the problem and addressed it in the PR (https://github.com/mendersoftware/mender-server/pull/110/commits/cf0b9b2175e9cf99c320e623b897f0c6a57137ca). If you now remove the websecure entrypoint from Traefik (as I described in my previous comment), it will serve plain HTTP on port 80.

> I came across the helm suggestion earlier too, but simply due to limited resources on my server, I went with docker compose. Nevertheless, if you are suggesting that docker compose is/will not be maintained actively like helm, I would seriously consider upgrading my server.

As long as it's not for a production environment. For small deployments you should be fine using this docker compose, but you might run into trouble if you need to scale the number of replicas, or load balance the application across machines. We will continue to use and improve the docker compose environment as we are also using it internally when doing system integration testing.

> Also, I will try generating certificates for extra layer of security but behind reverse proxy it is more or less redundant.

As long as your ingress (proxy) is using TLS and the docker composition is not exposed outside your secure LAN you should be ok, but be aware that port 80 will be exposed to your local network.

NTohan commented 3 days ago

Thank you for your fix. I am happy to confirm that with your changes and the websecure entrypoint removed, I am able to connect to port 80. Is it possible to change the default port from 80 to something else?

The only changes I have noticed are that the inventory information under a registered device is now empty, and that the column configuration options under table configuration are now very limited. Could this be related to the recent changes? Not a blocker for me though.

Yes, you are right about the scaling and load balancing features K8s has to offer. I will definitely consider it for scaling.

Regarding TLS, it is enabled by default on the reverse proxy in use.

NTohan commented 2 days ago

> Thank you for your fix. I am happy to confirm that with your changes and removing the websecure entrypoint I am able to connect to port 80. It is possible to configure the default port from 80 to something else?

Unfortunately, deployments are constantly failing with docker compose at https://github.com/mendersoftware/mender-server/commit/cf0b9b2175e9cf99c320e623b897f0c6a57137ca with the websecure entrypoint removed:

2024-10-19 20:58:30.714 +0000 UTC warning: Host not found (non-authoritative), try again later: GET https://s3.mender.local/mender/1392ab9b-95f7-43f4-b3f6-a12ed1f94ad2?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=mender%2F20241019%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241019T205829Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22hello-world-container-update.mender%22&response-content-type=application%2Fvnd.mender-artifact&x-id=GetObject&X-Amz-Signature=9d1aa76146d3063870369e64ba1c99580d292a8a5c2c5684ce0d34512a992655: 

Please find the complete logs in the file: deployment-log-80fcbbce-5e16-4c3d-a77b-558031d77b3e-hello-world-container-update-2024-10-19T21_08_32.096Z.log

Additional information: mender-client is installed on the target device using:

wget -O- https://get.mender.io | sudo bash -s -- --demo --force-mender-client4 -- --quiet --device-type "genericx86-64" --demo --server-url https://devices.domain.xxx --server-cert=""

mender-client is configured with the following parameters:

$ cat /etc/mender/mender.conf
{
    "HttpsClient": {},
    "Security": {},
    "Connectivity": {},
    "DeviceTypeFile": "/var/lib/mender/device_type",
    "UpdateControlMapExpirationTimeSeconds": 90,
    "UpdateControlMapBootExpirationTimeSeconds": 45,
    "UpdatePollIntervalSeconds": 5,
    "InventoryPollIntervalSeconds": 5,
    "RetryPollIntervalSeconds": 30,
    "Servers": [
        {
            "ServerURL": "https://devices.domain.xxx"
        }
    ]
}

$ cat /var/lib/mender/device_type
device_type=genericx86-64

Also, I have tried re-installing the mender-client on the target device and re-created the deployment but unfortunately the deployments are still failing.

Can you please check the issue and confirm whether it is a problem in the under-development docker-compose setup, or whether I have misconfigured the mender-server on my side?

Thank you for your efforts.

alfrunes commented 1 day ago

Sorry, I forgot about the s3 bucket configuration. At this point, I think these customizations deserve a docker compose override file to make them easier to explain. I can see two options for making this work.

  1. Use an s3 bucket and configure the server to use this bucket.
    # docker-compose.http.yml
    # docker compose -f docker-compose.yml -f docker-compose.http.yml up -d
    services:
      traefik:
        command:
          - --api=true
          - --api.insecure=true
          - --accesslog=true
          - --entrypoints.web.address=:80
          - --providers.file.directory=/etc/traefik/config
          - --providers.docker=true
          - --providers.docker.exposedByDefault=false
      deployments:
        environment:
          DEPLOYMENTS_PRESIGN_URL_HOSTNAME: "<your gateway domain name>"
          DEPLOYMENTS_PRESIGN_SECRET: "<Generate a random base64 secret, for example: head -c 16 /dev/urandom | base64 -w 0 >"
          DEPLOYMENTS_STORAGE_BUCKET: "<BUCKET_NAME>"
          DEPLOYMENTS_AWS_URI: "<https://BUCKET_NAME.AWS_REGION.amazonaws.com>"
          DEPLOYMENTS_AWS_EXTERNAL_URI: "<https://BUCKET_NAME.AWS_REGION.amazonaws.com>"
          DEPLOYMENTS_AWS_AUTH_KEY: "${AWS_ACCESS_KEY_ID}"
          DEPLOYMENTS_AWS_AUTH_SECRET: "${AWS_SECRET_ACCESS_KEY}"
      s3fs:
        scale: 0

    Where DEPLOYMENTS_AWS_AUTH_KEY and DEPLOYMENTS_AWS_AUTH_SECRET are set to the access key ID and secret access key for the s3 bucket, respectively.

  2. Create a rule in the reverse proxy that forwards to s3.mender.local (with hostname rewrite).

    # docker compose -f docker-compose.yml -f docker-compose.http.yml up -d
    services:
      traefik:
        command:
          - --api=true
          - --api.insecure=true
          - --accesslog=true
          - --entrypoints.web.address=:80
          - --providers.file.directory=/etc/traefik/config
          - --providers.docker=true
          - --providers.docker.exposedByDefault=false
      deployments:
        environment:
          DEPLOYMENTS_PRESIGN_URL_HOSTNAME: "<your gateway domain name>"
          DEPLOYMENTS_PRESIGN_SECRET: "<Generate a random base64 secret, for example: head -c 16 /dev/urandom | base64 -w 0 >"
          DEPLOYMENTS_AWS_PROXY_URI: "https://<your domain>/mender" # Setup a rule that maps `/mender` to `s3.mender.local` with host header rewrite.

    [!IMPORTANT] If you go with option 2, make sure you are using the latest version of the PR (https://github.com/mendersoftware/mender-server/pull/110/commits/bde5c31a7ca8e153683a207b599f6ddbb5cefd5f) and set the MENDER_SECRET_ACCESS_KEY environment variable to a secret value, as the s3 storage would otherwise be exposed with an insecure access key.
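
As a minimal sketch, the secret placeholders in the override files above could be filled in with the same head/base64 pipeline mentioned in the comments. The gen_secret helper name is hypothetical, base64 -w 0 assumes GNU coreutils, and whether a value in this format is suitable for MENDER_SECRET_ACCESS_KEY is an assumption.

```shell
# gen_secret [N]: print a random base64 string built from N bytes (default 16).
# Hypothetical helper wrapping the pipeline suggested in the override file comments.
gen_secret() {
  head -c "${1:-16}" /dev/urandom | base64 -w 0
}

# Value to paste into DEPLOYMENTS_PRESIGN_SECRET in docker-compose.http.yml.
echo "DEPLOYMENTS_PRESIGN_SECRET: \"$(gen_secret)\""

# Assumption: a random value in the same format is acceptable as the storage access key.
MENDER_SECRET_ACCESS_KEY="$(gen_secret)"
export MENDER_SECRET_ACCESS_KEY
```

Run this in the same shell session that later runs docker compose up, so that the exported variable is visible to compose.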

> It is possible to configure the default port from 80 to something else?

Yes, simply change the port number for the web service in the traefik service, for example:

diff --git a/docker-compose.yml b/docker-compose.yml
index b363769..c9fdd05 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -330,13 +330,13 @@ services:
       - --api.insecure=true
       - --accesslog=true
-      - --entrypoints.web.address=:80
+      - --entrypoints.web.address=:8080

NTohan commented 19 hours ago

Thank you for your patch and summary on possible options. I like your approach with docker compose override file.

I have tried option 2 and am sorry to report that the deployment is not working for me.

Here are the steps:

Step 1: Check out your latest changes from https://github.com/mendersoftware/mender-server/commit/bde5c31a7ca8e153683a207b599f6ddbb5cefd5f

$ git log
commit cf0b9b2175e9cf99c320e623b897f0c6a57137ca (HEAD)
Author: Alf-Rune Siqveland <alf.rune@northern.tech>
Date:   Fri Oct 18 15:07:22 2024 +0200

    chore(docker): Remove hard-coded entrypoint in routes and define default

    Signed-off-by: Alf-Rune Siqveland <alf.rune@northern.tech>

Step 2: Setup a rule that maps /mender to s3.mender.local

[image: screenshot of the reverse proxy rule]

Step 3: Add a new docker compose override file docker-compose.http.yml

$ git diff
$ export MENDER_SECRET_ACCESS_KEY=generated_random_key
$ cat docker-compose.http.yml
services:
  traefik:
    command:
      - --api=true
      - --api.insecure=true
      - --accesslog=true
      - --entrypoints.web.address=:80
      - --providers.file.directory=/etc/traefik/config
      - --providers.docker=true
      - --providers.docker.exposedByDefault=false
  deployments:
    environment:
      DEPLOYMENTS_PRESIGN_URL_HOSTNAME: "devices.domain.xxx"
      DEPLOYMENTS_PRESIGN_SECRET: "generated_random_key" #"<Generate a random base64 secret, for example: head -c 16 /dev/urandom | base64 -w 0 >"
      DEPLOYMENTS_AWS_PROXY_URI: "https://domain.xxx/mender" # Setup a rule that maps `/mender` to `s3.mender.local` with host header rewrite.

$  docker compose -f docker-compose.yml -f docker-compose.http.yml up --build

Step 4: Try a demo deployment

Unfortunately, I keep getting this error on mender-server

Couldn't load deployments. Cannot read properties of undefined (reading 'length') Retrying in 9 seconds...

Please find the logs from _mender-deployments-1_logs (1).txt

Also, how can I make sure that s3.mender.local is set up properly on my server? Is it possible to bypass the local domain s3.mender.local and replace it with something like http://<local_ip>:<port> within docker compose? This might help to test whether the reverse proxy is able to point to http://<local_ip>:<port> but has an issue resolving the suggested s3.mender.local.

> Yes, simply change the port number for the web service in the traefik service, for example:

Thank you for the suggestion. I will try adapting the exposure port after deployments are functional.

alfrunes commented 2 hours ago

I merged the PR to main with one notable change: the domain name changed from mender.local to docker.mender.io. This is to avoid potential conflicts with the mDNS top-level domain (.local).

> Please find the logs from _mender-deployments-1_logs (1).txt

The only thing that sticks out from the logs here is that it seems like you're trying to recreate a deployment that is already in progress. The error observed seems to come from an unexpected response that is not handled in the frontend. I'm not sure exactly what's causing this, but I will look into it.

To test if your setup works, you could try uploading an artifact and then download it again.

> Also, how to make sure that s3.mender.local is setup properly on my server?

It seems the problem is that s3.mender.local (now docker.mender.io on main) should map to localhost, so you should add a hosts entry mapping this hostname back to your localhost. That is, on Linux you need to edit /etc/hosts: echo "127.0.0.1 s3.mender.local" | sudo tee -a /etc/hosts. On Windows, I believe you need to append the entry to C:\Windows\System32\drivers\etc\hosts. Alternatively, you could set up a local DNS server on your LAN and add an A record mapping the domain back to your private IP.
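
The hosts entry can also be added idempotently, so that repeated runs do not duplicate the line. This is only a sketch: add_host_entry is a hypothetical helper, and it needs write access to the hosts file (root for /etc/hosts).

```shell
# add_host_entry <ip> <hostname> [hosts-file]
# Append "<ip> <hostname>" to the hosts file (default /etc/hosts) unless the
# hostname already appears as a name on a non-comment line.
add_host_entry() {
  _ip="$1"
  _host="$2"
  _file="${3:-/etc/hosts}"
  if awk -v h="$_host" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit !found }' "$_file"; then
    echo "$_host is already present in $_file"
  else
    printf '%s %s\n' "$_ip" "$_host" >> "$_file"
  fi
}

# Example (as root): add_host_entry 127.0.0.1 s3.mender.local
```

The IP is a parameter so that the same helper can write the private LAN IP instead of 127.0.0.1 where that is the mapping you need.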

> Is it possible to bypass the local domain s3.mender.local and replace it with something like http://<local_ip>:<port> within docker compose? This might be helpful to test if reverse-proxy is able to point to http://<local_ip>:<port> and has issue resolving to the suggested s3.mender.local.

This is possible, but I would not recommend it as containers get their IPs reassigned every time you restart the docker compose environment. But for the sake of completeness, you can forward the requests to the IP of the s3fs container which you can get by running:

docker inspect mender-s3fs-1 --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'

The S3 API is exposed on port 8333 on this IP address.
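
Put together, the forwarding target could be derived along these lines. This is a rough sketch: the helper names are hypothetical, the plain http scheme is an assumption, and the container IP has to be looked up again after every restart.

```shell
# s3_endpoint <ip>: print the S3 API URL for a container IP (port 8333 as noted above).
# Assumption: the API is served over plain HTTP.
s3_endpoint() {
  printf 'http://%s:8333\n' "$1"
}

# s3fs_ip: look up the current IP of the s3fs container (requires docker and a
# running compose environment; container name as used above).
s3fs_ip() {
  docker inspect mender-s3fs-1 \
    --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
}

# Example: s3_endpoint "$(s3fs_ip)"
```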

And one more thing, just to double check. I hope you've replaced the value of the MENDER_SECRET_ACCESS_KEY before running docker compose up -d. It is important that this is secret as it would otherwise give public access to the blob storage.