pomerium / ingress-controller

Pomerium Kubernetes Ingress Controller
https://pomerium.com
Apache License 2.0
23 stars 11 forks source link

sessions get dropped seconds after page loads; cors error prevents sign_in from working #682

Closed drewp closed 10 months ago

drewp commented 1 year ago

What happened?

I load the non-public page https://bigasterisk.com/m/vmui/ and quickly click a button that does a fetch request. This succeeds. Then I count to 10, click it again, and it fails. 100% repeatable on multiple apps. Logs below.

What did you expect to happen?

I expected the session to not be lost so quickly, but if by chance it does expire, I expect the sign_in flow to work transparently. And if that doesn't work, I'd like a more actionable error. I am still confused by these in particular:

BTW both those lines have no 'service' field, which makes them harder to put in context.

What's your environment like?

k8s, near-default pomerium ingress from pomerium/ingress-controller:sha-dd49d67

config patches here

global Pomerium spec here (tldr it's google oidc).

Using postgres for storage.

What did you see in the logs?

The good fetch request:

{
  "level": "info",
  "type": "type.googleapis.com/session.Session",
  "query": "",
  "offset": 0,
  "limit": 1,
  "filter": {
    "$or": [
      {
        "id": "443c3dc3-###-40ea4e117f9e"
      },
      {
        "$index": "443c3dc3-###-40ea4e117f9e"
      }
    ]
  },
  "time": "2023-06-17T06:28:19Z",
  "message": "query"
}
{
  "level": "info",
  "service": "authorize",
  "request-id": "2df3eed0-8af8-4f55-8eb9-f62178dbe19e",
  "check-request-id": "2df3eed0-8af8-4f55-8eb9-f62178dbe19e",
  "method": "GET",
  "path": "/m/prometheus/api/v1/query_range",
  "host": "bigasterisk.com",
  "query": "",
  "ip": "10.42.1.1",
  "session-id": "443c3dc3-###-40ea4e117f9e",
  "allow": true,
  "allow-why-true": [
    "email-ok"
  ],
  "deny": false,
  "deny-why-false": [
    "valid-client-certificate-or-none-required"
  ],
  "user": "####",
  "email": "#####",
  "time": "2023-06-17T06:28:19Z",
  "message": "authorize check"
}
{
  "level": "info",
  "service": "envoy",
  "upstream-cluster": "default-victoriametrics-bigasterisk-com-m-3e2524366b1bdea6",
  "method": "GET",
  "authority": "bigasterisk.com",
  "path": "/m/prometheus/api/v1/query_range",
  "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
  "referer": "https://bigasterisk.com/m/vmui/",
  "forwarded-for": "10.42.1.1",
  "request-id": "2df3eed0-8af8-4f55-8eb9-f62178dbe19e",
  "duration": 12.8336,
  "size": 3732,
  "response-code": 200,
  "response-code-details": "via_upstream",
  "time": "2023-06-17T06:28:20Z",
  "message": "http-request"
}

The broken fetch request:

{
  "level": "info",
  "type": "type.googleapis.com/session.Session",
  "query": "",
  "offset": 0,
  "limit": 1,
  "filter": {
    "$or": [
      {
        "id": "443c3dc3-###-40ea4e117f9e"
      },
      {
        "$index": "443c3dc3-###-40ea4e117f9e"
      }
    ]
  },
  "time": "2023-06-17T06:29:24Z",
  "message": "query"
}
{
  "level": "info",
  "type": "type.googleapis.com/user.ServiceAccount",
  "query": "",
  "offset": 0,
  "limit": 1,
  "filter": {
    "$or": [
      {
        "id": "443c3dc3-###-40ea4e117f9e"
      },
      {
        "$index": "443c3dc3-###-40ea4e117f9e"
      }
    ]
  },
  "time": "2023-06-17T06:29:24Z",
  "message": "query"
}
{
  "level": "warn",
  "error": "record not found",
  "time": "2023-06-17T06:29:24Z",
  "message": "clearing session due to missing session or service account"
}
{
  "level": "info",
  "type": "type.googleapis.com/user.ServiceAccount",
  "query": "",
  "offset": 0,
  "limit": 1,
  "filter": {
    "$or": [
      {
        "id": ""
      },
      {
        "$index": ""
      }
    ]
  },
  "time": "2023-06-17T06:29:24Z",
  "message": "query"
}
{
  "level": "info",
  "type": "type.googleapis.com/session.Session",
  "query": "",
  "offset": 0,
  "limit": 1,
  "filter": {
    "$or": [
      {
        "id": ""
      },
      {
        "$index": ""
      }
    ]
  },
  "time": "2023-06-17T06:29:24Z",
  "message": "query"
}
{
  "level": "info",
  "service": "authorize",
  "request-id": "4a1fc7dd-a987-4476-b4a9-19a88af6f7f2",
  "check-request-id": "4a1fc7dd-a987-4476-b4a9-19a88af6f7f2",
  "method": "GET",
  "path": "/.well-known/pomerium/hpke-public-key",
  "host": "localhost:8443",
  "query": "",
  "ip": "::1",
  "allow": true,
  "allow-why-true": [
    "pomerium-route"
  ],
  "deny": false,
  "deny-why-false": [],
  "user": "",
  "email": "",
  "time": "2023-06-17T06:29:24Z",
  "message": "authorize check"
}
{
  "level": "info",
  "service": "authorize",
  "request-id": "f2b460f5-019d-49c8-a06b-c517bd773337",
  "check-request-id": "f2b460f5-019d-49c8-a06b-c517bd773337",
  "method": "GET",
  "path": "/m/prometheus/api/v1/query_range",
  "host": "bigasterisk.com",
  "query": "",
  "ip": "10.42.1.1",
  "allow": false,
  "allow-why-false": [
    "user-unauthenticated"
  ],
  "deny": false,
  "deny-why-false": [
    "valid-client-certificate-or-none-required"
  ],
  "user": "",
  "email": "",
  "time": "2023-06-17T06:29:24Z",
  "message": "authorize check"
}
{
  "level": "info",
  "service": "authorize",
  "request-id": "21b96fdc-fbd2-4459-b4ba-9b65341f841a",
  "check-request-id": "21b96fdc-fbd2-4459-b4ba-9b65341f841a",
  "method": "GET",
  "path": "/.pomerium/sign_in",
  "host": "authenticate.bigasterisk.com",
  "query": "",
  "ip": "10.42.1.1",
  "allow": true,
  "allow-why-true": [
    "pomerium-route"
  ],
  "deny": false,
  "deny-why-false": [],
  "user": "",
  "email": "",
  "time": "2023-06-17T06:29:24Z",
  "message": "authorize check"
}
{
  "level": "info",
  "ip": "127.0.0.1",
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
  "referer": "https://bigasterisk.com/",
  "request-id": "21b96fdc-fbd2-4459-b4ba-9b65341f841a",
  "error": "internal/urlutil: malformed unix timestamp field",
  "time": "2023-06-17T06:29:24Z",
  "message": "authenticate: origin blocked"
}
{
  "level": "info",
  "ip": "127.0.0.1",
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
  "referer": "https://bigasterisk.com/",
  "request-id": "21b96fdc-fbd2-4459-b4ba-9b65341f841a",
  "error": "Bad Request: internal/sessions: session is not found",
  "idp_id": "Hw5Jnp#######",
  "time": "2023-06-17T06:29:24Z",
  "message": "authenticate: session load error"
}
{
  "level": "info",
  "service": "envoy",
  "upstream-cluster": "",
  "method": "GET",
  "authority": "bigasterisk.com",
  "path": "/m/prometheus/api/v1/query_range",
  "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
  "referer": "https://bigasterisk.com/m/vmui/",
  "forwarded-for": "10.42.1.1",
  "request-id": "f2b460f5-019d-49c8-a06b-c517bd773337",
  "duration": 15.752359,
  "size": 1438,
  "response-code": 302,
  "response-code-details": "ext_authz_denied",
  "time": "2023-06-17T06:29:25Z",
  "message": "http-request"
}
{
  "level": "info",
  "service": "envoy",
  "upstream-cluster": "pomerium-control-plane-http",
  "method": "GET",
  "authority": "authenticate.bigasterisk.com",
  "path": "/.pomerium/sign_in",
  "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
  "referer": "https://bigasterisk.com/",
  "forwarded-for": "10.42.1.1",
  "request-id": "21b96fdc-fbd2-4459-b4ba-9b65341f841a",
  "duration": 10.677448,
  "size": 908,
  "response-code": 302,
  "response-code-details": "via_upstream",
  "time": "2023-06-17T06:29:25Z",
  "message": "http-request"
}
{
  "level": "info",
  "service": "envoy",
  "upstream-cluster": "pomerium-control-plane-http",
  "method": "GET",
  "authority": "localhost:8443",
  "path": "/.well-known/pomerium/hpke-public-key",
  "user-agent": "Go-http-client/2.0",
  "referer": "",
  "forwarded-for": "10.42.1.5",
  "request-id": "4a1fc7dd-a987-4476-b4a9-19a88af6f7f2",
  "duration": 2.690773,
  "size": 0,
  "response-code": 304,
  "response-code-details": "via_upstream",
  "time": "2023-06-17T06:29:25Z",
  "message": "http-request"
}

Chrome console, on the bad request:

Access to fetch at 'https://authenticate.bigasterisk.com/.pomerium/sign_in?k=######' (redirected from 'https://bigasterisk.com/m/prometheus/api/v1/query_range?query=house_power_w&start=1686972564.235&end=1686983364.235&step=15s') from origin 'https://bigasterisk.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
main.ff1e4560.js:2 
GET https://authenticate.bigasterisk.com/.pomerium/sign_in?k=###### net::ERR_FAILED 302
Fetch failed loading: GET "https://authenticate.bigasterisk.com/.pomerium/sign_in?k=######".
drewp commented 1 year ago

Here's the pomerium log for the page load, which always succeeds for me. Apparently I'm often doing 3 redirects at page load, which surely doesn't help performance. If I press refresh fast enough, there's no redirect.

https://bigasterisk.com/post/pomerium-log1.txt

wasaga commented 1 year ago

You should check your Postgres user whether it has rights to update the database.

What we are witnessing is Pomerium authenticating you, providing you with session that is immediately cached in memory that allows you to proceed with your login, while database persistence happens in parallel; first few requests manage to pass thru because the cache is still valid.

Normally, there would be the following log line that would confirm the session object is persisted in the database

{"level":"info","record-type":"type.googleapis.com/session.Session","record-id":"0695b148-397f-4ccc-9b27-73be447efc6e","time":"2023-06-19T11:37:57-04:00","message":"put"}

I do not see such logs in your logs. What happens is that after some (short) time, cache expires and database doesn't have the session, forcing you go thru authenticate endpoint to acquire a new one once again. This process kinda allows you to work in the browser, but as you correctly pointed, CORS issues won't allow this to happen for the fetch requests.

We need understand what is exactly going on with your database. Please advise on your database flavour and version.

Please try to connect to your database and run

SELECT * from pomerium.records

you should see a record with type == type.googleapis.com/session.Session corresponding to your session. Most likely it won't be there.

Please check whether the database role has all the required permissions to select, update and create tables.

You may also temporarily remove the storage from configuration, so that it would only use in-memory storage. Most likely it'll resolve the issue, proving the root cause is something with database connection and/or database user permissions.

drewp commented 1 year ago

Removing storage does seem to fix the quick timeout.

OOC why doesn't authenticate.bigasterisk.com return Access-Control-Allow-Origin: https://bigasterisk.com in these cases?

My postgres config is just this default:

 - name: pomerium-db
          image: postgres:14.2-alpine3.15
          env:
            - { name: POSTGRES_PASSWORD, value: admin }
            - { name: POSTGRES_DB, value: pomerium }
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: pomerium-db-data
              mountPath: /var/lib/postgresql/data

then this connection secret:

apiVersion: v1
kind: Secret
metadata:
  namespace: pomerium
  name: postgres-connection-key
type: Opaque
stringData:
  connection: postgresql://pom:pom@pomerium-db/pomerium

It seems to work:

k exec -it -n pomerium deploy/pomerium-db -- psql --dbname=pomerium --username=pom -W
Password: pom
psql (14.2)
Type "help" for help.

pomerium=> select count(*) from pomerium.records;
 count 
-------
    10
(1 row)

Grants look ok to me:

pomerium=> SELECT * FROM information_schema.role_table_grants where table_name='records';
 grantor | grantee | table_catalog | table_schema | table_name | privilege_type | is_grantable | with_hierarchy 
---------+---------+---------------+--------------+------------+----------------+--------------+----------------
 pom     | pom     | pomerium      | pomerium     | records    | INSERT         | YES          | NO
 pom     | pom     | pomerium      | pomerium     | records    | SELECT         | YES          | YES
 pom     | pom     | pomerium      | pomerium     | records    | UPDATE         | YES          | NO
 pom     | pom     | pomerium      | pomerium     | records    | DELETE         | YES          | NO
 pom     | pom     | pomerium      | pomerium     | records    | TRUNCATE       | YES          | NO
 pom     | pom     | pomerium      | pomerium     | records    | REFERENCES     | YES          | NO
 pom     | pom     | pomerium      | pomerium     | records    | TRIGGER        | YES          | NO
(7 rows)

Dumping all the records columns is a mess, but I can see current dates like this:

pomerium=> select type,id,version,modified_at from pomerium.records;
                     type                      |                      id                      | version |          modified_at          
-----------------------------------------------+----------------------------------------------+---------+-------------------------------
 type.googleapis.com/user.User                 | ...                        |    5361 | 2023-04-15 17:27:56.843461+00
 type.googleapis.com/pomerium.config.Config    | pomerium-crd                                 |   28440 | 2023-06-09 02:03:44.367036+00
 type.googleapis.com/pomerium.events.LastError | identity_manager_last_user_refresh_errors    |    8466 | 2023-04-19 20:23:18.918342+00
 type.googleapis.com/pomerium.config.Config    | ingress-controller                           |   31301 | 2023-06-19 15:46:48.402831+00
 type.googleapis.com/user.User                 | ...                        |   19996 | 2023-05-08 22:36:00.971556+00
 type.googleapis.com/user.User                 | ...                        |   25963 | 2023-05-28 04:26:37.422393+00
 type.googleapis.com/pomerium.events.LastError | identity_manager_last_session_refresh_errors |   31333 | 2023-06-19 22:40:27.633805+00
 type.googleapis.com/user.User                 | ...                        |   31348 | 2023-06-20 00:18:17.812008+00
 type.googleapis.com/session.Session           | 9ddcc2cf-5955-4ab9-bb13-67142f4b0870         |   31349 | 2023-06-20 00:19:57.621045+00
 type.googleapis.com/user.User                 | ...                        |   27207 | 2023-06-02 22:34:35.890335+00
(10 rows)

My read of all this is that some db writes are working, but some reads aren't?

wasaga commented 1 year ago

could you please check that all components have sufficient resources (database, pomerium)?

drewp commented 1 year ago

I was experiencing OOM on the pomerium deployment, so I have its limit turned up to 5GB (!). The mem usage swings around pretty wildly, but i've been trying various experiments, so this alone isn't the problem: image

The postgres container has no limit, and its host and disk are ok. Though it's presumably not being used at the moment, and I still have sessions timing out too fast.

Let's leave postgres out and try to get one pomerium in one pod to hold a session in memory. Currently I'm getting the same failures as before:

pomlog | egrep "error|message|details"

(do fetch request)

  "message": "query"
  "error": "record not found",
  "message": "clearing session due to missing session or service account"
  "message": "query"
  "message": "query"
  "message": "authorize check"
  "response-code-details": "via_upstream",
  "message": "http-request"
  "error": "unable to write OCSP staple file for [bigasterisk.com]: mkdir /.local: read-only file system"
  ...and every other domain...
  "message": "query"
  "message": "query"
  "error": "record not found",
  "message": "clearing session due to missing session or service account"
  "message": "query"
  "message": "query"
  "message": "authorize check"
  "message": "authorize check"
  "message": "authorize check"
  "error": "internal/urlutil: malformed unix timestamp field",
  "message": "authenticate: origin blocked"
  "error": "Bad Request: internal/sessions: session is not found",
  "message": "authenticate: session load error"
wasaga commented 1 year ago

V0.22.x has significant reduction in memory usage.

github.com/pomerium/ingress-controller/config/default?ref=v0.22.1

drewp commented 1 year ago

I moved to a local build of 0.22.2, hoping to be able to add some logging around sessions. I can confirm the 0.22.2 mem usage improvement: instead of 800MB, it seems to use 300MB.

However, I need to localize more code than ingress-controller to be able to edit the session code, so I have no new insights yet. Also, running 1 instance sans DB seemed to help session lifetime a bit, but I still experience the original symptom all day long.

Here's a fresh failure log:


{"time": "2023-06-20T23:24:42Z", "level": "info", "type": "type.googleapis.com/session.Session", "query": "", "offset": 0, "limit": 1, "filter": {"$or": [{"id": "dbcec3af"}, {"$index": "dbcec3af"}]}, "message": "query"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "type": "type.googleapis.com/user.ServiceAccount", "query": "", "offset": 0, "limit": 1, "filter": {"$or": [{"id": "dbcec3af"}, {"$index": "dbcec3af"}]}, "message": "query"}
{"time": "2023-06-20T23:24:42Z", "level": "warn", "error": "record not found", "message": "clearing session due to missing session or service account"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "type": "type.googleapis.com/user.ServiceAccount", "query": "", "offset": 0, "limit": 1, "filter": {"$or": [{"id": ""}, {"$index": ""}]}, "message": "query"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "type": "type.googleapis.com/session.Session", "query": "", "offset": 0, "limit": 1, "filter": {"$or": [{"id": ""}, {"$index": ""}]}, "message": "query"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "service": "authorize", "request-id": "9e4c3b70", "check-request-id": "9e4c3b70", "method": "GET", "path": "/.well-known/pomerium/hpke-public-key", "host": "localhost:8443", "query": "", "ip": "::1", "allow": true, "allow-why-true": ["pomerium-route"], "deny": false, "deny-why-false": [], "user": "", "email": "", "message": "authorize check"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "service": "authorize", "request-id": "a5ea2c96", "check-request-id": "a5ea2c96", "method": "GET", "path": "/m/prometheus/api/v1/query_range", "host": "bigasterisk.com", "query": "", "ip": "10.42.1.0", "allow": false, "allow-why-false": ["user-unauthenticated"], "deny": false, "deny-why-false": ["valid-client-certificate-or-none-required"], "user": "", "email": "", "message": "authorize check"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "service": "authorize", "request-id": "8e8110de", "check-request-id": "8e8110de", "method": "GET", "path": "/.pomerium/sign_in", "host": "authenticate.bigasterisk.com", "query": "", "ip": "10.42.1.0", "allow": true, "allow-why-true": ["pomerium-route"], "deny": false, "deny-why-false": [], "user": "", "email": "", "message": "authorize check"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "ip": "127.0.0.1", "user_agent": "Mozilla/5.0", "referer": "https://bigasterisk.com/", "request-id": "8e8110de", "error": "internal/urlutil: malformed unix timestamp field", "message": "authenticate: origin blocked"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "ip": "127.0.0.1", "user_agent": "Mozilla/5.0", "referer": "https://bigasterisk.com/", "request-id": "8e8110de", "error": "Bad Request: internal/sessions: session is not found", "idp_id": "Hw5J", "message": "authenticate: session load error"} 
{"time": "2023-06-20T23:24:42Z", "level": "info", "service": "envoy", "upstream-cluster": "", "method": "GET", "authority": "bigasterisk.com", "path": "/m/prometheus/api/v1/query_range", "user-agent": "Mozilla/5.0", "referer": "https://bigasterisk.com/m/vmui/", "forwarded-for": "10.42.1.0", "request-id": "a5ea2c96", "duration": 23.614978, "size": 1438, "response-code": 302, "response-code-details": "ext_authz_denied", "message": "http-request"} 
{"time": "2023-06-20T23:24:43Z", "level": "info", "service": "envoy", "upstream-cluster": "pomerium-control-plane-http", "method": "GET", "authority": "localhost:8443", "path": "/.well-known/pomerium/hpke-public-key", "user-agent": "Go-http-client/2.0", "referer": "", "forwarded-for": "10.42.3.167", "request-id": "9e4c3b70", "duration": 9.973957, "size": 0, "response-code": 304, "response-code-details": "via_upstream", "message": "http-request"} 
{"time": "2023-06-20T23:24:43Z", "level": "info", "service": "envoy", "upstream-cluster": "pomerium-control-plane-http", "method": "GET", "authority": "authenticate.bigasterisk.com", "path": "/.pomerium/sign_in", "user-agent": "Mozilla/5.0", "referer": "https://bigasterisk.com/", "forwarded-for": "10.42.1.0", "request-id": "8e8110de", "duration": 61.309156, "size": 1245, "response-code": 302, "response-code-details": "via_upstream", "message": "http-request"}

If I could instrument the session code, I guess I'd dump what sessions DO exist at the time of 'record not found'. Is the store timing them out too fast, or am I accidentally getting a new session id when I do my fetch() request?

Also, what are log lines 4-5? What's the point of getting a session with id==""?

wasaga commented 1 year ago

please try to look thru your error logs. I'm surprised we do not see any kind of database errors or something. maybe they're shifted in time

drewp commented 1 year ago

There shouldn't be db errors since I'm using default (mem) storage for now.

But, I can't even get pom to run now because I can't untangle the startup sequence. This only worked with lots of thrashing around, before.

I was able to produce this in a log:

2023-06-21T13:38:50.294020665-07:00 stderr F {"level":"debug","ts":"2023-06-21T20:38:50Z","msg":"fetch","controller":"bootstrap pod/pomerium-78f78b664-kfvbf","controllerGroup":"ingress.pomerium.io","controllerKind":"Pomerium","Pomerium":{"name":"global"},"namespace":"","name":"global","reconcileID":"4be4f360-06e1-4b51-b99a-46508dc4eaab","deps":[{"Kind":"Secret","Namespace":"pomerium","Name":"pomerium-proxy-tls"},{"Kind":"Secret","Namespace":"pomerium","Name":"bootstrap"},{"
Kind":"Secret","Namespace":"pomerium","Name":"idp"}],"error":"certs: get pomerium/pomerium-proxy-tls: Secret \"pomerium-proxy-tls\" not found"}
2023-06-21T13:38:50.294979684-07:00 stderr F {"level":"debug","ts":"2023-06-21T20:38:50Z","logger":"events","msg":"certs: get pomerium/pomerium-proxy-tls: Secret \"pomerium-proxy-tls\" not found","type":"Normal","object":{"kind":"Pomerium","name":"global","uid":"...","apiVersion":"ingress.pomerium.io/v1","resourceVersion":"17088400"},"reason":"UpdateError"}
2023-06-21T13:38:50.323795289-07:00 stderr F {"level":"error","ts":"2023-06-21T20:38:50Z","msg":"config rejected","controller":"bootstrap pod/pomerium-78f78b664-kfvbf","controllerGroup":"ingress.pomerium.io","controllerKind":"Pomerium","Pomerium":{"name":"global"},"namespace":"","name":"global","reconcileID":"4be4f360-06e1-4b51-b99a-46508dc4eaab","error":"certs: get pomerium/pomerium-proxy-tls: Secret \"pomerium-proxy-tls\" not found"}
2023-06-21T13:38:50.323819181-07:00 stderr F {"level":"error","ts":"2023-06-21T20:38:50Z","msg":"Reconciler error","controller":"bootstrap pod/pomerium-78f78b664-kfvbf","controllerGroup":"ingress.pomerium.io","controllerKind":"Pomerium","Pomerium":{"name":"global"},"namespace":"","name":"global","reconcileID":"4be4f360-06e1-4b51-b99a-46508dc4eaab","error":"get settings: certs: get pomerium/pomerium-proxy-tls: Secret \"pomerium-proxy-tls\" not found"}
...
2023-06-21T13:42:21.325118498-07:00 stderr F {"level":"debug","ts":"2023-06-21T20:42:21Z","msg":"fetch","controller":"bootstrap pod/pomerium-78f78b664-kfvbf","controllerGroup":"ingress.pomerium.io","controllerKind":"Pomerium","Pomerium":{"name":"global"},"namespace":"","name":"global","reconcileID":"76680faf-c7a7-4416-ac52-e47c6ee2ade0","deps":[{"Kind":"Secret","Namespace":"pomerium","Name":"bootstrap"},{"Kind":"Secret","Namespace":"pomerium","Name":"idp"}],"error":null}
2023-06-21T13:42:21.325312548-07:00 stdout F {"level":"warn","time":"2023-06-21T20:42:21Z","message":"neither `autocert`, `insecure_server` or manually provided certificates were provided, server will be using a self-signed certificate"}
2023-06-21T13:42:21.325527354-07:00 stderr F {"level":"debug","ts":"2023-06-21T20:42:21Z","logger":"config_restarter","msg":"canceling and waiting for previous task to finish..."}
2023-06-21T13:42:21.325777135-07:00 stderr F {"level":"debug","ts":"2023-06-21T20:42:21Z","logger":"events","msg":"config updated","type":"Normal","object":{"kind":"Pomerium","name":"global","uid":"3d587b0c-e887-44fa-811f-9a5504a27c6a","apiVersion":"ingress.pomerium.io/v1","resourceVersion":"17088678"},"reason":"Updated"}
2023-06-21T13:42:21.327692474-07:00 stderr F {"level":"debug","ts":"2023-06-21T20:42:21Z","logger":"config_restarter","msg":"restarting new task","config":{"Options":{"InstallationID":"","Debug":false,"LogLevel":"info","ProxyLogLevel":"","SharedKey":"...+1h1w+faqDE=","SharedSecretFile":"","Services":"all","Addr":":8443","InsecureServer":false,"DNSLookupFamily":"","CertificateFiles":null,"Cert":"","Key":"","CertFile":"","KeyFile":"","HTTPRedirect
Addr":":8080","ReadTimeout":30000000000,"WriteTimeout":0,"IdleTimeout":300000000000,"Policies":null,"PolicyFile":"","Routes":null,"AdditionalPolicies":null,"AuthenticateURLString":"https://authenticate.pomerium.app","AuthenticateInternalURLString":"https://localhost:8443","SignOutRedirectURLString":"","AuthenticateCallbackPath":"/oauth2/callback","CookieName":"_pomerium","CookieSecret":"LxcustlnR/...=","CookieSecretFile":"","CookieDomain":"","C
ookieSecure":true,"CookieHTTPOnly":true,"CookieExpire":50400000000000,"ClientID":"","ClientSecret":"","ClientSecretFile":"","Provider":"","ProviderURL":"","Scopes":null,"RequestParams":null,"AuthorizeURLString":"","AuthorizeURLStrings":null,"AuthorizeInternalURLString":"","OverrideCertificateName":"","CA":"","CAFile":"","DeriveInternalDomainCert":null,"SigningKey":"...
...=","SigningKeyFile":"","HeadersEnv":"","SetResponseHeaders":null,"JWTClaimsHeaders":null,"DefaultUpstreamTimeout":30000000000,"MetricsAddr":"10.42.3.254:9090","MetricsBasicAuth":"","MetricsCertificate":"","MetricsCertificateKey":"","MetricsCertificateFile":"","MetricsCertificateKeyFil
e":"","MetricsClientCA":"","MetricsClientCAFile":"","TracingProvider":"","TracingSampleRate":0.0001,"TracingDatadogAddress":"","TracingJaegerCollectorEndpoint":"","TracingJaegerAgentEndpoint":"","ZipkinEndpoint":"","GRPCAddr":":443","GRPCInsecure":null,"GRPCClientTimeout":10000000000,"GRPCClientDNSRoundRobin":true,"DataBrokerURLString":"","DataBrokerURLStrings":null,"DataBrokerInternalURLString":"","DataBrokerStorageType":"memory","DataBrokerStorageConnectionString":"","Dat
aBrokerStorageCertFile":"","DataBrokerStorageCertKeyFile":"","DataBrokerStorageCAFile":"","DataBrokerStorageCertSkipVerify":false,"ClientCA":"","ClientCAFile":"","ClientCRL":"","ClientCRLFile":"","GoogleCloudServerlessAuthenticationServiceAccount":"","Enable":false,"Email":"","UseStaging":false,"EABKeyID":"","EABMACKey":"","MustStaple":false,"Folder":"/.local/share/pomerium","TrustedCA":"","TrustedCAFile":"","EnvoyAdminAccessLogPath":"/dev/null","EnvoyAdminProfilePath":"/de
v/null","EnvoyAdminAddress":"","EnvoyBindConfigSourceAddress":"","EnvoyBindConfigFreebind":null,"programmatic_redirect_domain_whitelist":["localhost"],"CodecType":"","AuditKey":null,"BrandingOptions":null},"AutoCertificates":null,"EnvoyVersion":"","DerivedCertificates":null,"DerivedCAPEM":null,"GRPCPort":"37583","HTTPPort":"41593","OutboundPort":"39891","MetricsPort":"39949","DebugPort":"41593","ACMETLSALPNPort":"35825","MetricsScrapeEndpoints":[{"Name":"bootstrap","URL":{"
Scheme":"http","Opaque":"","User":null,"Host":"localhost:45845","Path":"/metrics","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Labels":{"service":"bootstrap-controller"}},{"Name":"ingress","URL":{"Scheme":"http","Opaque":"","User":null,"Host":"localhost:40757","Path":"/metrics","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Labels":{"service":"ingress-controller"}}]}}
2023-06-21T13:42:21.327766358-07:00 stdout F {"level":"info","envoy_version":"1.25.5+b1095c058415dfb2261e695a0f144311a7dc346b6eb47ecbb0a01b7de2c7299f","version":"v0.0.0","time":"2023-06-21T20:42:21Z","message":"cmd/pomerium"}
2023-06-21T13:42:21.327784359-07:00 stderr F {"level":"debug","ts":"2023-06-21T20:42:21Z","msg":"got bootstrap config, starting pomerium...","cfg":{"Options":{"InstallationID":"","Debug":false,"LogLevel":"info","ProxyLogLevel":"","SharedKey":"...+1...","SharedSecretFile":"","Services":"all","Addr":":8443","InsecureServer":false,"DNSLookupFamily":"","CertificateFiles":null,"Cert":"","Key":"","CertFile":"","KeyFile":"","HTTPRedirectAddr":":
8080","ReadTimeout":30000000000,"WriteTimeout":0,"IdleTimeout":300000000000,"Policies":null,"PolicyFile":"","Routes":null,"AdditionalPolicies":null,"AuthenticateURLString":"https://authenticate.pomerium.app","AuthenticateInternalURLString":"https://localhost:8443","SignOutRedirectURLString":"","AuthenticateCallbackPath":"/oauth2/callback","CookieName":"_pomerium","CookieSecret":"LxcustlnR/...=","CookieSecretFile":"","CookieDomain":"","CookieSec
ure":true,"CookieHTTPOnly":true,"CookieExpire":50400000000000,"ClientID":"","ClientSecret":"","ClientSecretFile":"","Provider":"","ProviderURL":"","Scopes":null,"RequestParams":null,"AuthorizeURLString":"","AuthorizeURLStrings":null,"AuthorizeInternalURLString":"","OverrideCertificateName":"","CA":"","CAFile":"","DeriveInternalDomainCert":null,"SigningKey":"...
...=","SigningKeyFile":"","HeadersEnv":"","SetResponseHeaders":null,"JWTClaimsHeaders":null,"DefaultUpstreamTimeout":30000000000,"MetricsAddr":"10.42.3.254:9090","MetricsBasicAuth":"","MetricsCertificate":"","MetricsCertificateKey":"","MetricsCertificateFile":"","MetricsCertificateKeyFile":"","M
etricsClientCA":"","MetricsClientCAFile":"","TracingProvider":"","TracingSampleRate":0.0001,"TracingDatadogAddress":"","TracingJaegerCollectorEndpoint":"","TracingJaegerAgentEndpoint":"","ZipkinEndpoint":"","GRPCAddr":":443","GRPCInsecure":null,"GRPCClientTimeout":10000000000,"GRPCClientDNSRoundRobin":true,"DataBrokerURLString":"","DataBrokerURLStrings":null,"DataBrokerInternalURLString":"","DataBrokerStorageType":"memory","DataBrokerStorageConnectionString":"","DataBrokerS
torageCertFile":"","DataBrokerStorageCertKeyFile":"","DataBrokerStorageCAFile":"","DataBrokerStorageCertSkipVerify":false,"ClientCA":"","ClientCAFile":"","ClientCRL":"","ClientCRLFile":"","GoogleCloudServerlessAuthenticationServiceAccount":"","Enable":false,"Email":"","UseStaging":false,"EABKeyID":"","EABMACKey":"","MustStaple":false,"Folder":"/.local/share/pomerium","TrustedCA":"","TrustedCAFile":"","EnvoyAdminAccessLogPath":"/dev/null","EnvoyAdminProfilePath":"/dev/null",
"EnvoyAdminAddress":"","EnvoyBindConfigSourceAddress":"","EnvoyBindConfigFreebind":null,"programmatic_redirect_domain_whitelist":["localhost"],"CodecType":"","AuditKey":null,"BrandingOptions":null},"AutoCertificates":null,"EnvoyVersion":"","DerivedCertificates":null,"DerivedCAPEM":null,"GRPCPort":"37583","HTTPPort":"41593","OutboundPort":"39891","MetricsPort":"39949","DebugPort":"41593","ACMETLSALPNPort":"35825","MetricsScrapeEndpoints":[{"Name":"bootstrap","URL":{"Scheme":
"http","Opaque":"","User":null,"Host":"localhost:45845","Path":"/metrics","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Labels":{"service":"bootstrap-controller"}},{"Name":"ingress","URL":{"Scheme":"http","Opaque":"","User":null,"Host":"localhost:40757","Path":"/metrics","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Labels":{"service":"ingress-controller"}}]}}
2023-06-21T13:42:21.328101567-07:00 stdout F {"level":"info","address":"127.0.0.1:39891","time":"2023-06-21T20:42:21Z","message":"grpc: dialing"}
2023-06-21T13:42:21.329132159-07:00 stdout F {"level":"info","outbound_port":"39891","databroker_urls":["http://127.0.0.1:5443"],"time":"2023-06-21T20:42:21Z","message":"config: starting databroker config source syncer"}
2023-06-21T13:42:21.329351842-07:00 stdout F {"level":"info","service":"all","config":"databroker","checksum":"4d92e27408aa97a","time":"2023-06-21T20:42:21Z","message":"config: updated config"}
2023-06-21T13:42:21.329650682-07:00 stdout F {"level":"info","time":"2023-06-21T20:42:21Z","logger":"maintenance","msg":"started background certificate maintenance","service":"autocert","cache":"0xc00023a3f0"}
2023-06-21T13:42:21.329685426-07:00 stdout F {"level":"info","service":"autocert-manager","addr":":8080","time":"2023-06-21T20:42:21Z","message":"starting http redirect server"}
2023-06-21T13:42:21.346850281-07:00 stderr F panic: runtime error: invalid memory address or nil pointer dereference
2023-06-21T13:42:21.346896979-07:00 stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x27e8192]
2023-06-21T13:42:21.346913402-07:00 stderr F 
2023-06-21T13:42:21.346928681-07:00 stderr F goroutine 102 [running]:
2023-06-21T13:42:21.347323796-07:00 stderr F github.com/pomerium/pomerium/internal/controlplane.NewServer(0xc00099a700, 0xc00067c480, 0xc000a1c760)
2023-06-21T13:42:21.347381588-07:00 stderr F    /go/pkg/mod/github.com/pomerium/pomerium@v0.22.2/internal/controlplane/server.go:127 +0x1392
2023-06-21T13:42:21.347784321-07:00 stderr F github.com/pomerium/pomerium/pkg/cmd/pomerium.Run({0x38ae758, 0xc0004601e0}, {0x3895fc8, 0xc000857f20})
2023-06-21T13:42:21.347821587-07:00 stderr F    /go/pkg/mod/github.com/pomerium/pomerium@v0.22.2/pkg/cmd/pomerium/pomerium.go:68 +0x3f4
2023-06-21T13:42:21.347838504-07:00 stderr F github.com/pomerium/ingress-controller/pomerium/ctrl.(*Runner).Run(0xc000462300, {0x38ae758, 0xc0004601e0})
2023-06-21T13:42:21.347925483-07:00 stderr F    /workspace/pomerium/ctrl/run.go:79 +0x16c
2023-06-21T13:42:21.347956343-07:00 stderr F github.com/pomerium/ingress-controller/cmd.(*allCmdParam).run.func1()
2023-06-21T13:42:21.347963723-07:00 stderr F    /workspace/cmd/all_in_one.go:188 +0x25
2023-06-21T13:42:21.347989226-07:00 stderr F golang.org/x/sync/errgroup.(*Group).Go.func1()
2023-06-21T13:42:21.347995271-07:00 stderr F    /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
2023-06-21T13:42:21.348000568-07:00 stderr F created by golang.org/x/sync/errgroup.(*Group).Go
2023-06-21T13:42:21.348006309-07:00 stderr F    /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5
wasaga commented 1 year ago

I can't untangle the startup sequence. This only worked with lots of thrashing around, before.

I'm not sure I understand what do you mean.

Please join our support slack maybe we could debug your issue interactively.

drewp commented 1 year ago

Here's a specific code issue related to the log errors I'm getting. Please look at https://github.com/pomerium/pomerium/blob/ec495bb682bee1479b0766a6840b59da11c7b2dc/internal/urlutil/signed.go

It's surprising that "" and "0" are handled so differently. If that's correct, it deserves a comment.

If no one is making urls with issued=0, then there's unused code. Meanwhile, I'm definitely getting issued="" which spews ErrNumericDateMalformed in my logs. At minimum, the error should be something like "no 'issued' param found".

drewp commented 1 year ago

(edit: 2nd try)

I bisected IC and I'm fairly sure I have the breaking commit now.

To test more reliably, I made this demo page which loads, waits, then does a fetch() (of itself) after increasing delays. That link is a public copy, but I'm serving mine from a policy-protected url.

When I run the IC image at docker.io/pomerium/ingress-controller:sha-7f1a148, everything seems ok, and my page can sit for 600 sec then make a fetch(). Pomerium remembers my cookie/session/etc and permits the fetch request.

When I run the IC image at docker.io/pomerium/ingress-controller:sha-9e3641b, I can load the page, do a fetch 6 sec later, but when I wait 300 sec, my fetch is blocked.

View of 9e3641b, which seems to be the breaking commit

drewp commented 10 months ago

Upgraded to docker.io/pomerium/ingress-controller:sha-f549329 (2023-11-16) and haven't been able to repro so far.