ory / kratos

The most scalable and customizable identity server on the market. Replace your Homegrown, Auth0, Okta, Firebase with better UX and DX. Has all the tablestakes: Passkeys, Social Sign In, Multi-Factor Auth, SMS, SAML, TOTP, and more. Written in Go, cloud native, headless, API-first. Available as a service on Ory Network and for self-hosters.
Apache License 2.0

Document Argon2 configuration best practices #572

Closed rauanmayemir closed 3 years ago

rauanmayemir commented 4 years ago

Describe the bug

I've been trying to set up kratos v0.4.4 with selfservice-ui-node locally in minikube. It was slow, but I managed to register and verify my identity.

However, trying to log in simply hangs the service. Occasionally it makes the whole minikube unresponsive, so I have to shut it down completely and restart.

I caught the liveness status updates, which give a clue about what happened:

  Warning  Unhealthy       114s (x2 over 116s)  kubelet, minikube  Readiness probe failed: Get http://172.17.0.11:15021/healthz/ready: dial tcp 172.17.0.11:15021: connect: connection refused
  Warning  Unhealthy       113s                 kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy       19s                  kubelet, minikube  Readiness probe failed: Get http://172.17.0.11:15020/app-health/kratos/readyz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy       11s (x2 over 21s)    kubelet, minikube  Liveness probe failed: Get http://172.17.0.11:15020/app-health/kratos/livez: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy       10s                  kubelet, minikube  Readiness probe failed: HTTP probe failed with statuscode: 500

It seems the kratos pod choked on the login request and even stopped responding to health checks, so k8s just restarted the pod.

Here's what was in the kratos logs from the time when I opened the selfservice at https://auth.ips.test (this time k8s wasn't able to restart the pod and it simply hung):

time=2020-07-13T11:48:32Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=b943ee28-e868-441c-85c6-f0d61d048d9c
time=2020-07-13T11:48:32Z level=info msg=No valid session cookie found. audience=audit error=map[debug: message:request does not have a valid authentication session reason:No active session was found in this request. status:Unauthorized status_code:401] http_request=map[headers:map[accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:max-age=0 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b943ee28-e868-441c-85c6-f0d61d048d9c] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] service_name=ORY Kratos service_version=v0.4.3-alpha.1
time=2020-07-13T11:48:32Z level=error msg=An error occurred while handling a request audience=application error=map[debug: message:The request could not be authorized reason:No valid session cookie found. status:Unauthorized status_code:401] http_request=map[headers:map[accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:max-age=0 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b943ee28-e868-441c-85c6-f0d61d048d9c] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] http_response=map[status_code:401] service_name=kratos service_version=
time=2020-07-13T11:48:32Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=b943ee28-e868-441c-85c6-f0d61d048d9c status=401 text_status=Unauthorized took=24.236895ms
time=2020-07-13T11:48:32Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=682f0670-744c-489f-aae5-fadb8cf61bce
time=2020-07-13T11:48:32Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=682f0670-744c-489f-aae5-fadb8cf61bce status=302 text_status=Found took=20.368698ms
time=2020-07-13T11:48:32Z level=info msg=started handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=b3723016-7c19-4698-90fc-57ae736702df
time=2020-07-13T11:48:32Z level=info msg=completed handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=b3723016-7c19-4698-90fc-57ae736702df status=200 text_status=OK took=5.518439ms
time=2020-07-13T11:48:33Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=3398ca0d-2c26-447b-a32a-c7998ae65d9a
time=2020-07-13T11:48:33Z level=info msg=No valid session cookie found. audience=audit error=map[debug: message:request does not have a valid authentication session reason:No active session was found in this request. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 referer:https://auth.ips.test/auth/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:3398ca0d-2c26-447b-a32a-c7998ae65d9a] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] service_name=ORY Kratos service_version=v0.4.3-alpha.1
time=2020-07-13T11:48:33Z level=error msg=An error occurred while handling a request audience=application error=map[debug: message:The request could not be authorized reason:No valid session cookie found. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 referer:https://auth.ips.test/auth/login?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:3398ca0d-2c26-447b-a32a-c7998ae65d9a] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55308 scheme:http] http_response=map[status_code:401] service_name=kratos service_version=
time=2020-07-13T11:48:33Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/sessions/whoami request_id=3398ca0d-2c26-447b-a32a-c7998ae65d9a status=401 text_status=Unauthorized took=975.21µs
time=2020-07-13T11:48:33Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=6a67651c-1383-40b5-8a4f-c3883524b028
time=2020-07-13T11:48:33Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login request_id=6a67651c-1383-40b5-8a4f-c3883524b028 status=302 text_status=Found took=8.778967ms
time=2020-07-13T11:48:33Z level=info msg=started handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=7d75d569-531d-46c2-87b8-639a2119927d request_id=19186ea2-0f4d-41c6-8246-af087193a004
time=2020-07-13T11:48:33Z level=info msg=completed handling request method=GET name=admin#http://ips-auth-kratos-admin.default.svc.cluster.local/ remote=127.0.0.1:55788 request=/self-service/browser/flows/requests/login?request=7d75d569-531d-46c2-87b8-639a2119927d request_id=19186ea2-0f4d-41c6-8246-af087193a004 status=200 text_status=OK took=2.801678ms
time=2020-07-13T11:48:51Z level=info msg=started handling request method=POST name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=f19362c3-1ed9-4dda-a8af-2e3fe068f3a0
time=2020-07-13T11:49:15Z level=info msg=started handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55576 request=/sessions/whoami request_id=b8751665-bd9b-4ea4-a238-949a3e4d8ac8
time=2020-07-13T11:49:15Z level=info msg=No valid session cookie found. audience=audit error=map[debug: message:request does not have a valid authentication session reason:No active session was found in this request. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:no-cache referer:https://auth.ips.test/.ory/kratos/public/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b8751665-bd9b-4ea4-a238-949a3e4d8ac8] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55576 scheme:http] service_name=ORY Kratos service_version=v0.4.3-alpha.1
time=2020-07-13T11:49:15Z level=error msg=An error occurred while handling a request audience=application error=map[debug: message:The request could not be authorized reason:No valid session cookie found. status:Unauthorized status_code:401] http_request=map[headers:map[accept:image/webp,image/apng,image/*,*/*;q=0.8 accept-encoding:gzip, deflate, br accept-language:en-US,en;q=0.9,kk-KZ;q=0.8,kk;q=0.7,ru-KZ;q=0.6,ru;q=0.5,fr;q=0.4 cache-control:no-cache referer:https://auth.ips.test/.ory/kratos/public/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 x-forwarded-for:172.17.0.1 x-forwarded-proto:https x-request-id:b8751665-bd9b-4ea4-a238-949a3e4d8ac8] host:auth.ips.test method:GET path:/sessions/whoami query:<nil> remote:127.0.0.1:55576 scheme:http] http_response=map[status_code:401] service_name=kratos service_version=
time=2020-07-13T11:49:15Z level=info msg=completed handling request method=GET name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55576 request=/sessions/whoami request_id=b8751665-bd9b-4ea4-a238-949a3e4d8ac8 status=401 text_status=Unauthorized took=2.029955ms
time=2020-07-13T11:49:29Z level=info msg=completed handling request method=POST name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55308 request=/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=f19362c3-1ed9-4dda-a8af-2e3fe068f3a0 status=302 text_status=Found took=37.664910136s
time=2020-07-13T11:49:29Z level=info msg=started handling request method=POST name=public#https://auth.ips.test/.ory/kratos/public remote=127.0.0.1:55848 request=/self-service/browser/flows/login/strategies/password?request=4b9f0ca6-0fbb-4e05-a8ea-68e54a55e407 request_id=f19362c3-1ed9-4dda-a8af-2e3fe068f3a0

Reproducing the bug

Here's my config:

helm-kratos-values.yaml


replicaCount: 1

image:
  repository: oryd/kratos
  tag: v0.4.3-alpha.1-sqlite
  pullPolicy: IfNotPresent

service:
  admin:
    enabled: true
    type: ClusterIP
    port: 80
    annotations: {}
  public:
    enabled: true
    type: ClusterIP
    port: 80
    annotations: {}

kratos:
  development: true
  autoMigrate: true

  config:
    dsn: "postgres://connection"
    secrets:
      cookie:

I manually updated the helm-generated configmap to include /etc/config/identity.traits.schema.json, using the default schema from the latest tagged kratos release.

auth-selfservice-ui.yaml


apiVersion: v1
kind: Service
metadata:
  name: ips-auth-selfservice
  labels:
    app: auth
spec:
  ports:
    - port: 80
      targetPort: 3000
      name: http
  selector:
    app: auth
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ips-auth-selfservice
spec:
  selector:
    matchLabels:
      app: auth
  replicas: 1
  template:
    metadata:
      labels:
        app: auth
    spec:
      containers:
        - name: kratos
          image: oryd/kratos-selfservice-ui-node:v0.4.4-alpha.1
          env:
            - name: SECURITY_MODE
              value: "cookie"
            - name: BASE_URL
              value: "https://auth.ips.test"
            - name: KRATOS_PUBLIC_URL
              value: "http://ips-auth-kratos-public.default.svc.cluster.local"
            - name: KRATOS_ADMIN_URL
              value: "http://ips-auth-kratos-admin.default.svc.cluster.local"
          ports:
            - containerPort: 3000
---

Environment

aeneasr commented 4 years ago

You're probably allocating too many resources for Argon2. Running Istio in Minikube is already a performance sink and adding Argon2 to the mix could make your VM unresponsive.

rauanmayemir commented 4 years ago

I did not realize Argon2 is that expensive. 😄 I'll try to adjust it.

aeneasr commented 4 years ago

Depends on the config but the defaults are pretty high: https://github.com/ory/kratos/blob/master/driver/configuration/provider_viper.go#L111-L115 (4GB RAM with 4 iterations with 2*CPU parallelism)

rauanmayemir commented 4 years ago

I've tried to adjust the config and it's still hanging:

hashers:
  argon2:
    memory: 524288
    iterations: 2
    parallelism: 1
    salt_length: 16
    key_length: 16

This is my minikube config:

{
    "cpus": 4,
    "dashboard": true,
    "kubernetes-version": "1.18.3",
    "memory": 8192,
    "vm-driver": "virtualbox"
}

I'm just shocked this could be that resource-heavy.

aeneasr commented 4 years ago

Try iterations: 1. Also make sure that the spike is really Kratos. Keep in mind that you're running Istio in VirtualBox/Minikube and have not met its minimum requirements. Running applications that consume significant memory/CPU (such as password hashing) on Istio on a below-minimum-resource machine is going to cause issues.

Start minikube with 16384 MB of memory and 4 CPUs. This example uses Kubernetes version 1.17.5. You can change the version to any Kubernetes version supported by Istio by altering the --kubernetes-version value:

https://istio.io/latest/docs/setup/platform-setup/minikube/

aeneasr commented 4 years ago

By the way you're still using 512MB RAM, but on a machine that is already over-utilized by Istio. In our quickstart, we have dialed down everything quite a lot to make sure that it runs everywhere:

https://github.com/ory/kratos/blob/master/contrib/quickstart/kratos/email-password/.kratos.yml#L58-L64

I wouldn't recommend doing that in prod though.

rauanmayemir commented 4 years ago

I changed the config to:

memory: 65536
iterations: 1
parallelism: 1
salt_length: 16
key_length: 16

But it's still crashing. Will try it later on the actual k8s cluster. I acknowledge that istio is super resource-hungry.
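For reference, the settings discussed so far can be put side by side with a little arithmetic. This is only a sketch: it assumes the `memory` value is expressed in KiB (consistent with 524288 being called 512MB above), reads the default "4GB" as 4 GiB, and uses memory × iterations as a rough proxy for hashing work, since Argon2 makes one pass over its memory per iteration. The function names are made up for illustration and are not part of Kratos.

```python
# Rough comparison of the Argon2 settings mentioned in this thread.
# Assumption: the `memory` value is in KiB and "4GB" means 4 GiB.


def kib_to_mib(memory_kib: int) -> float:
    """Convert an Argon2 memory setting from KiB to MiB."""
    return memory_kib / 1024


def relative_work(memory_kib: int, iterations: int) -> int:
    """Crude work proxy: KiB of memory touched across all passes."""
    return memory_kib * iterations


# (memory in KiB, iterations) for each configuration in the thread.
settings = {
    "defaults (4GB, 4 iterations)": (4 * 1024 * 1024, 4),
    "memory: 524288, iterations: 2": (524288, 2),
    "memory: 65536, iterations: 1": (65536, 1),
}

baseline = relative_work(*settings["defaults (4GB, 4 iterations)"])
for name, (memory_kib, iterations) in settings.items():
    share = relative_work(memory_kib, iterations) / baseline
    print(f"{name}: {kib_to_mib(memory_kib):.0f} MiB, {share:.2%} of default work")
```

Parallelism is left out of this estimate on purpose; it changes how the work is spread across cores, not how much memory a single hash needs.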

rauanmayemir commented 4 years ago

This works fine on beefier hardware.

aeneasr commented 4 years ago

Ok, can I close this then or do you need further clarification? :)

alsuren commented 4 years ago

Sorry for reviving an old thread, but it seems the most appropriate one.

https://tools.ietf.org/html/draft-irtf-cfrg-argon2-10#section-4 ("Parameter Choice") says:

We suggest the following settings:

   o  Backend server authentication, that takes 0.5 seconds on a 2 GHz
      CPU using 4 cores -- Argon2id with 8 lanes and 4 GiB of RAM.

   o  Key derivation for hard-drive encryption, that takes 3 seconds on
      a 2 GHz CPU using 2 cores - Argon2id with 4 lanes and 6 GiB of
      RAM.

   o  Frontend server authentication, that takes 0.5 seconds on a 2 GHz
      CPU using 2 cores - Argon2id with 4 lanes and 1 GiB of RAM.

I would say that Kratos is mostly used for frontend server authentication, but its default parameters are tuned for backend server authentication. Would you accept a patch which tunes the parameters appropriately, or should I just tune them for my cluster and post my parameters on this thread?
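As a rough illustration of why the choice of profile matters, the sketch below records the three suggestions quoted above and estimates how many hashes could run at the same time within a given amount of RAM. It ignores everything else running on the machine, and the names (`PROFILES`, `max_concurrent_hashes`) are hypothetical, not anything from Kratos or the draft.

```python
# Lanes and memory per hash (in GiB) for the settings quoted from the draft.
PROFILES = {
    "backend": {"lanes": 8, "memory_gib": 4},
    "disk_encryption": {"lanes": 4, "memory_gib": 6},
    "frontend": {"lanes": 4, "memory_gib": 1},
}


def max_concurrent_hashes(available_gib: float, profile: str) -> int:
    """Upper bound on simultaneous hashes that fit in `available_gib` of RAM."""
    return int(available_gib // PROFILES[profile]["memory_gib"])


# Example: an 8 GiB VM, like the minikube configuration earlier in the thread.
for name in ("backend", "frontend"):
    print(name, max_concurrent_hashes(8, name))
```

With the backend profile, two simultaneous logins would already claim all 8 GiB, before counting Istio or Kubernetes itself.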

aeneasr commented 4 years ago

All good, we should definitely document that and maybe also change the defaults used in the demo. If you're up for a PR @alsuren please go ahead :)

aeneasr commented 4 years ago

@tricky42 https://github.com/ory/kratos/issues/572#issuecomment-674804449 is relevant to you probably

aikoven commented 4 years ago

I have a similar problem where the Kratos process becomes unresponsive. My Argon2 config is:

argon2:
  parallelism: 1
  memory: 65536
  iterations: 1
  salt_length: 16
  key_length: 16

Still, sometimes (not every time) when I try to log in, the Kratos process starts consuming 4+ CPU cores and 4GB+ of memory, and then the login request times out.

Example log:

time=2020-09-03T10:39:59Z level=info msg=completed handling request method=POST name=public#https://my-domain/.ory/kratos/public remote=172.30.134.6 request=/self-service/browser/flows/login/strategies/password?request=f1959ca9-8288-4936-a4dd-bd18ebaca97c request_id=5145e17fbf79383a92b437d852ccf889 status=302 text_status=Found took=52.018030695s

I'm running Kratos in Kubernetes, and while that request lasts, the pod becomes unready.
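One way to spot such requests in a larger log is to filter on the `took=` field. The following is a minimal sketch, assuming the durations look like the ones in this thread (a single number followed by `µs`, `ms` or `s`); longer durations with minutes or hours would need extra parsing. The helper names are hypothetical and not part of Kratos.

```python
import re

# Multipliers to seconds for the units seen in the logs in this thread.
# Both the micro sign (U+00B5) and the Greek mu (U+03BC) are accepted.
_UNITS = {"\u00b5s": 1e-6, "\u03bcs": 1e-6, "ms": 1e-3, "s": 1.0}
_TOOK = re.compile(r"\btook=([0-9]+(?:\.[0-9]+)?)(" + "|".join(_UNITS) + r")\b")


def took_seconds(line: str):
    """Return the `took=` duration of a log line in seconds, or None."""
    match = _TOOK.search(line)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def slow_requests(lines, threshold_seconds=1.0):
    """Yield (seconds, line) for lines whose duration exceeds the threshold."""
    for line in lines:
        seconds = took_seconds(line)
        if seconds is not None and seconds > threshold_seconds:
            yield seconds, line


# Example with shortened versions of log lines from this thread.
sample = [
    "level=info msg=completed handling request status=401 took=24.236895ms",
    "level=info msg=completed handling request status=302 took=52.018030695s",
]
for seconds, line in slow_requests(sample):
    print(f"{seconds:.1f}s: {line}")
```

Comparing these durations with the probe timeouts of the pod can help confirm whether a slow login is what makes it unready.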

aeneasr commented 4 years ago

Make sure your CPU and memory limits are high enough!

aikoven commented 4 years ago

Thanks, I'll try it.

I was concerned by the kubectl top pod output, which showed 4+ CPU cores in use. But that may just be due to overall node CPU saturation.