ory / kratos


Improve Argon2 helpers and defaults and documentation #955

Closed. aeneasr closed this issue 3 years ago.

aeneasr commented 3 years ago

**Is your feature request related to a problem? Please describe.**

Manual Configuration

You may also choose the Argon2 parameters manually.

:::note

Please keep in mind that your host machine is probably doing more than just computing Argon2 hashes, so choose these parameters carefully. Also keep in mind that ORY Kratos may compute several hashes in parallel, depending on how many concurrent logins or registrations you serve.

:::

To configure Argon2, edit the ORY Kratos configuration file:

# $ kratos -c path/to/my/kratos/config.yml serve
hashers:
  argon2:
    # Memory consumption per hashing process, denoted in kibibytes (KiB).
    # Here: 131072 KiB / 1024 = 128 MB
    memory: 131072

    # We recommend not choosing these parameters manually but using
    # the Argon2 calibration command.
    parallelism: 1
    iterations: 3
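
For reference, here is a minimal sketch (not Kratos source code) of how these values map onto `golang.org/x/crypto/argon2`, the library Kratos uses for hashing. Note that the library's memory argument is denoted in KiB:

```go
// Illustrative sketch: how the configuration values above map onto
// golang.org/x/crypto/argon2. The memory argument is in KiB, so a
// value of 131072 means each hash allocates 128 MB while it runs.
package main

import (
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/argon2"
)

func main() {
	password := []byte("correct horse battery staple")

	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}

	var (
		iterations  uint32 = 3      // hashers.argon2.iterations
		memoryKiB   uint32 = 131072 // hashers.argon2.memory (128 MB)
		parallelism uint8  = 1      // hashers.argon2.parallelism
		keyLength   uint32 = 32
	)

	hash := argon2.IDKey(password, salt, iterations, memoryKiB, parallelism, keyLength)
	fmt.Printf("%x\n", hash)
}
```

Multiply `memory` by the number of hashes you expect to run concurrently to estimate peak memory use.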


aeneasr commented 3 years ago

Additionally:

zepatrik commented 3 years ago

Recovering is not an option: https://stackoverflow.com/questions/30577308/golang-cannot-recover-from-out-of-memory-crash (Go cannot recover from an out-of-memory crash). I guess the Argon2 hasher should then have a queue with a length of max-concurrent? That is the only way I see to prevent OOM.
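
A minimal sketch of such a queue (illustrative names and types, not the actual Kratos implementation): a buffered channel works as a counting semaphore, so at most max-concurrent hashes run at once and every caller above that waits in line.

```go
// Sketch of the proposed queue: a buffered channel acts as a counting
// semaphore, so at most maxConcurrent Argon2 hashes run at once and
// every request above that waits instead of allocating more memory.
package main

import (
	"context"
	"fmt"

	"golang.org/x/crypto/argon2"
)

type queuedHasher struct {
	sem chan struct{} // capacity = maximum concurrent hashing operations
}

func newQueuedHasher(maxConcurrent int) *queuedHasher {
	return &queuedHasher{sem: make(chan struct{}, maxConcurrent)}
}

func (h *queuedHasher) hash(ctx context.Context, password, salt []byte) ([]byte, error) {
	select {
	case h.sem <- struct{}{}: // acquire a slot, or block (queue) here
	case <-ctx.Done():
		return nil, ctx.Err() // the caller gave up while waiting
	}
	defer func() { <-h.sem }() // release the slot when done

	// Each call uses 128 MB, so memory is bounded by maxConcurrent * 128 MB.
	return argon2.IDKey(password, salt, 3, 131072, 1, 32), nil
}

func main() {
	h := newQueuedHasher(4)
	digest, err := h.hash(context.Background(), []byte("secret"), []byte("random-16-bytes!"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", digest)
}
```

With this pattern, peak Argon2 memory is bounded by maxConcurrent times the configured memory parameter, at the cost of added wait time under load, which matches the trade-off measured below.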

zepatrik commented 3 years ago

Sequential: (screenshot from 2021-01-04 13-23-37)
Parallel: (screenshot from 2021-01-04 13-32-21)

So fully serializing Argon2 operations will definitely prevent OOM, but it might result in some wait time for individual requests. The solution will be to allow a certain number of concurrent operations while queueing everything above that limit.

The CLI helper's purpose is to find the configuration values, so it should take into account how many operations Kratos should be able to handle concurrently. That highly depends on the application and its usage patterns; we can therefore not make assumptions about the distribution or the normal/max/average load.

zepatrik commented 3 years ago

Final findings

In a real-world deployment, it is a statistical problem how many requests should be supported concurrently. It depends on the rate of requests (#/minute) and the execution time of a single request. The execution time depends linearly on memory, iterations, and the number of concurrent executions:

(plot plot_concurrent_over_memory_med: median execution time over memory for different concurrency levels; XC means X concurrent hashes, and 1C appears in the plot twice; med means median)
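
As a back-of-the-envelope model of these relationships (illustrative only, not fitted to the measurements here):

```latex
% Single-hash time grows roughly linearly in memory m and iterations i:
t_{\text{hash}} \approx k \cdot m \cdot i
% Little's law relates the request rate \lambda to the average number of
% hashes in flight, which drives total memory use and contention:
\bar{c} = \lambda \cdot t_{\text{hash}}
% Example: 256 req/min \approx 4.27 req/s at t_hash \approx 0.18 s gives
% \bar{c} \approx 0.77 in flight on average, with bursts far above that.
```

Bursts far above the mean are exactly what show up as the high standard deviation at 256 req/min below.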

With concurrency comes a higher deviation (probably because of scheduling and resource allocation). Therefore, the parameters have to be chosen with respect to the statistically relevant events. Compare the following plots; both use the same hashing parameters, designed for a compute time of ~0.1 s.

(screenshot from 2021-01-05 12-04-05) For 256 requests/min, the standard deviation is quite high, resulting in very unpredictable latency. The maximum compute time of over 1.5 s is also way over the desired time of 0.1 s, resulting in bad UX.

(screenshot from 2021-01-05 12-04-11) In the case of 32 requests per minute, the standard deviation is very low, and both min and max values are well within the range of the desired time.

Here are some more stats without visual plots. Note that these were taken on a machine running other applications as well.

32 req/min (= # of samples)

TOTAL SAMPLE TIME       55.675696486s   
MEDIAN                  181.180305ms    
STANDARD DEVIATION      51.922335ms     
MIN                     120.312156ms    
MAX                     342.119787ms    
MEMORY USED             1.10GB

64 req/min (= # of samples)

TOTAL SAMPLE TIME       1m0.107102303s  
MEDIAN                  174.04383ms     
STANDARD DEVIATION      61.851463ms     
MIN                     102.905238ms    
MAX                     373.156979ms    
MEMORY USED             1.10GB 

96 req/min (= # of samples)

TOTAL SAMPLE TIME       59.443645821s   
MEDIAN                  195.619608ms    
STANDARD DEVIATION      85.098722ms     
MIN                     103.864426ms    
MAX                     566.401984ms    
MEMORY USED             1.88GB 

128 req/min (= # of samples)

TOTAL SAMPLE TIME       1m0.087153479s  
MEDIAN                  215.543949ms    
STANDARD DEVIATION      83.693541ms     
MIN                     103.639096ms    
MAX                     518.921044ms    
MEMORY USED             2.13GB

256 req/min (= # of samples)

TOTAL SAMPLE TIME       59.857562916s   
MEDIAN                  322.990349ms    
STANDARD DEVIATION      264.79309ms     
MIN                     102.390406ms    
MAX                     1.653100701s    
MEMORY USED             4.20GB

So the goal has to be to find values that result in a low standard deviation while meeting the requirements for acceptable min/max times and memory use at the expected rate of login requests. Everything above that expected rate should be queued to prevent OOM as much as possible.

zepatrik commented 3 years ago

Here is some example data from trying to find the best values for ~0.1 s at 64 req/min on my machine. This is the tuning process I want users to go through for their own requirements; see the calibration sketch after the examples.

More memory than I would like to dedicate:

TOTAL                   59.628672241s   
MEDIAN                  412.544673ms    
STANDARD DEVIATION      210.574957ms    
MIN                     225.377607ms    
MAX                     1.109833309s    
MEMORY USED             4.20GB 

(1 iteration; 512MB memory)

Too much CPU usage:

TOTAL                   1m0.265162692s  
MEDIAN                  889.290351ms    
STANDARD DEVIATION      481.599671ms    
MIN                     462.723141ms    
MAX                     2.537232562s    
MEMORY USED             1.62GB

(10 iterations; 128MB memory)

A seemingly good configuration:

TOTAL                   59.285180675s   
MEDIAN                  173.700229ms    
STANDARD DEVIATION      57.711584ms     
MIN                     101.515008ms    
MAX                     366.054783ms    
MEMORY USED             731.98MB

(2 iterations; 128MB memory)
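
A naive sketch of what such a tuning loop could look like in code (illustrative only; a real calibration helper would also need to account for the concurrent load discussed above): halve the memory until the median sample time drops to the target.

```go
// Sketch of a naive calibration loop (illustrative, not the Kratos CLI):
// halve the memory until the median duration of a few sample hashes
// drops below the desired target, keeping iterations fixed.
package main

import (
	"fmt"
	"sort"
	"time"

	"golang.org/x/crypto/argon2"
)

func medianHashTime(memoryKiB, iterations uint32, samples int) time.Duration {
	durations := make([]time.Duration, 0, samples)
	salt := make([]byte, 16)
	for i := 0; i < samples; i++ {
		start := time.Now()
		argon2.IDKey([]byte("test-password"), salt, iterations, memoryKiB, 1, 32)
		durations = append(durations, time.Since(start))
	}
	sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
	return durations[len(durations)/2]
}

func main() {
	const target = 100 * time.Millisecond
	var (
		memoryKiB  uint32 = 512 * 1024 // start at 512 MB
		iterations uint32 = 2
	)
	for memoryKiB > 8*1024 { // do not go below 8 MB
		med := medianHashTime(memoryKiB, iterations, 5)
		fmt.Printf("memory=%d KiB iterations=%d median=%s\n", memoryKiB, iterations, med)
		if med <= target {
			break
		}
		memoryKiB /= 2
	}
}
```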

aeneasr commented 3 years ago

Sweet!