ory / kratos


Improve Argon2 helpers and defaults and documentation #955

Closed. aeneasr closed this issue 3 years ago.

aeneasr commented 3 years ago

**Is your feature request related to a problem? Please describe.**

Manual Configuration

You may also choose the Argon2 parameters manually.

:::note

Please keep in mind that your host machine is probably doing more than just computing Argon2 hashes, so choose these parameters carefully. Also keep in mind that ORY Kratos may compute several hashes in parallel, depending on how many concurrent logins or registrations you serve.

:::

To configure Argon2, edit the ORY Kratos configuration file:

# $ kratos -c path/to/my/kratos/config.yml serve
hashers:
  argon2:
    # Memory consumption per hashing process, denoted in kibibytes (KiB).
    # Here: 131072 KiB / 1024 = 128 MB
    memory: 131072

    # We recommend not choosing these parameters manually but using
    # the Argon2 calibration command.
    parallelism: 1
    iterations: 3
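
For reference, here is a minimal sketch (not Kratos source code) of how these values map onto `golang.org/x/crypto/argon2`, the library Kratos uses for hashing. Note that the library's memory argument is denoted in KiB:

```go
// Illustrative sketch: how the configuration values above map onto
// golang.org/x/crypto/argon2. The memory argument is in KiB, so a
// value of 131072 means each hash allocates 128 MB while it runs.
package main

import (
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/argon2"
)

func main() {
	password := []byte("correct horse battery staple")

	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}

	var (
		iterations  uint32 = 3      // hashers.argon2.iterations
		memoryKiB   uint32 = 131072 // hashers.argon2.memory (128 MB)
		parallelism uint8  = 1      // hashers.argon2.parallelism
		keyLength   uint32 = 32
	)

	hash := argon2.IDKey(password, salt, iterations, memoryKiB, parallelism, keyLength)
	fmt.Printf("%x\n", hash)
}
```

Multiply `memory` by the number of hashes you expect to run concurrently to estimate peak memory use.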


aeneasr commented 3 years ago

Additionally:

zepatrik commented 3 years ago

Recovering is not an option: https://stackoverflow.com/questions/30577308/golang-cannot-recover-from-out-of-memory-crash (Go cannot recover from an out-of-memory crash). I guess the Argon2 hasher should then have a queue with a length of max-concurrent? That is the only way I see to prevent OOM.
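
A minimal sketch of such a queue (illustrative names and types, not the actual Kratos implementation): a buffered channel works as a counting semaphore, so at most max-concurrent hashes run at once and every caller above that waits in line.

```go
// Sketch of the proposed queue: a buffered channel acts as a counting
// semaphore, so at most maxConcurrent Argon2 hashes run at once and
// every request above that waits instead of allocating more memory.
package main

import (
	"context"
	"fmt"

	"golang.org/x/crypto/argon2"
)

type queuedHasher struct {
	sem chan struct{} // capacity = maximum concurrent hashing operations
}

func newQueuedHasher(maxConcurrent int) *queuedHasher {
	return &queuedHasher{sem: make(chan struct{}, maxConcurrent)}
}

func (h *queuedHasher) hash(ctx context.Context, password, salt []byte) ([]byte, error) {
	select {
	case h.sem <- struct{}{}: // acquire a slot, or block (queue) here
	case <-ctx.Done():
		return nil, ctx.Err() // the caller gave up while waiting
	}
	defer func() { <-h.sem }() // release the slot when done

	// Each call uses 128 MB, so memory is bounded by maxConcurrent * 128 MB.
	return argon2.IDKey(password, salt, 3, 131072, 1, 32), nil
}

func main() {
	h := newQueuedHasher(4)
	digest, err := h.hash(context.Background(), []byte("secret"), []byte("random-16-bytes!"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", digest)
}
```

With this pattern, peak Argon2 memory is bounded by maxConcurrent times the configured memory parameter, at the cost of added wait time under load, which matches the trade-off measured below.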

zepatrik commented 3 years ago

Sequential: (screenshot from 2021-01-04 13-23-37)
Parallel: (screenshot from 2021-01-04 13-32-21)

So fully serializing Argon2 operations will definitely prevent OOM, but it might result in some wait time for individual requests. The solution will be to allow a certain number of concurrent operations while queueing everything above that limit.

The CLI helper's purpose is to find the configuration values, so it should take into account how many operations Kratos should be able to handle concurrently. That highly depends on the application and its usage patterns; we can therefore not make assumptions about the distribution or the normal/max/average load.

zepatrik commented 3 years ago

Final findings

In a real-world deployment, it is a statistical problem how many requests should be supported concurrently. It depends on the rate of requests (#/minute) and the execution time of a single request. The execution time depends linearly on memory, iterations, and the number of concurrent executions:

(plot plot_concurrent_over_memory_med: median execution time over memory for different concurrency levels; XC means X concurrent hashes, and 1C appears in the plot twice; med means median)
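
As a back-of-the-envelope model of these relationships (illustrative only, not fitted to the measurements here):

```latex
% Single-hash time grows roughly linearly in memory m and iterations i:
t_{\text{hash}} \approx k \cdot m \cdot i
% Little's law relates the request rate \lambda to the average number of
% hashes in flight, which drives total memory use and contention:
\bar{c} = \lambda \cdot t_{\text{hash}}
% Example: 256 req/min \approx 4.27 req/s at t_hash \approx 0.18 s gives
% \bar{c} \approx 0.77 in flight on average, with bursts far above that.
```

Bursts far above the mean are exactly what show up as the high standard deviation at 256 req/min below.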

With concurrency comes a higher deviation (probably because of scheduling and resource allocation). Therefore, the parameters have to be chosen with respect to the statistically relevant events. Compare the following plots; both use the same hashing parameters, designed for a compute time of ~0.1 s.

(screenshot from 2021-01-05 12-04-05) For 256 requests/min, the standard deviation is quite high, resulting in very unpredictable latency. The maximum compute time of over 1.5 s is also way over the desired time of 0.1 s, resulting in bad UX.

(screenshot from 2021-01-05 12-04-11) In the case of 32 requests per minute, the standard deviation is very low, and both min and max values are well within the range of the desired time.

Here are some more stats without visual plots. Note that these were taken on a machine running other applications as well.

32 req/min (= # of samples)

TOTAL SAMPLE TIME       55.675696486s   
MEDIAN                  181.180305ms    
STANDARD DEVIATION      51.922335ms     
MIN                     120.312156ms    
MAX                     342.119787ms    
MEMORY USED             1.10GB

64 req/min (= # of samples)

TOTAL SAMPLE TIME       1m0.107102303s  
MEDIAN                  174.04383ms     
STANDARD DEVIATION      61.851463ms     
MIN                     102.905238ms    
MAX                     373.156979ms    
MEMORY USED             1.10GB 

96 req/min (= # of samples)

TOTAL SAMPLE TIME       59.443645821s   
MEDIAN                  195.619608ms    
STANDARD DEVIATION      85.098722ms     
MIN                     103.864426ms    
MAX                     566.401984ms    
MEMORY USED             1.88GB 

128 req/min (= # of samples)

TOTAL SAMPLE TIME       1m0.087153479s  
MEDIAN                  215.543949ms    
STANDARD DEVIATION      83.693541ms     
MIN                     103.639096ms    
MAX                     518.921044ms    
MEMORY USED             2.13GB

256 req/min (= # of samples)

TOTAL SAMPLE TIME       59.857562916s   
MEDIAN                  322.990349ms    
STANDARD DEVIATION      264.79309ms     
MIN                     102.390406ms    
MAX                     1.653100701s    
MEMORY USED             4.20GB

So the goal has to be to find values that result in a low standard deviation while meeting the requirements for acceptable min/max times and memory use at the expected rate of login requests. Everything above that expected rate should be queued to prevent OOM as much as possible.

zepatrik commented 3 years ago

Here is some example data from trying to find the best values for ~0.1 s at 64 req/min on my machine. This is the tuning process I want users to go through for their own requirements; see the calibration sketch after the examples.

More memory than I would like to dedicate:

TOTAL                   59.628672241s   
MEDIAN                  412.544673ms    
STANDARD DEVIATION      210.574957ms    
MIN                     225.377607ms    
MAX                     1.109833309s    
MEMORY USED             4.20GB 

(1 iteration; 512MB memory)

Too much CPU usage:

TOTAL                   1m0.265162692s  
MEDIAN                  889.290351ms    
STANDARD DEVIATION      481.599671ms    
MIN                     462.723141ms    
MAX                     2.537232562s    
MEMORY USED             1.62GB

(10 iterations; 128MB memory)

A seemingly good configuration:

TOTAL                   59.285180675s   
MEDIAN                  173.700229ms    
STANDARD DEVIATION      57.711584ms     
MIN                     101.515008ms    
MAX                     366.054783ms    
MEMORY USED             731.98MB

(2 iterations; 128MB memory)
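
A naive sketch of what such a tuning loop could look like in code (illustrative only; a real calibration helper would also need to account for the concurrent load discussed above): halve the memory until the median sample time drops to the target.

```go
// Sketch of a naive calibration loop (illustrative, not the Kratos CLI):
// halve the memory until the median duration of a few sample hashes
// drops below the desired target, keeping iterations fixed.
package main

import (
	"fmt"
	"sort"
	"time"

	"golang.org/x/crypto/argon2"
)

func medianHashTime(memoryKiB, iterations uint32, samples int) time.Duration {
	durations := make([]time.Duration, 0, samples)
	salt := make([]byte, 16)
	for i := 0; i < samples; i++ {
		start := time.Now()
		argon2.IDKey([]byte("test-password"), salt, iterations, memoryKiB, 1, 32)
		durations = append(durations, time.Since(start))
	}
	sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
	return durations[len(durations)/2]
}

func main() {
	const target = 100 * time.Millisecond
	var (
		memoryKiB  uint32 = 512 * 1024 // start at 512 MB
		iterations uint32 = 2
	)
	for memoryKiB > 8*1024 { // do not go below 8 MB
		med := medianHashTime(memoryKiB, iterations, 5)
		fmt.Printf("memory=%d KiB iterations=%d median=%s\n", memoryKiB, iterations, med)
		if med <= target {
			break
		}
		memoryKiB /= 2
	}
}
```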

aeneasr commented 3 years ago

Sweet!