influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Create Bearer Token All-or-Nothing Authentication #24649

Open hiltontj opened 8 months ago

hiltontj commented 8 months ago

@mgattozzi - as per PR comment, here are some ideas I had re: the token CLI. I didn't want to block the PR with them, so we could pursue them with follow-on issues if we like.

mgattozzi commented 7 months ago

Just want to tag on @jdstrand here to see what he thinks about these ideas @hiltontj

jdstrand commented 7 months ago

influxdb3 create token

I suggest influxdb3 <resource> <action> (in this case, influxdb3 token create) as a design choice. It has good usability, is consistent with other products (mostly; some older programs mix things up a bit), and should make parsing command line arguments easier. Since you used influxdb3 create token in the examples, I'll continue to use it below.
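To make the <resource> <action> shape concrete, here is a minimal std-only sketch of how such a command tree could be dispatched in Rust. The Command enum and the parse function are hypothetical names for illustration; a real implementation would more likely use clap subcommands, which the project already depends on.

```rust
/// Hypothetical command tree following the `influxdb3 <resource> <action>` pattern.
#[derive(Debug, PartialEq)]
enum Command {
    TokenCreate,
    Serve,
}

/// Parse arguments (excluding the program name) into a `Command`.
/// The resource comes first, then the action to perform on it.
fn parse(args: &[&str]) -> Result<Command, String> {
    match args {
        ["token", "create"] => Ok(Command::TokenCreate),
        ["serve"] => Ok(Command::Serve),
        // Two or more words that don't match a known resource/action pair.
        [resource, action, ..] => Err(format!(
            "unknown action '{action}' for resource '{resource}'"
        )),
        _ => Err("usage: influxdb3 <resource> <action> [args]".to_string()),
    }
}
```

Because the resource is matched first, resource-specific actions can be added as new match arms without disturbing other resources.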

Have the hashed version saved to a file, using a file_output parameter on the token sub-command. The serve command could then have a similar parameter to specify the bearer token file. ... It would be nice to generate the token for them, and have a consistent place to store it, but that could be OS dependent and therefore beyond the scope of this PR. ...

The first option of saving to a file specified via command line is fine, though please save it with 640 (rw-r-----) or 600 (rw-------) permissions. I'm guessing the assumption here is that the user would then take this file and put it somewhere for influxdb3 serve to use and the user would tell influxdb3 serve where it is. This seems reasonable for an MVP (ie, let the user decide where to put stuff and with what permissions/ownership).
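A minimal sketch of the "save it with 600 permissions" suggestion, assuming a Unix target and Rust's standard library only. write_secret_file is a hypothetical helper: it requests mode 0600 at creation time, so the file never exists with broader permissions, and it refuses to overwrite an existing file.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Write `contents` to a new file at `path` with owner-only (0600) permissions.
/// Unix-only; `write_secret_file` is a hypothetical helper name.
fn write_secret_file(path: &Path, contents: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // fail if the file already exists instead of clobbering it
        .mode(0o600) // applied at creation; the umask can only remove bits from this
        .open(path)?;
    file.write_all(contents.as_bytes())?;
    // Set the mode explicitly too, so the final result does not depend on the umask.
    fs::set_permissions(path, fs::Permissions::from_mode(0o600))?;
    Ok(())
}
```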

That said, I do think that for the MVP it's probably reasonable to look into option 2 at least a bit to set some defaults when invoking influxdb3, defaulting the prefix to be ~/.influxdb3 (with 750 (rwxr-x---) permissions (700 also acceptable)) such that influxdb3 create token outputs to ~/.influxdb3/tokens.db and data files, etc. are written somewhere in ~/.influxdb3/.... This would work well for a developer experience/kicking the tires. OSS 2.x does something similar, and it has a lot of configurability on where files are placed.
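A sketch of creating the default prefix with 750 permissions and deriving the tokens database path from it, assuming Unix and the standard library only. ensure_prefix, default_prefix and the reliance on $HOME are illustrative assumptions, not existing influxdb3 code.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{DirBuilderExt, PermissionsExt};
use std::path::{Path, PathBuf};

/// Ensure `prefix` exists with 0750 permissions and return the path where the
/// tokens database would live (hypothetical helper).
fn ensure_prefix(prefix: &Path) -> io::Result<PathBuf> {
    fs::DirBuilder::new()
        .recursive(true) // also creates missing parents; succeeds if it already exists
        .mode(0o750)
        .create(prefix)?;
    // DirBuilder's mode is filtered by the umask and is not applied to a directory
    // that already existed, so set the final mode explicitly.
    fs::set_permissions(prefix, fs::Permissions::from_mode(0o750))?;
    Ok(prefix.join("tokens.db"))
}

/// Default prefix `$HOME/.influxdb3`, if HOME is set (assumption).
fn default_prefix() -> Option<PathBuf> {
    std::env::var_os("HOME").map(|home| PathBuf::from(home).join(".influxdb3"))
}
```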

Beyond the MVP, here are some additional thoughts to consider...

Taking a step back, I think we need to think about what the CLI management experience is expected to be to know exactly what we want. We'd want to consider the above developer experience, container installs and bare metal system installs (eg, binary is /usr/bin/influxdb3, config dir is /etc/influxdb3, data dir is /var/lib/influxdb3, etc where there might be a systemd unit that runs influxdb3 as influxdb3:influxdb3 and all the influxdb3-owned directories are 750 (rwxr-x---) and files 640 (rw-r-----)). OSS 2.x uses a config library that allows parsing command line arguments (eg, useful for development), configuration files (eg, useful for boot scripts in bare metal system installs) and environment variables (eg, useful for containers) in a unified way (this is probably desirable outside of where the tokens database lives).
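The developer and bare-metal layouts described above could be modelled as a small enum that maps an install style to its directories. This is a hypothetical sketch: the data subdirectory under the developer prefix is an assumption, while the system paths are the ones listed in the comment.

```rust
use std::path::PathBuf;

/// Hypothetical install layouts.
enum Layout {
    /// Developer "kick the tires" layout under a single prefix such as ~/.influxdb3.
    Developer { prefix: PathBuf },
    /// Bare-metal system install, typically run by a systemd unit as influxdb3:influxdb3.
    System,
}

/// Directories the server (and the token CLI) would consult.
struct Paths {
    config_dir: PathBuf,
    data_dir: PathBuf,
}

impl Layout {
    fn paths(&self) -> Paths {
        match self {
            Layout::Developer { prefix } => Paths {
                config_dir: prefix.clone(),
                data_dir: prefix.join("data"), // subdirectory name is an assumption
            },
            Layout::System => Paths {
                config_dir: PathBuf::from("/etc/influxdb3"),
                data_dir: PathBuf::from("/var/lib/influxdb3"),
            },
        }
    }
}
```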

Considering that, a workable experience would be to make the location of the config dir and data dir configurable and then having influxdb3 create token consult that configuration to update/manage the tokens database in place. Eg:

This experience should be achievable relatively quickly after the MVP. It has the downside that the influxdb3 create token command is tightly coupled to the server it is managing (since it is manipulating the tokens db directly and the server needs to be able to understand the format).

An alternative would be to have influxdb3 create token interact with the server (eg, over a socket) so the server can manage the file. This has some usability improvements and can reduce coupling regarding the disk format of the tokens database (assuming the API doesn't change), but now you have to deal with access to the socket (eg, a UNIX domain socket needs proper perms/ownership; with REST, you supply an Authorization header). OSS 2.x does this, but there is a chicken-and-egg problem since you now need an 'initialize' functionality to seed the db with a special operator token (eg, 2.x influx setup). This also forces you to think about permissions more deeply, since this authz 'setup' token is for management whereas the ones that influxdb3 create token creates are for the database (ie, read/write to the database is a different permission than creating a token).
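For the REST variant, the client would send the token in an "Authorization: Bearer <token>" header (the format defined by RFC 6750). A minimal parser for that header value, treating the scheme name case-insensitively as HTTP authentication schemes are; bearer_token is a hypothetical helper name.

```rust
/// Extract the token from an `Authorization` header value of the form
/// `Bearer <token>`. Returns `None` for other schemes or a missing token.
fn bearer_token(header_value: &str) -> Option<&str> {
    let (scheme, rest) = header_value.trim().split_once(' ')?;
    // Auth scheme names are case-insensitive in HTTP.
    if !scheme.eq_ignore_ascii_case("bearer") {
        return None;
    }
    let token = rest.trim();
    // A bearer token is a single value; reject empty or space-separated input.
    if token.is_empty() || token.contains(' ') {
        None
    } else {
        Some(token)
    }
}
```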

Lastly, my understanding from EKO was that influxdb3 could(/would?) store these hashes (and any corresponding permissions maps they reference) in a file in S3. So long as the S3 buckets and objects are similarly protected with secure permissions (eg, the equivalent of 750 (rwxr-x---) directory permissions (700 is also acceptable) and 640 (rw-r-----) file permissions (600 is also acceptable)), security is fine. However, storing in S3 suggests influxdb3 create token talks to the server and the server puts the hashes/permissions maps in its backing storage (since, presumably, the S3 implementation is separate from influxdb3). While this S3 functionality might be down the line, if we know we're going there, it is worth thinking about it now since it could affect the decision on how tokens are created/stored.

jdstrand commented 7 months ago

... defaulting the prefix to be ~/.influxdb3 (with 750 (rwxr-x---) permissions (700 also acceptable)) ... ... ... systemd unit that runs influxdb3 as influxdb3:influxdb3 and all the influxdb3-owned directories are 750 (rwxr-x---) and files 640 (rw-r-----) ...

Directory and file permissions conversations are of course nuanced. Eg, /etc/influxdb3 could be 755 and /etc/influxdb3/influxdb3.conf could be 644 if nothing sensitive is stored in the /etc/influxdb3 directory or in the /etc/influxdb3/influxdb3.conf config file. That said, I still maintain that 750 directory and 640 file permissions for configuration is a reasonable choice, since we might want to add sensitive info to the config dir at some point (eg, credentials for replicating data somewhere), and it is much better to start secure and let people open that up themselves than to try to close it down after the fact.

I definitely recommend not allowing 'other' on the data directories though (eg, use 750 directory and 640 file permissions) since this is a good default desired by most users (and, again, closing it down after the fact is a pain).
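The "no access for 'other'" rule reduces to checking the low three permission bits. A minimal Unix-only sketch of a startup check that refuses overly permissive files or directories, similar in spirit to how OpenSSH rejects private keys that others can read; both function names are hypothetical.

```rust
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// True if a Unix mode grants no access to "other" (eg 0750, 0640, 0700, 0600).
fn no_other_access(mode: u32) -> bool {
    mode & 0o007 == 0
}

/// Refuse to use a file or directory that "other" can read, write or traverse.
fn check_not_world_accessible(path: &Path) -> io::Result<()> {
    let mode = std::fs::metadata(path)?.permissions().mode();
    if no_other_access(mode) {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!(
                "{} has mode {:o}; remove 'other' permissions",
                path.display(),
                mode & 0o777
            ),
        ))
    }
}
```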

pauldix commented 7 months ago

You're right that the expectation is that the influxdb3 create token would talk to the server, which would then put the token information into S3. There is a question of how to bootstrap the server (i.e. start it up so that you can create the initial tokens). I imagine we could do this a few different ways.

Option 1: start up the server without any authorization, then the first time they call to create token, the server switches over into authorized only mode and writes the token information to the configured object store (which could be S3 or it could be local disk). After that point when it starts up, it loads the token & catalog information from object storage and would continue to require authorization.

Option 2: start up the server with a config flag of a bootstrap token and a flag that says it should always require authorization. Then the user can use the bootstrap token to create additional tokens.

Option 3: something else?
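Option 1 can be modelled as an authorization mode derived from whether any tokens exist, which also yields the "keeps requiring authorization after restart" behavior once tokens are loaded from storage. This is a minimal in-memory sketch: AuthState and its methods are hypothetical, and tokens are kept in plain form only to keep it short (a real server would store hashes). It also makes visible that whoever creates the first token takes control.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Clone, Copy)]
enum AuthMode {
    Open,
    Required,
}

/// Hypothetical server auth state; the set stands in for tokens in object storage.
struct AuthState {
    tokens: HashSet<String>,
}

impl AuthState {
    /// Open until the first token exists, Required from then on (including after
    /// a restart that reloads a non-empty token set).
    fn mode(&self) -> AuthMode {
        if self.tokens.is_empty() {
            AuthMode::Open
        } else {
            AuthMode::Required
        }
    }

    /// Would a request carrying `token` be allowed?
    fn allows(&self, token: Option<&str>) -> bool {
        match self.mode() {
            AuthMode::Open => true,
            AuthMode::Required => token.map_or(false, |t| self.tokens.contains(t)),
        }
    }

    /// Anyone may create a token while Open; afterwards the caller needs a valid
    /// token. Returns false if the caller was not authorized.
    fn create_token(&mut self, caller: Option<&str>, new_token: &str) -> bool {
        if !self.allows(caller) {
            return false;
        }
        self.tokens.insert(new_token.to_string());
        true
    }
}
```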

jdstrand commented 7 months ago

Option 1: start up the server without any authorization, then the first time they call to create token, the server switches over into authorized only mode and writes the token information to the configured object store (which could be S3 or it could be local disk). After that point when it starts up, it loads the token & catalog information from object storage and would continue to require authorization.

On first reading, I didn't care for this since anyone can write to it and someone first deploying it must secure it. Thinking about it more, from a security perspective, this is similar to a setup functionality, since an uninitialized server typically either uses default credentials (eg, admin/admin in countless web applications) or requires you to run a setup command to bootstrap it (eg, influx setup). In either case, the server is listening for someone to change stuff. The fact that influxdb3 could be listening for writes/queries is no different, security-wise, from a server listening with default credentials or waiting for the first person to run setup and configure it however they want.

That said, I hate default passwords, and having a server listening for the first person to win the setup race is not ideal either. I wonder if there is any merit in having the server listen on loopback while in this not-yet-configured state... That's probably a no-go for many scenarios.
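The loopback idea could look like the following hypothetical policy, which keeps the requested port but forces the bind address to 127.0.0.1 until authorization is configured (IPv4 only, for brevity). It also illustrates one likely reason this is awkward: inside a container, a server bound to loopback is not reachable from outside the container.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical policy: until authorization is configured, ignore the requested
/// bind address and listen on IPv4 loopback only, keeping the requested port.
fn effective_bind_addr(requested: SocketAddr, auth_configured: bool) -> SocketAddr {
    if auth_configured {
        requested
    } else {
        SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), requested.port())
    }
}
```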

Further, I was also thinking that this sidesteps the issue of needing a special 'I can create tokens' permission and token, but I don't think it does because I don't think we'd want to conflate database permissions with management. (Though, admittedly, the title of this issue is "All-or-Nothing Authentication", but is that what it meant?)

Option 2: start up the server with a config flag of a bootstrap token and a flag that says it should always require authorization. Then the user can use the bootstrap token to create additional tokens.

Is the idea that the server can be started the first time with influxdb3 serve --bootstrap s3cr3t, and then the influxdb3 create token invocation is modified to somehow specify the s3cr3t value? Then, once you've created your tokens, you restart the server without --bootstrap? Does the server refuse to start if it has no tokens (and --bootstrap is not supplied)?

That's quite secure since the server can never create tokens without being started with that, but it also means the server needs to be restarted to create new tokens, rotate, etc.

Option 3: variation of '2' - start the server with a config option to use a SHA512 of an "All-or-Nothing management token" and a flag that says it should always require authorization. Then the user can use the "All-or-Nothing management token" to create additional tokens.

From a security perspective, I like this since the user needs to take an explicit action and only the "All-or-Nothing management token" can be used to create tokens (I recommend keeping database tokens separate from management and avoid "god tokens").
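One way to picture keeping the management token separate from database tokens: each token resolves to exactly one role, and each role permits a disjoint set of actions, so there is no "god token". In this hypothetical sketch the hash function is injected (in practice it would be SHA-512, for example from a crate such as sha2, which is an assumption here), and digests are compared without exiting early on the first mismatching byte.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Action {
    CreateToken,
    Write,
    Query,
}

#[derive(Debug, PartialEq)]
enum Role {
    Management,
    Database,
}

/// Compare two digests without short-circuiting on the first differing byte.
fn digest_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Hypothetical authorizer holding only digests, never plaintext tokens.
struct Authorizer<H: Fn(&[u8]) -> Vec<u8>> {
    hash: H, // eg SHA-512 in a real deployment (assumption)
    management_digest: Vec<u8>,
    database_digests: Vec<Vec<u8>>,
}

impl<H: Fn(&[u8]) -> Vec<u8>> Authorizer<H> {
    fn role_of(&self, token: &str) -> Option<Role> {
        let digest = (self.hash)(token.as_bytes());
        if digest_eq(&digest, &self.management_digest) {
            Some(Role::Management)
        } else if self.database_digests.iter().any(|d| digest_eq(&digest, d)) {
            Some(Role::Database)
        } else {
            None
        }
    }

    /// Management may only manage tokens; database tokens may only read/write.
    fn authorize(&self, token: &str, action: Action) -> bool {
        match (self.role_of(token), action) {
            (Some(Role::Management), Action::CreateToken) => true,
            (Some(Role::Database), Action::Write | Action::Query) => true,
            _ => false,
        }
    }
}
```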

Perhaps the bootstrap experience is: influxdb3 bootstrap or influxdb3 setup is run on the same machine as influxdb3 serve will be, and it follows the same steps to generate a random base64 token and sha512 as with database tokens, but it stores the management token's sha512 as an option in the config file (with secure permissions and ownership, of course). Since this modifies the config file directly, it can be run before the server starts (and can also be a 'wizard' to set up other tunables, if desired). This also supports configuration-management scenarios (Ansible, etc.), since someone need only supply the config file with the management sha512 in it. Losing the management token is not catastrophic, as an admin of the machine can simply generate a new sha512 on the server (and this supports rotation).
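The "generate a random base64 token" step can be sketched with the standard library alone, assuming a Unix system with /dev/urandom. The 32-byte token length and the function names are assumptions; production code would more likely use the getrandom or rand crates plus the base64 crate, and would then store only the SHA-512 of the result, a step omitted here because std has no SHA-512.

```rust
use std::fs::File;
use std::io::{self, Read};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648) with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        // A chunk of k bytes yields k + 1 output characters; the rest are padding.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 0x3f) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Read `n` random bytes from the kernel CSPRNG (Unix-only assumption).
fn random_bytes(n: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    File::open("/dev/urandom")?.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical token generator: 32 random bytes, base64-encoded (44 characters).
fn generate_token() -> io::Result<String> {
    Ok(base64_encode(&random_bytes(32)?))
}
```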

influxdb3 serve could refuse to start if there is no configured management token sha512. If there is a desire to run without authz, then have an explicit option for that, eg, influxdb3 serve --skip-authorization.
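The proposed startup rule can be expressed as a small validation step over the loaded configuration. ServeConfig and its fields are hypothetical stand-ins for the config file entry and the --skip-authorization flag; a SHA-512 hex digest is 128 characters, which the check uses as a basic sanity test. In this sketch the explicit opt-out wins if both are present.

```rust
/// Hypothetical configuration mirroring the proposal.
struct ServeConfig {
    management_token_sha512: Option<String>, // hex digest read from the config file
    skip_authorization: bool,                // set by --skip-authorization
}

#[derive(Debug, PartialEq)]
enum AuthSetting {
    Enforced,
    Disabled,
}

/// Decide whether the server may start, and with which authorization setting.
fn check_startup(cfg: &ServeConfig) -> Result<AuthSetting, String> {
    match (&cfg.management_token_sha512, cfg.skip_authorization) {
        // Explicit opt-out.
        (_, true) => Ok(AuthSetting::Disabled),
        // A SHA-512 digest in hex is exactly 128 hex characters.
        (Some(d), false) if d.len() == 128 && d.chars().all(|c| c.is_ascii_hexdigit()) => {
            Ok(AuthSetting::Enforced)
        }
        (Some(_), false) => Err("management token digest is not a 128-character hex SHA-512".into()),
        (None, false) => {
            Err("no management token configured; run setup or pass --skip-authorization".into())
        }
    }
}
```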

By keeping the management token separate from database tokens, we also stay in line with any permissions enhancements that might come later (eg, where people use different tokens for management vs reads vs writes vs ...).

hiltontj commented 7 months ago

I suggest influxdb3 <resource> <action> (in this case, influxdb3 token create)

Agreed, this works better if there are resource specific actions that go beyond basic CRUD, e.g., influxdb3 token disable.

Perhaps the bootstrap experience is: influxdb3 bootstrap or influxdb3 setup is run on the same machine as influxdb3 serve will be [...]

I like the idea of an influxdb3 setup command, that the user can use to bootstrap/build their configuration. Having a streamlined process for getting up and running in a terminal would be important for the initial dev experience, i.e., being able to do something like:

```shell
influxdb3 setup
# prompts:
# Use default configuration? [y/n]:
# ... if 'n' can configure object store, host port, etc.,
# Use authorization? [y/n]:
# ...
influxdb3 serve
```

without having to look up a bunch of CLI flags that need to be passed in would be great. That said, the CLI flags are still an option, and I guess would override what is in the config?
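The [y/n] prompts in the example above could be driven by a small helper that is generic over its input and output, so the same code serves a terminal and a test. ask_yes_no is a hypothetical name; it re-prompts on anything other than y/yes/n/no and returns an error at end of input.

```rust
use std::io::{self, BufRead, Write};

/// Ask a yes/no question, re-prompting until the answer is y/yes/n/no
/// (case-insensitive). Errors if input ends before a valid answer.
fn ask_yes_no<R: BufRead, W: Write>(input: &mut R, output: &mut W, question: &str) -> io::Result<bool> {
    loop {
        write!(output, "{question} [y/n]: ")?;
        output.flush()?;
        let mut line = String::new();
        if input.read_line(&mut line)? == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "no answer given"));
        }
        match line.trim().to_ascii_lowercase().as_str() {
            "y" | "yes" => return Ok(true),
            "n" | "no" => return Ok(false),
            _ => writeln!(output, "Please answer y or n.")?,
        }
    }
}
```

From a terminal, this would be called with a locked stdin (std::io::stdin().lock()) and std::io::stdout().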

jdstrand commented 7 months ago

without having to look up a bunch of CLI flags that need to be passed in would be great. That said, the CLI flags are still an option, and I guess would override what is in the config?

Yes. IIRC, influxdb 2.x used the Go viper library to parse config, env and args together and hand the merged result to the application. I suspect there is a Rust library that would do the same. IME the order would be: command line overrides everything, and environment overrides config (but I didn't check what viper does).

pauldix commented 7 months ago

We already use dotenv and Clap, so it loads config flags from the command line, the environment, and a file (a .env file), in that order of precedence.
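The precedence described above (command line, then environment, then .env file) can be sketched as a lookup chain. In the real stack, clap's env support supplies the "command line beats environment" step, and dotenv does not override variables that are already set, which gives "environment beats file". The minimal .env parser below ignores quoting and other dotenv features, and the environment is passed in as a map so the sketch is testable; both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Minimal .env parser: one KEY=VALUE per line; blank lines and '#' comments skipped.
fn parse_dotenv(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Resolve one setting: command line first, then process environment, then .env.
fn resolve(
    key: &str,
    cli: Option<&str>,
    env: &HashMap<String, String>,
    dotenv: &HashMap<String, String>,
) -> Option<String> {
    cli.map(str::to_string)
        .or_else(|| env.get(key).cloned())
        .or_else(|| dotenv.get(key).cloned())
}
```

In a real program the env map could come from std::env::vars().collect().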

hiltontj commented 7 months ago

config-rs is good as well if we want support for file formats like TOML, YAML, etc. - but I don't know how well that plays with clap, and may not be necessary if the config is not something someone is opening and reading/editing manually...

jdstrand commented 7 months ago

Note: ~/.influxdb3 is a shorthand for 'whatever the best practice is for the OS'. IMHO, we should follow the XDG specification on Linux, but literal ~/.influxdb3 is not terrible for any of them (eg, I think Mozilla still does this). I don't think any of our products get this right so when deciding, please look at modern standards for Mac, Windows and Linux (though, I can tell you, XDG directories are the way to go on Linux).
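On Linux, resolving the XDG base directories is straightforward. A hypothetical sketch following the XDG Base Directory specification: use $XDG_CONFIG_HOME / $XDG_DATA_HOME when set to an absolute path, otherwise fall back to $HOME/.config and $HOME/.local/share. The influxdb3 subdirectory name is an assumption; crates such as directories also cover the macOS and Windows conventions.

```rust
use std::path::{Path, PathBuf};

/// Use the variable's value if it is a non-empty absolute path; the XDG spec says
/// relative values are invalid and should be ignored. Otherwise use $HOME/<default_rel>.
fn xdg_dir(var_value: Option<&str>, home: &Path, default_rel: &str) -> PathBuf {
    match var_value {
        Some(v) if !v.is_empty() && Path::new(v).is_absolute() => PathBuf::from(v),
        _ => home.join(default_rel),
    }
}

/// Hypothetical (config_dir, data_dir) for influxdb3 on Linux.
fn influxdb3_dirs(xdg_config: Option<&str>, xdg_data: Option<&str>, home: &Path) -> (PathBuf, PathBuf) {
    (
        xdg_dir(xdg_config, home, ".config").join("influxdb3"),
        xdg_dir(xdg_data, home, ".local/share").join("influxdb3"),
    )
}
```

Callers would read the variables with std::env::var and pass them in, which keeps the resolution logic free of global state.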

Rather serendipitously, the docs team commented on a related issue here: https://github.com/InfluxCommunity/influxdb3-python-cli/issues/16#issuecomment-2000484312