shlinkio / shlink

The definitive self-hosted URL shortener
https://shlink.io
MIT License
3.27k stars 259 forks source link

Store API key hashes, instead of plain-text values #2193

Open r33drichards opened 1 month ago

r33drichards commented 1 month ago

Shlink version

4.2.0

PHP version

8.3

How do you serve Shlink

Self-hosted Apache

Database engine

PostgreSQL

Database version

16.3

Current behavior

When I look at my api_keys table for shlink, i can see the api key present, and it is not hashed

Screenshot 2024-09-11 at 12 34 57 PM

Expected behavior

Since the exact value of this api key is not needed, only needing to be checked against a supplied one, it should be hashed like a password, to prevent accidental leaks https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html#background

Minimum steps to reproduce

acelaya commented 1 month ago

Shlink does need to be able to display API keys, for example in the api-key:list console command, and I think it was needed in some other place as well.

Also, I think the raw key itself is used to identify API keys in a few places, so hashing them would affect this kind of mechanism as well. I would need to verify this though.

So I agree with you they should be hashed, and I would probably implement it differently today, but it's not easy to change now for backwards compatibility reasons.

I will plan on changing this, but it's going to take a while and needs to be tested carefully.

Among other things it will need the name to be mandatory, so that you have later a way to identify API keys in case you need to revoke, restrict, etc. In existing installations most API keys don't have a name ATM.

These are I few things I can think of, that would be needed in order to allow this change:

  1. Update API key creation command, making the name mandatory. It should warn about duplicated names, even if that's not currently enforced in the database. Another option would be to generate a name on the fly if one is not provided, using current date or something in those lines. This command would print the plain-text API key once, and then it's up to you to save the value suecurely.
  2. Add a command to allow editing existing API key names.
  3. Migrate existing keys, setting their raw value in the title field for those where the title is null, then update the key to hash it. Update the logic that checks API keys so that it verifies the hash instead of comparing plain-text values. This does not immediately remove the issue, as you still have the plain-text value in one column, but the system will no longer rely on it, allowing you to edit that using the command from step 2.
  4. Next major version would set the name column in the database as non-nullable and unique.

In addition to this, I need to check what's the performance impact on verifying the API key hash on every request. I guess it would be negligible, but something worth checking.

But overall, this needs to be done carefully, as once keys have been hashed there's no way back, so breaking users installations would be bad. They would need to generate new API keys.

acelaya commented 1 month ago

On the other hand, one reason to hash user passwords is because they are provided by them, and that ensures nobody with access to the database info (legitimately or illegitimately), can know what the user originally provided, which they frequently reuse in other services.

API keys are random and generated by Shlink. If someone can access your database, you are basically screwed already, as they can do anything they would through the API. They don't really need the API key anymore, and the value itself does not present extra risks as it does with user-provided passwords.

r33drichards commented 1 month ago

Thanks for taking the time to reply!

Shlink does need to be able to display API keys, for example in the api-key:list console command, and I think it was needed in some other place as well.

Also, I think the raw key itself is used to identify API keys in a few places, so hashing them would affect this kind of mechanism as well. I would need to verify this though.

So I agree with you they should be hashed, and I would probably implement it differently today, but it's not easy to change now for backwards compatibility reasons.

ahh yea that makes sense, seems like encrypting the token and storing the encrypted value and decrypting it could be a viable option then.

Not sure if you need a name to revoke and delete a token, surely just selecting the token by its database id is enough? All of those things are not strictly necessary to implement column level encryption for the value of the tokens, but I agree especially on the ability to revoke tokens being an important and useful thing to add.

API keys are random and generated by Shlink. If someone can access your database, you are basically screwed already, as they can do anything they would through the API. They don't really need the API key anymore, and the value itself does not present extra risks as it does with user-provided passwords.

You are absolutely correct in this situation! If I was in a scenario where I had a database leak and I know about it, I'd have to rotate every secret key b/c they would all be compromised if I wanted to continue using the service as is as well. But if you consider the situation where my database is compromised and I DON'T know that its compromised, I have no way to prevent someone from using an api key that they would have obtained from the stolen database to continue to leak data. If someone obtained the db dump through a one time exploit or a purchase on the black market, they could continue to export data. If the values are encrypted or hashed, I could at least have forward secrecy with my data in the event of a leak.

Encrypting tokens can also help prevent against internal threats such as curious administrators or contractors who might have database access but shouldn't see raw API keys.

So yea in summary I agree that hashing is probably not an option for shlink, but having something that sits in front of the data layer that encrypts and decrypts that value so that its encrypted at rest would be a security enhancement.

acelaya commented 1 month ago

Oh, I didn't mean to disregard hashing as an option, in fact I think it would be preferable if I had to choose.

I just wanted to a) provide more context, and b) explain the difference between user-provided and system-generated credentials.

acelaya commented 7 hours ago

One tricky thing I found is that the API key is currently checked via SELECT ... WHERE key={provided_key} LIMIT 1. If the keys are hashed in the database, there's no longer a way to do a single SELECT via a non-hashed key.

I've been investigating, and since API keys are system-generated and random, in most places it is considered to be secure enough to use a non-salted hash. Since that produces always the same result for a specific key, it would allow to continue doing a single SELECT, but with the hash instead of the plain key.

A SHA256 is secure enough and quite fast so that it should not add any overhead to API requests.