Create a vault /secret/search endpoint - search the list namespace

TopherGopher commented 8 years ago

This endpoint would accept a search string via GET and would return a JSON payload of all paths the user is authorized for whose path name contains the search string. There are 2 ways to search:

Search all secret names/paths for the keyword
Search the secrets themselves for the keyword

For us, we only need the search to really touch the name. There's almost never an instance, for example, where you have a password and you need to search for the secret it belongs to.

We wrote a client-side application that lists all secret directories and secrets, but the number of recursive requests that you have to make on that end is the bottleneck for our search. Additionally, this loads all authorized secret paths into local active memory, which is...not necessarily ideal...but OK. Moving the work to the server would seem to be the best optimization.
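A minimal sketch of the client-side approach described above, to make the request cost concrete. It assumes a hypothetical `list_fn` callable standing in for one LIST request; it returns the child names of a folder, with folder names ending in "/". The in-memory `TREE` is illustration data only, not a real Vault layout.

```python
from typing import Callable, Iterator, List

# Hypothetical stand-in for one LIST call: given a folder path ("" or ending
# in "/"), return its child names; folders end in "/", leaf secrets do not.
ListFn = Callable[[str], List[str]]


def walk(list_fn: ListFn, root: str = "") -> Iterator[str]:
    """Yield every leaf path under root, issuing one LIST call per folder."""
    stack = [root]
    while stack:
        folder = stack.pop()
        for name in list_fn(folder):
            path = folder + name
            if name.endswith("/"):
                stack.append(path)  # a folder: descend into it later
            else:
                yield path  # a leaf secret


def search(list_fn: ListFn, keyword: str, root: str = "") -> List[str]:
    """Return sorted leaf paths containing keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(p for p in walk(list_fn, root) if needle in p.lower())


# In-memory tree used only to exercise the sketch.
TREE = {
    "": ["apps/", "db/"],
    "apps/": ["web/", "api-key"],
    "apps/web/": ["tls-cert", "db-password"],
    "db/": ["root-password"],
}

if __name__ == "__main__":
    print(search(lambda folder: TREE.get(folder, []), "password"))
```

The walk issues one request per folder, so the number of round trips grows with the number of folders rather than with the number of matches, which is the bottleneck mentioned above.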

The other feature we could settle on would be a "recursive" flag on the /list endpoint, which would return ALL paths that the user is authorized to see. This would still load the paths all into active memory, but then we would only need to do a single call and could search the entire namespace.

jefferai commented 8 years ago

This is likely not something that we can support, although I don't want to close this without thinking about it a bit more. Having a way to list all paths for which the user has any policy defined might be doable.

ekristen commented 8 years ago

What about a "tree" view on certain backends (like generic)? This would eliminate the need to make multiple list calls on the various paths.

jefferai commented 8 years ago

@ekristen The issue isn't listing the paths; the issue is then examining every path to see whether the given user is "authorized" for that path. As one base concern, this could entail two million paths; analyzing each and every one would place a huge burden on the server, and I'm honestly much more concerned with server responsiveness than inconveniencing clients.

Another thing that this request doesn't really take into account is that it's not just about "being authorized" because all capabilities are standalone. The OP may only care about read but if we add this feature I guarantee we'll have people that want to know about paths for which they have create and list or some other combination. So the capabilities-self endpoint can figure this out for any given path but LIST doesn't really allow for this kind of data to be returned.

TopherGopher commented 8 years ago

What about with a max-depth limit? Set it by default to something low and allow override up to, say...10? I don't know what those magic numbers would be, but if I could set the root folder and max depth, I could limit my search domain significantly, reducing backend calls
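A rough sketch of the root-folder-plus-max-depth idea, under the same assumptions as any client-side walk: `list_fn` is a hypothetical stand-in for one LIST call, and both the default depth and the ceiling of 10 are placeholder numbers, since the thread leaves the real values open.

```python
from typing import Callable, List, Tuple

ListFn = Callable[[str], List[str]]

HARD_MAX_DEPTH = 10  # illustrative ceiling only


def list_recursive(list_fn: ListFn, root: str = "", max_depth: int = 2) -> List[str]:
    """List entries under root, descending at most max_depth folder levels.

    max_depth=0 returns only the direct children of root.
    """
    depth_limit = max(0, min(max_depth, HARD_MAX_DEPTH))
    results: List[str] = []
    frontier: List[Tuple[str, int]] = [(root, 0)]
    while frontier:
        folder, depth = frontier.pop()
        for name in list_fn(folder):
            path = folder + name
            results.append(path)
            # Only descend while we are still above the depth limit.
            if name.endswith("/") and depth < depth_limit:
                frontier.append((path, depth + 1))
    return sorted(results)


# Illustration data only.
TREE = {
    "": ["apps/", "db/"],
    "apps/": ["web/", "api-key"],
    "apps/web/": ["tls-cert", "db-password"],
    "db/": ["root-password"],
}

if __name__ == "__main__":
    print(list_recursive(lambda folder: TREE.get(folder, []), max_depth=1))
```

Folders at the depth limit are still reported (with their trailing "/"), so a caller can see where the listing was cut off and start a new walk from there.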

TopherGopher commented 8 years ago

And when I call the /list endpoint, it just returns paths for which I'm authorized to read, I just assumed that was default behavior, so good point on the desire to see paths for which they have other perms.

jefferai commented 8 years ago

And when I call the /list endpoint, it just returns paths for which I'm authorized to read

This is not the case. No authorization checking is performed.

Mykolaichenko commented 8 years ago

I've released a small workaround that can be very helpful. Here it is: https://github.com/Mykolaichenko/gotools/tree/master/vaulter

It extends vault and provides methods like 'tree' and 'search' for usability.

marcusgrando commented 6 years ago

VOTE 👍

cha7ri commented 6 years ago

Any update on this?

jefferai commented 6 years ago

No current plans.

Ernest0x commented 6 years ago

No current plans.

Hello. We are evaluating vault as a secrets management solution and we think search functionality is a must, particularly when using the web UI. I really understand the technical difficulties to implement this, but I am surprised that there are no plans for this. Isn't it something frequently requested?

Ernest0x commented 6 years ago

Thinking about this a little more, I believe that it makes sense to start with a solution that allows searching just for paths and perhaps, in a later iteration, expand the search functionality in more data (search for keys or even key values). As it is generally wise to name the secret paths in a way that they give a good hint about their contents or purpose, in most cases, searching by the path, would limit the results enough for the users to find what they want.

So, the main, very understandable, concern of both @jefferai and other engineers remains: What would be the performance hit when searching in a vault setup with more (or less) than 2 million paths? First, I think that we should all agree that this is exactly the scenario in which search functionality is most important. It is almost impossible to find what you want in that amount of data without the ability to search for it. On the other hand, searching should not compromise the responsiveness of the server. And it seems that this would happen if permissions had to be checked for every one of these paths.

So, while I am not familiar with the internals of Vault, I wonder if that (permission checking for all paths) is really necessary. To me, it makes sense that searching should happen first and permission checking be applied later, only on the results of the search. That way, the permission checking requests could be reduced to a number that would not introduce an unacceptable performance hit. In that direction, the search keyword could also be restricted with a limit on the minimum number of characters. Searching with a single-character keyword does not make sense anyway and in many cases a search keyword with fewer than 4 or 5 characters would not be very helpful either. So, for example, if searching with a 5-character keyword in two million paths would reduce the results to 50 or 100 paths, I believe the performance hit for permission checking for just these paths would be negligible.
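The "search first, check permissions afterwards" ordering can be sketched as below. This is only an illustration: `all_paths` is assumed to be an already available collection of paths, `is_allowed` is a hypothetical per-path permission check, and the minimum length of 4 is a placeholder.

```python
from typing import Callable, Iterable, List

MIN_KEYWORD_LEN = 4  # placeholder; the right minimum is an open question


def search_then_authorize(
    all_paths: Iterable[str],
    keyword: str,
    is_allowed: Callable[[str], bool],
) -> List[str]:
    """Match on the path name first, then run the permission check on matches only."""
    if len(keyword) < MIN_KEYWORD_LEN:
        raise ValueError(f"keyword must have at least {MIN_KEYWORD_LEN} characters")
    needle = keyword.lower()
    # Cheap step: plain substring match, no permission lookups yet.
    matches = [p for p in all_paths if needle in p.lower()]
    # Expensive step: one permission check per match, not per path.
    return [p for p in matches if is_allowed(p)]


if __name__ == "__main__":
    paths = ["apps/web/db-password", "apps/api-key", "db/root-password"]
    print(search_then_authorize(paths, "password", lambda p: not p.startswith("db/")))
```

The sketch does not address how the server obtains the path list in the first place; it only shows that the number of permission checks scales with the number of matches.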

Even if the results are not reduced enough, I can think of other techniques that can be introduced to help with performance.

One thing could be to have a "pagination" feature on the results, so that users can proceed to the "next" batch of the results if they do not find what they want in the current batch and/or previous batches. Permission checking in that case would happen only for each batch of the results, and the users could control if they need to move to the next batch or not. "Best" matches would be returned first, so most of the time one or two batches would be enough to get what you want.
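A small sketch of the batching idea, assuming the same hypothetical `is_allowed` check and an in-memory path collection. It does not rank "best" matches first; it simply processes matches in the order they are found, and permission checks run only for the batch being requested.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def search_pages(
    all_paths: Iterable[str],
    keyword: str,
    is_allowed: Callable[[str], bool],
    page_size: int = 2,
) -> Iterator[List[str]]:
    """Lazily yield batches of authorized matches.

    Nothing beyond the current batch is matched or permission-checked until
    the caller asks for the next batch.
    """
    needle = keyword.lower()
    matches = (p for p in all_paths if needle in p.lower())  # lazy generator
    while True:
        batch = list(islice(matches, page_size))
        if not batch:
            return
        yield [p for p in batch if is_allowed(p)]


if __name__ == "__main__":
    paths = ["a/db1", "a/db2", "b/db3", "b/other", "c/db4"]
    pages = search_pages(paths, "db", lambda p: p != "a/db2")
    print(next(pages))  # only the first batch has been checked at this point
```

Because filtering happens after batching, a page can contain fewer than `page_size` entries; a real design would have to decide whether to top pages up or accept short ones.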

Another optimization possibility could be the caching of the final results after permission checking. That cache would involve an association of a specific user/entity to a final result set for a specific search keyword. So, repeated search for the same keyword in a short (maybe configurable?) period of time would not send the same requests again and again. In that case, searching with keywords that are more specific than keywords that are already in the cache (e.g. just adding some more characters to a keyword that had before produced some results), would also benefit from the results that are already in the cache.
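The caching idea, including reuse for more specific keywords, might look roughly like this. The class name, the TTL value, and the substring-match semantics are all assumptions made for the sketch. The reuse step relies on one fact: if a cached keyword is a substring of the new keyword, every path matching the new keyword also matched the cached one, so the cached result set only needs to be narrowed.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class SearchCache:
    """Per-user cache of post-permission-check search results (illustrative)."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # (user, keyword) -> (expiry time, authorized results)
        self._entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}

    def put(self, user: str, keyword: str, results: List[str]) -> None:
        expires = self._clock() + self._ttl
        self._entries[(user, keyword.lower())] = (expires, list(results))

    def get(self, user: str, keyword: str) -> Optional[List[str]]:
        """Return cached results for keyword, or None on a miss.

        A live entry for a less specific keyword (a substring of keyword)
        is narrowed down instead of triggering a new search.
        """
        now = self._clock()
        keyword = keyword.lower()
        for (cached_user, cached_kw), (expires, results) in self._entries.items():
            if cached_user == user and expires > now and cached_kw in keyword:
                return [p for p in results if keyword in p.lower()]
        return None


if __name__ == "__main__":
    cache = SearchCache()
    cache.put("alice", "pass", ["db/password", "web/passphrase"])
    print(cache.get("alice", "passw"))
```

The trade-off is staleness: a policy change or a newly written secret is not reflected until the entry expires, so the TTL would need to be short. The narrowing step is also only valid when the cached entry holds the complete result set, not a single page.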

I am sure that there are many more techniques, that people with experience in database searching technology can think of. I just expressed some first, simple thoughts of mine, in the hope that we see more action on this really useful feature.

Ernest0x commented 6 years ago

I have looked at alternative secret management systems Conjur and Barbican. They both have similar goals to Vault and they both have a straight way to search / filter on the secrets or resources via their API:

Furthermore, it seems that my suggestion of a "pagination" feature has already been implemented in these systems. So, I guess that this is something that makes sense and can be used both as an optimization and for users' convenience.

So, are we going to see this kind of functionality in Vault? Has it been discussed internally in Hashicorp? What are the thoughts of others needing that feature too? Don't you think that this feature request needs to be given more attention?

Ernest0x commented 6 years ago

Hello again. Since it seems that Hashicorp has still no plans for this feature request, I would like to figure out whether a contribution by a third party would be possible. Do you people at Hashicorp think that a good developer with high experience in Go could start working on this? Or is it something that only Vault developers with deep knowledge of Vault's internals can touch? Also, would you accept such a contribution if the implementation is of good quality? And would you help with your advice and review on such work?

jefferai commented 6 years ago

@Ernest0x If a design could be agreed on ahead of time. There are multiple problems however:

What "search" is isn't well defined because Vault is very agnostic and has little insight into what a given plugin might be doing. What does "search" mean for transit? For an LDAP auth backend? For control groups?
Concerns around performance if Vault is performing expensive filtering/searching operations on-server. It's a tradeoff between putting more data across the wire vs. potentially loading the server to do queries. Availability is super important.
More things I can't remember right now.

The Vault team doesn't have search planned because we don't have answers to those questions. Without those questions being answered, a third-party contribution can't be accepted either.

Ernest0x commented 6 years ago

@jefferai Thank you for explaining why there are no plans on this yet.

So, as I understand it, answering the questions above is something that you would prefer to discuss internally, but not giving it a higher priority right now?

jefferai commented 6 years ago

It's not that we can't discuss with users, we just don't have any good ideas right now :-)

As for priority, compared to a great many other features we have actually seen very little demand for this, so the LoE required vs. demand ratio has been rather lopsided.

Ernest0x commented 6 years ago

If the demand for this is as low as you said, I understand that it is going to take longer than I expected for this to be given more thought. Nevertheless, I would like to express my first thoughts on the questions you mentioned and I hope that other people will come and add their own thoughts too, sooner or later.

What "search" is isn't well defined because Vault is very agnostic and has little insight into what a given plugin might be doing. What does "search" mean for transit? For an LDAP auth backend? For control groups?

To me, it makes sense that each plugin defines and implements its own search functionality. An approach would be to add recursive mode (if needed/makes sense) to the LIST operations that a plugin may provide and also add a filtering mechanism. The filtering language could be defined per plugin too.

So, for example, in the case of the KV plugin, the existing API endpoint for listing the keys in a path could be extended with the following parameters:

recursive (bool: false) If "true" it would also return subpaths and their keys recursively
filter (string: "") A string for filtering results

The filter string format / language could be very simple or more sophisticated. For example, a filter string as simple as path:"<keyword>" would list all the keys for the paths that their name matches <keyword>. A filter string like key:"<keyword>" would list all the keys for the paths that include any key that matches <keyword>. A filter string like key:"<keyword>",exclusive:true would do the same but only list the matched keys. A filter string could also take operators: key:"<keyword1>",exclusive:true AND path:"<keyword2>"
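To make the proposed filter strings concrete, here is a small sketch of a parser and evaluator. None of this exists in Vault; the grammar is just the four example forms above, "matches" is assumed to mean substring containment, and `secrets` is a hypothetical mapping from path to the key names stored there.

```python
import re
from typing import Dict, List, Tuple

# One term: path:"kw" or key:"kw", optionally followed by ,exclusive:true
_TERM = re.compile(r'(path|key):"([^"]*)"(,exclusive:true)?')

Term = Tuple[str, str, bool]  # (field, keyword, exclusive)


def parse_filter(text: str) -> List[Term]:
    """Split a filter string on ' AND ' and parse each term."""
    terms: List[Term] = []
    for part in text.split(" AND "):
        m = _TERM.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"bad filter term: {part!r}")
        field, keyword, exclusive = m.groups()
        if field == "path" and exclusive:
            raise ValueError("exclusive only applies to key terms")
        terms.append((field, keyword, exclusive is not None))
    return terms


def apply_filter(secrets: Dict[str, List[str]], text: str) -> Dict[str, List[str]]:
    """Return {path: keys} for the paths that satisfy every term."""
    terms = parse_filter(text)
    out: Dict[str, List[str]] = {}
    for path, keys in secrets.items():
        kept = list(keys)
        ok = True
        for field, keyword, exclusive in terms:
            if field == "path":
                ok = keyword in path
            else:
                hits = [k for k in keys if keyword in k]
                ok = bool(hits)
                if exclusive:
                    kept = [k for k in kept if k in hits]  # list matched keys only
            if not ok:
                break
        if ok:
            out[path] = kept
    return out


if __name__ == "__main__":
    data = {"apps/web": ["username", "password"], "db/main": ["password", "host"]}
    print(apply_filter(data, 'key:"pass",exclusive:true AND path:"db"'))
```

A keyword that itself contains " AND " or a double quote would break this naive splitting; a real filter language would need proper tokenizing and escaping.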

In that way, plugins would get search functionality incrementally as demanded by use cases. E.g. start with the KV plugin which looks like the simplest to do, then continue with other plugins. Also, search for some plugins may not make sense.

Concerns around performance if Vault is performing expensive filtering/searching operations on-server. It's a tradeoff between putting more data across the wire vs. potentially loading the server to do queries. Availability is super important.

For the performance concerns, I had expressed some thoughts in previous messages.

More things I can't remember right now.

I would be glad to offer my humble opinion about any of these.

dupainaulevain commented 5 years ago

I would be most interested in any client side scripts providing a search feature. It's for a relatively small database and there are no performance concerns.

jsirianni commented 5 years ago

I am rolling out vault right now, and some immediate feedback I was given from my users is that a vault search option would be really nice.

stravassac commented 5 years ago

Vote for vault search option

Ernest0x commented 5 years ago

@jefferai, since it has been almost a year since the last time you expressed your thoughts and concerns on this, I would be glad to hear from an official source whether this topic has been discussed internally in Hashicorp any further. Can we hope that there is still intention to make progress on this?

jefferai commented 5 years ago

No change from the previous comment, sorry.

TopherGopher commented 5 years ago

It may sound odd, but your LDAP comment made me start thinking about how LDAP implements search and I wonder if we can borrow a few tricks. Server side, it's a recursive search algorithm that allows you to filter results by queries against fields and limits search results based on authentication level. Every search request has a max execution time. Server takes precedence over client defined timeout. You can specify a sub-tree to limit the search to, which helps provide an efficiency boost.
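A loose sketch of the LDAP-style controls mentioned above, applied to path search: a base sub-tree, a per-path permission filter, and a time limit where the server's maximum wins over whatever the client asks for. The function, the 2-second cap, and `is_allowed` are all hypothetical; this is not how LDAP or Vault implement anything.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

SERVER_MAX_SECONDS = 2.0  # illustrative server-side cap


def subtree_search(
    paths: Iterable[str],
    base: str,
    keyword: str,
    is_allowed: Callable[[str], bool],
    client_timeout: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Search under base until done or the time limit is hit.

    Returns (results, truncated). The effective limit is the smaller of the
    client's timeout and the server maximum.
    """
    if client_timeout is None:
        limit = SERVER_MAX_SECONDS
    else:
        limit = min(client_timeout, SERVER_MAX_SECONDS)
    deadline = clock() + limit
    needle = keyword.lower()
    results: List[str] = []
    for path in paths:
        if clock() >= deadline:
            return results, True  # out of time: return partial results
        if path.startswith(base) and needle in path.lower() and is_allowed(path):
            results.append(path)
    return results, False


if __name__ == "__main__":
    all_paths = ["team/a/db", "team/b/cache", "other/db"]
    print(subtree_search(all_paths, "team/", "db", lambda p: True, client_timeout=10))
```

Returning a `truncated` flag along with partial results lets the caller narrow the base or the keyword and retry instead of waiting on an unbounded query.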

I understand vault is used in a lot of different ways, but the idea of having just the ability to filter on name with a recursive flag would have so many benefits in so many areas. The idea of having a plug-in to extend that further for my own project sounds amazing. Having to return all the data from all the paths to the client in order to search across it has obvious inefficiencies that costs us a lot client-side.

Luckily though, this is Open Source, so I guess if Hashicorp refuses to help us write a secure performant way of searching for secrets, we could fork. @Ernest0x @stravassac @cha7ri @jsirianni Would you guys have any development time that you would be willing to put forth so that we could implement an elegant, community supported solution?

Ernest0x commented 5 years ago

Luckily though, this is Open Source, so I guess if Hashicorp refuses to help us write a secure performant way of searching for secrets, we could fork. @Ernest0x @stravassac @cha7ri @jsirianni Would you guys have any development time that you would be willing to put forth so that we could implement an elegant, community supported solution?

We would help in any way we can for an official support, but forking is not an option for us.

Currently, we are adopting Vault in two phases. In the first phase, we integrate it as a secret management tool used for automated software and infrastructure deployment. In that phase, Vault works as it is.

In the second phase, we are looking into a solution for secrets used by end-users (humans), e.g. in logins, web forms, etc. This is where we will need Vault to support a filtering language for searching into the secrets database. Maybe in HashiCorp they are not developing Vault with that use case in mind, but the feature set of the tool matches perfectly. The only thing missing is the search functionality. Maybe many features in Vault are coming from requirements of enterprise customers, and there is none in that class that has expressed interest in some kind of search endpoint.

So, if HashiCorp is not going towards that direction, perhaps we will look into other tools that already support search, such as Conjur by CyberArk.

jefferai commented 5 years ago

It's not that we're unwilling, we just have yet to figure out a way that works in all cases, or even a majority of cases.

There are a number of issues. Some of them:

Vault is basically a high level plugin architecture. For any given plugin, what does search mean? Is it against role names? Path names? Role values, key values, config values? How does one create and expose an interface that provides sufficient flexibility and options for allowing search on any potentially searchable resource on any plugin of arbitrary paths, data structures, and behavior?

How does one actually make this performant? Suppose we restrict search simply to values of keys in the KV store. How do we search without loading every single value from storage? (Keep in mind: the number one problem we have seen with enterprise deployments comes down to the storage system being overwhelmed.) If we index, how do we index while keeping within the tolerances (e.g. key/value size) of all potential storage backends? How do we do that without having a long wait on unseal while the index is loaded? How do we ensure that we won't be leaking sensitive information? (Keep in mind, some people consider paths to be sensitive, much less keys, much less values.) If we do exact matching, do we leak the presence of a secret even if not its value? If we do partial matching, could we leak the exact sensitive value in search results?

Suppose we only search K/V paths -- basically recursive listing and filtering. Why is it beneficial for the server to perform this function instead of a client pipelining requests and filtering as it goes?
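For reference, "a client pipelining requests and filtering as it goes" could look something like the sketch below. It assumes a hypothetical, thread-safe `list_fn` standing in for one LIST request (folder names end in "/"), and it issues all requests for one tree level concurrently while keeping only matching paths.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

ListFn = Callable[[str], List[str]]


def pipelined_search(list_fn: ListFn, keyword: str, root: str = "",
                     workers: int = 8) -> List[str]:
    """Walk the tree level by level, listing each level's folders in parallel."""
    needle = keyword.lower()
    matches: List[str] = []
    frontier = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # pool.map preserves input order, so zip pairs folders with listings.
            listings = pool.map(list_fn, frontier)
            next_frontier: List[str] = []
            for folder, names in zip(frontier, listings):
                for name in names:
                    path = folder + name
                    if name.endswith("/"):
                        next_frontier.append(path)
                    elif needle in path.lower():
                        matches.append(path)  # filter as we go; drop non-matches
            frontier = next_frontier
    return sorted(matches)


# Illustration data only.
TREE = {
    "": ["apps/", "db/"],
    "apps/": ["web/", "api-key"],
    "apps/web/": ["tls-cert", "db-password"],
    "db/": ["root-password"],
}

if __name__ == "__main__":
    print(pipelined_search(lambda folder: TREE.get(folder, []), "password"))
```

This reduces wall-clock time to roughly one round trip per tree level, but the total number of requests is unchanged, and the client still sees every path name under the root.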

In other words, the exact same concerns expressed in https://github.com/hashicorp/vault/issues/1973#issuecomment-416286026 have yet to be solved. There may not be good answers, and if there are, we haven't found them yet.

TopherGopher commented 5 years ago

It is unfair to say you're unwilling - I'll admit that. Would it be possible to setup a community meeting then so that we could discuss different use cases? The issue was created 3 years ago, and I have no idea how I'm supposed to get more traction on an issue.

jefferai commented 5 years ago

Unfortunately I don't think the use cases are the problem, I think the implementation is.

TopherGopher commented 5 years ago

It feels like you are overcomplicating the implementation though. Think LDAP, which also has a huge use-case, storing structured yet variable data. Searching records that only you have access to. The idea behind doing the search for content server side is to avoid that exact leaking issue you're talking about. I want a user to be able to search all of their own secrets for content ultimately, but minimally, it would be nice just to be able to search for a secret by name. Rather than trying to say: I want a Porsche, but it's too hard to build a generic Porsche for everyone, could we instead start with the Model T? JUST search by path. Only path. Surely that's generic enough with an easy enough implementation. We aren't the first people to have to do efficient searches, so surely there are some efficient algorithms, or a hashing mechanism we could leverage. It's all a matter of the number of API queries we have to do and how much data we have to load and expose client side. These are not insurmountable issues, but if you truly feel that they are, and HashiCorp doesn't have time to slate a design meeting, then let's close this ticket out.

cpoole commented 5 years ago

@jefferai since this is likely outside the scope of core vault could this be implemented as a plugin?

It would basically be an audit plugin that we could run on a separate set of servers and could handle the search queries.

Ernest0x commented 4 years ago

@cpoole while brainstorming on implementation ideas is surely a good thing and I have personally offered some of my own ideas before (perhaps nothing that Vault engineers could not think of themselves), I do not think that HashiCorp will change its priorities based on requests coming from the open source community users. What we actually need is a good push from enterprise customers. In fact, I am aware of at least one such customer (a very big name) that has asked support for this. Unfortunately, it seems this is not enough. So, I hope more enterprise customers of HashiCorp start asking for this and see what happens in the end.

Aracki commented 4 years ago

There is vaku. With vaku folder search you can do a similar thing.

TopherGopher commented 4 years ago

Yup - that implements the functionality I've had to implement myself in our own project and the same logic that others on this thread have had to implement. If you check out the search code: https://github.com/lingrino/vaku/blob/master/vaku/folder_search.go#L94

Basically - loop over every secret - load it all into unencrypted memory client-side - and look for matches. The advantage behind having it server side is that we're not all writing this same basic function and we can avoid a bunch of network I/O and insecure leaking by having a generic feature server-side.

blodan commented 4 years ago

+1 from us, we really need a search function

sp3c73r2038 commented 4 years ago

+1 to search or "recursive" list feature. That would be very nice to have.

FCTN-RRaitz commented 4 years ago

We are performing a test implementation of Vault and search functionality has become a sticking point as well. We are having to switch gears and look at other solutions due to this limitation.

voyera commented 3 years ago

+1 also following this issue. I do sympathize with the complexity aspect but it would still be a very useful feature :)

vhristev commented 3 years ago

+2

Scorcerer commented 3 years ago

Also +1, having no real recursive search is really annoying, even older solutions like keepass have that. Anything for starters would be good, especially as basic as paths.

dlerch-transporeon commented 3 years ago

+1

mmarkgraf-tpgroup commented 3 years ago

As for priority, compared to a great many other features we have actually seen very little demand for this, so the LoE required vs. demand ratio has been rather lopsided.

That is because you do not get feedback from endusers. The demand you're seeing is filtered by 1st-Level and management.

Ask any Ops-Person running vault for a company. Have us unleash our endusers here if you need more demand ;-)

Or just multiply the users posting here by 500, to get a rough estimate of the real demand.

Marcus-James-Adams commented 3 years ago

Agreed. Most organizations want a password/secret management solution that can be used by both end users and automation tooling (Terraform, DevOps, etc.). I really do think that HashiCorp is missing a trick and does not see the demand. Or rather, the apparent lack of demand is because users see that Vault does not support it and walk on by.

jboero commented 3 years ago

This isn't a core feature but if you're using my FUSE clients for Vault you can search a namespace like a normal filesystem:

$ find /mnt/vault -iname '*pem*'
/mnt/vault/pki/ca/pem
/mnt/vault/ssh/ca/pem
$ find /mnt/vault -iname '*kv*'
/mnt/vault/kv
/mnt/vault/kv1
$ tree /mnt/vault/kv
/mnt/vault/kv
├── test1
└── test2
$ tree /mnt/vault/pki/
/mnt/vault/pki/
├── ca
│   └── pem
├── certs
│   ├── 34-1e-c0-71-7c-73-96-ef-a2-e2-3f-08-67-34-e7-98-e4-af-5c-fd
│   ├── ca
│   ├── ca_chain
│   └── crl
├── creds
└── roles

https://github.com/jboero/hashifuse/tree/master/VaultFS

avlgolovchenko commented 2 years ago

I join those in favor. Many tools have a search. We have no desire to look for a replacement for Vault, because it is a really powerful tool, but some basic ("childish") functions are missing. I'm not even talking about more convenient tools for copying and moving secrets - please implement a search.

sdbrennan commented 2 years ago

Adding my voice of a need for this, especially for more complex or multi-tenant namespace trees.

mechaHarry commented 2 years ago

+1 from here as well, need this to be able to bring end users over to the vault-side of secret management.

elrickxxv commented 2 years ago

@TopherGopher, @Ernest0x, and others have already made a very compelling case for the search feature. It's been mentioned that there needs to be a push from enterprise customers for this to get any traction. I am an enterprise customer, currently posting with my personal account for policy reasons, and there is no question that the search feature is something desired by enterprise. It is astounding that for the amount of money we must pay to use Vault Enterprise, there is no search feature provided.

Much like @Ernest0x we purchased Vault for two primary use cases, automated systems where a search is not needed, and for humans. When evaluating Vault, we realized that the UI was a bit primitive compared to the APIs, but somehow we missed the fact that there was absolutely no reasonable way to perform searches (that's on us). I have repeatedly mentioned this as a desired feature to reps from Hashicorp, but nothing yet other than links to other client side solutions that are far from ideal.

To me it's mind boggling that search features are not part of Vault. I recognize that searching may not make sense for all secret engines, but certainly for any KV type store they do. I have drafted extensive documentation for the team of three people in my organization that are currently responsible for feeding secrets into the platform for consumption by others. Regardless of that, secrets regularly get placed in the completely wrong location, creating a needle-in-the-haystack scenario where one of us has to click all over the place in hopes of finding the secret, or we have to rely on our ability to reach the person who put the secret in and that person's memory about where they placed it. I need to open up vault to a much larger number of people, but haven't been able to because I know this problem will just compound.

If the underlying architecture of Vault makes searching difficult, well... as an enterprise customer, that's not my problem. I've been designing and developing software for close to 20 years, and "it's hard/complicated" has never been a valid reason not to implement a basic feature for the end users. It's not like this request is for some far fetched use case; this is basic functionality. There are plenty of strategies to reduce the performance impact on the server, @Ernest0x has posted several good ones, and I can think of several more.

What I've been told by the HashiCorp team is that Vault wasn't really designed with Human users in mind, and that there are many other password management tools for humans, and I accept that, but there is still a fundamental problem here. Some secrets are created and consumed entirely by automation, some entirely by humans, and some by a combination of both. Take a use case where a human DBA creates the secret for consumption by an automated application, but the DevOps team is responsible for creating the policies and delivering that secret to the application using the vault agent. In this scenario there are two human actors and one automated actor. If I go with a different product for the DBAs (humans) to use, which has the basic features that a human needs, then my DevOps engineer has to duplicate the secret from the human secret store into Vault, and make it available to the application. Now I have to manage the same credential in two different secret management tools, including some strategy to keep them synchronized, which obviously makes no sense. I don't see any reasonable argument for having two secret management platforms in any organization, which means HashiCorp needs to consider human use cases just as important as automation use cases.

I've spent too long on this already, but I will continue to advocate for a search feature, and if one is not forthcoming, I will switch my enterprise to a different secret management platform, and pay someone else hundreds of thousands of dollars a year.

heatherezell commented 2 years ago

Hi there @elrickxxv - thank you very much for your response here! I want you to know that I appreciate it, because this is something you clearly care deeply about - and if you didn't, you wouldn't be here. As a community manager, it's my job to help bubble this up to our product and engineering leadership, and I'll be sharing this feedback directly with them. I can't make any guarantees as to whether or when this request might be implemented, but I can tell you that I'm personally trying to keep an eye on these sorts of feedback, where a customer need is clearly presented. Thank you, again, for letting us know your needs and your use cases.

chietti commented 2 years ago

We are also considering using the KV Engine to store secrets used by humans. We would like to apply tags to secrets to group them into categories, which can be achieved using custom metadata, but the lack of an efficient search option on metadata makes it almost unusable. Iterating over thousands of secrets to search for a specific tag remotely is not what we want to do.

artem1982 commented 2 years ago

Implement it just for KV engine, start with something =) Do not overcomplicate it. Use case - UI should have search. If you have 200 secrets, but do not remember where it is - client spent too much time "browsing" in UI. Somebody wants to use Vault not only for DevOps task, but human task as well. OIDC (Azure AD) + KV is the usecase! Per today customers have to choose LastPass/Bitwarden in addition to Vault and start asking about dynamic secrets there =)

hashicorp / vault

Create a vault /secret/search endpoint - search the list namespace #1973