microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Stabilize TextSearchProvider API #59921

Open roblourens opened 5 years ago

roblourens commented 5 years ago

Master issue to track stabilizing the TextSearchProvider extension API...

Forked from #47058

Depends on

andreamah commented 2 months ago

Hmm, that's a good argument to keep some type of base Uri around if the original call had it. If you currently call findTextInFiles with your base URI with an alternative authority and query in its include, does it appear correctly at the provider? I can always test this later too.

isc-bsaviano commented 2 months ago

@andreamah I've never tried that API but I can confirm that when using the Search view in a multi-root workspace folder and adding a "files to include" glob that only matches a single folder, the TextSearchProvider is only called for that folder, and that the glob pattern is normalized to have the workspace folder root as the base. This applies to the duplicate entry with the /** suffix as well.

andreamah commented 2 months ago

This applies to the duplicate entry with the /** suffix as well.

I'm not too sure what you mean by this statement?

isc-bsaviano commented 2 months ago

Sorry that I wasn't clear. When you have a glob pattern present in the UI, the options.includes array includes two elements for each pattern, one with and one without a globstar suffix:

[Screenshot 2024-06-24 at 12 02 19 PM]

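
For a provider that does not need both entries, a minimal sketch of collapsing each pair is below. It assumes the includes arrive as plain strings; `dedupeIncludes` is a hypothetical helper, not part of the VS Code API, and whether dropping the suffixed entry is appropriate depends on how your provider interprets globs.

```typescript
// Hypothetical helper: drop "X/**" when plain "X" is also present,
// and drop exact duplicates. A lone "src/**" is left untouched.
function dedupeIncludes(includes: string[]): string[] {
  return includes.filter((pattern, index) => {
    // Skip exact duplicates, keeping the first occurrence.
    if (includes.indexOf(pattern) !== index) {
      return false;
    }
    // Skip the globstar-suffixed twin of a pattern that is already listed.
    if (pattern.slice(-3) === "/**") {
      const withoutSuffix = pattern.slice(0, -3);
      if (includes.indexOf(withoutSuffix) !== -1) {
        return false;
      }
    }
    return true;
  });
}

const collapsed = dedupeIncludes(["*.cls", "*.cls/**", "src/**"]);
```

Here `collapsed` keeps `"*.cls"` and `"src/**"`: the second is retained because there is no bare `"src"` entry that it duplicates.
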
andreamah commented 2 months ago

After a bit more API review, a question came up: how important is message?: TextSearchCompleteMessage on TextSearchComplete? To some team members, it seemed like it was more of a UI add-on that we usually don't see being used in built-in text search. We use it in vscode.dev, but we can have another conversation about whether we can just change the behavior to not use the completion message.

@ the people who are using this API, do you use the message field on TextSearchComplete, and what is your use case?

If not many people are using it, we might consider keeping this part in particular as proposed to simplify the overall finalization process.

isc-bsaviano commented 2 months ago

@andreamah I use it to notify the user that there were errors for a certain number of documents and to check our Output channel for more info. I could easily replace this with a call to vscode.window.showErrorMessage(), or by just showing the Output channel.
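
As a rough illustration of this use case, the sketch below builds a completion result that carries an error summary. The `CompleteMessage` and `SearchComplete` interfaces are local stand-ins loosely modeled on the proposed API, not the real vscode types, and `completeWithErrors` is a hypothetical helper.

```typescript
// Local stand-ins, loosely modeled on the proposed API (assumption).
interface CompleteMessage {
  text: string;
}
interface SearchComplete {
  limitHit?: boolean;
  message?: CompleteMessage;
}

// Hypothetical helper: fold per-document failures into one completion message.
function completeWithErrors(failedDocs: number, limitHit: boolean): SearchComplete {
  if (failedDocs === 0) {
    // Nothing went wrong, so no message is attached.
    return { limitHit };
  }
  const noun = failedDocs === 1 ? "document" : "documents";
  return {
    limitHit,
    message: {
      text: `Search failed for ${failedDocs} ${noun}. Check the Output channel for details.`,
    },
  };
}
```

If the message field were dropped, the same text could instead be passed to vscode.window.showErrorMessage(), as noted above.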

andreamah commented 1 month ago

An update. These are the latest changes to the API via diagram:

Image

Image

Some major-ish changes compared to the last time I shared:

As usual, the changes are here. I'm making all the changes at once due to the intertwined nature of the changes between the search APIs.

Also, please note that these changes might take some time to clean up. I am finalizing the shape of the API, but I also need to make appropriate internal changes and do testing. So this might take a while to actually finalize.

isc-bsaviano commented 1 month ago

Thanks for the update @andreamah! I have a few questions:

  1. When TextSearchProvider is called in a multi-root workspace and the "files to include" string only matches a single workspace folder, will the folderOptions array contain only one element (similar to the current behavior where the provider is only called for the one folder)?
  2. Is it guaranteed that GlobPatterns in the folderOptions.excludes array will have a baseURI that is inside the workspace folder folderOptions.folder?

andreamah commented 1 month ago

@isc-bsaviano thanks for reviewing my blurb!

  1. When TextSearchProvider is called in a multi-root workspace and the "files to include" string only matches a single workspace folder, will the folderOptions array contain only one element (similar to the current behavior where the provider is only called for the one folder)?

Yup!

  2. Is it guaranteed that GlobPatterns in the folderOptions.excludes array will have a baseURI that is inside the workspace folder folderOptions.folder?

No, this is not guaranteed. If an exclude has a baseURI, you will need to check whether it actually affects the search (or whether it's invalid). When you get the search call from the UI, it should be a valid baseURI. However, when you get arguments from a findFiles or findTextInFiles API call, we can't guarantee that the includes/excludes make sense.
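
One way a provider might perform that check is sketched below. It assumes URIs are reduced to scheme, authority, and path; `UriLike`, `ExcludeLike`, `isInsideFolder`, and `applicableExcludes` are hypothetical names for illustration, and the real option shapes may differ.

```typescript
// Minimal stand-ins for illustration (assumption, not the real vscode types).
interface UriLike {
  scheme: string;
  authority: string;
  path: string;
}
interface ExcludeLike {
  pattern: string;
  baseUri?: UriLike;
}

function withTrailingSlash(p: string): string {
  return p.charAt(p.length - 1) === "/" ? p : p + "/";
}

// True when `base` is the folder itself or nested somewhere inside it.
function isInsideFolder(base: UriLike, folder: UriLike): boolean {
  if (base.scheme !== folder.scheme || base.authority !== folder.authority) {
    return false;
  }
  // Compare with trailing slashes so "/project" is not treated as inside "/proj".
  return withTrailingSlash(base.path).indexOf(withTrailingSlash(folder.path)) === 0;
}

// Keep excludes with no base, or whose base can affect this folder's search.
function applicableExcludes(excludes: ExcludeLike[], folder: UriLike): ExcludeLike[] {
  return excludes.filter((e) => !e.baseUri || isInsideFolder(e.baseUri, folder));
}
```

This sketch only handles a base that sits at or below the folder. An exclude whose base is an ancestor of the folder could still match files in it, so a real provider may need to treat that case separately.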

isc-bsaviano commented 1 month ago

Thanks @andreamah, that's all good to know! Is there a reason why VS Code will no longer be "pre-processing" the exclude globs (filtering out ones that don't apply and turning the rest into strings relative to the workspace folder root)? It's still being done for includes. I will keep my eye out for when this is available so I can update my implementations.

andreamah commented 1 month ago

@isc-bsaviano do you mean having all of the strings relative to one workspace root? The reason is that findFiles/findTextInFiles can possibly pass along multiple includes and excludes with different baseURIs. If we were to normalize all of these to be relative to the same URI, we'd need to modify the glob patterns, which becomes hard with complicated globs (if the baseURIs are very different for the different includes/excludes, should we have them all be patterns relative to the root of the filesystem in the worst case?).

In the case of doing a search from the UI, we will try to keep the options as clean as possible, which likely will involve filtering out and modifying exclude patterns to be friendly with the current folder (since the includes should only be relative to one URI if there is any). However, once we get arguments from an API call (findFiles/findTextInFiles), we can't guarantee anything.

With findTextInFiles, we currently ignore the exclude baseURI, which is wrong. However, to make this info flow correctly through to the provider, we needed to change things up.

isc-bsaviano commented 1 month ago

Thanks @andreamah, that makes sense

isc-bsaviano commented 1 month ago

Hi @andreamah, one more quick question about the new APIs. Does maxResults apply to each folder, or to the entire workspace?

andreamah commented 1 month ago

Hi @andreamah, one more quick question about the new APIs. Does maxResults apply to each folder, or to the entire workspace?

Good question! maxResults should apply to the entire workspace.
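
For a provider that walks several folders in one call, a workspace-wide cap could be tracked with a single shared result list, as in the sketch below. `FolderHits` and `collectResults` are hypothetical stand-ins, and the hits are plain strings rather than real search results.

```typescript
// Hypothetical stand-in: the matches a provider found in one folder.
interface FolderHits {
  folder: string;
  hits: string[];
}

// Collect hits across all folders, stopping once the shared cap is reached.
function collectResults(
  folders: FolderHits[],
  maxResults: number
): { results: string[]; limitHit: boolean } {
  const results: string[] = [];
  let limitHit = false;
  for (const entry of folders) {
    for (const hit of entry.hits) {
      if (results.length >= maxResults) {
        // There is at least one more hit than the cap allows.
        limitHit = true;
        break;
      }
      results.push(hit);
    }
    if (limitHit) {
      break;
    }
  }
  return { results, limitHit };
}

const capped = collectResults(
  [
    { folder: "a", hits: ["a1", "a2"] },
    { folder: "b", hits: ["b1", "b2"] },
  ],
  3
);
```

With a cap of 3, `capped.results` holds `a1`, `a2`, and `b1`, and `capped.limitHit` is true because `b2` was left out. A per-folder cap would instead have returned all four hits.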

isc-bsaviano commented 1 month ago

@andreamah In version 1.92 I see new proposed APIs textSearchProviderNew and fileSearchProviderNew. Are these ready for extension authors to adopt so we can evaluate the new API surface?

caleb-allen commented 1 month ago

Hey @andreamah, thanks for your great work on this.

I've been developing a "semantics" search system called TSearch, and I'm curious if it would qualify as a TextSearchProvider, or if it perhaps falls outside the scope of this API. Would love to hear your thoughts.

Rather than taking the "code words" and generating an index of text, it constructs an index where each code word is encoded with its semantic "context".

Take, for example, this javascript snippet:

function hello(name) {
   console.log("Hello", name);
}

In addition to including the word hello, the index also encodes the semantic use of hello—in this case, it is a function name. The point of it all is to let somebody search not just for textual instances of hello, but specifically for functions called hello. Or for variables, string literals, etc., and to do so across very large projects.

Anyways, the reason I'm describing all this is because the "query language" used to search this index isn't exactly text, it's got a few simple rules and operators in order to distinguish which parts of a query are for context ("function") and which are for content ("hello"). It's not as gnarly as regex, not even close, but it's definitely not "just" text.

I see that TextSearchQuery has the property isRegExp, explicitly including at least one instance of search which isn't strictly text. My question is this: is this API seen as something which supports a more general function of "pattern matching", where text and regex are simply two implementations? Or if not, is such a design something that might be desired? As a (rough, uninformed) idea, perhaps indicating a patternEngine property in TextSearchQuery would suffice to open up the API to a much larger feature set. patternEngine would default to ["text", "regex"], and support the same behavior and API that exists today, e.g. isRegExp = (patternEngine == "regex"), but would explicitly tease out regex as simply a default implementation of a more general concept—that of a pattern engine.

Regardless, I don't think my question needs an answer before the TextSearchProvider API can be stabilized (I don't think it'd be a breaking change anyways). I'm primarily interested in the conversation about whether this lower-level part of search is of interest as a surface for extension.

Let me know if any of this needs clarification. Thanks!

andreamah commented 1 month ago

@andreamah In version 1.92 I see new proposed APIs textSearchProviderNew and fileSearchProviderNew. Are these ready for extension authors to adopt so we can evaluate the new API surface?

This isn't ready to be consumed; I'm just creating these so that I can start changing the internal implementation without affecting the existing proposed APIs (since some internal extensions currently use them).

isc-bsaviano commented 1 month ago

Thanks for the clarification and for all of your hard work to help make this a reality!

andreamah commented 1 month ago

@caleb-allen Great question!

For the most part, this API is meant to be consumed to simply search for 'text' as-is. For example, if you create a custom filesystem, this API helps with actually understanding what it takes to get search results from your project. This being said, it was not necessarily created to facilitate a special or 'intelligent' search that requires custom options. The reason why the options have things like isRegExp is because the UI (aka the search view on the sidebar) will have a button for that, which will drive what info we send to the API. If we introduced more/alternative options, this would preferably match changes in the UI. We want to keep the options simple, as that is what the user expects out of the search view (for now). Also, we only allow one provider per file scheme, so regular text would lose the traditional ripgrep text results if you overwrote our default provider for text with your own.

caleb-allen commented 1 month ago

If we introduced more/alternative options, this would preferably match changes in the UI. We want to keep the options simple, as that is what the user expects out of the search view (for now). Also, we only allow one provider per file scheme, so regular text would lose the traditional ripgrep text results if you overwrote our default provider for text with your own.

I see, this clarifies things a lot, thank you!

It seems that the behavior I'm trying to construct may be better achieved with other APIs, an "enhancement" on search, rather than modifying the deeper plumbing of all search.

Thanks for your answer!