Stabilize TextSearchProvider API

roblourens commented 5 years ago

Master issue to track stabilizing the TextSearchProvider extension API...

Forked from #47058

Depends on

59919
59458

suntobright commented 5 years ago

looking forward to progress

mostafaeweda commented 5 years ago

Any progress on moving this to the stable API?

gjsjohnmurray commented 5 years ago

Is there anything preventing this from happening? Both of the issues it's shown as depending on are already closed.

roblourens commented 5 years ago

This still isn't ready to be stabilized, I don't have any ETA, sorry.

eamodio commented 5 years ago

There needs to be a 😢 reaction

gjsjohnmurray commented 4 years ago

Any progress to report on getting this finalized?

gjsjohnmurray commented 4 years ago

Please put this on the 2020 roadmap, preferably sooner rather than later.

gjsjohnmurray commented 4 years ago

@roblourens 🙏 can this get some love soon? Those of us building extensions that implement FileSystemProvider can only offer search if we get our users to (a) use Insiders, (b) download and install a VSIX, and (c) launch Insiders with the correct --enable-proposed-api argument.

It is a small relief that (c) can now be achieved using argv.json, but what we really need is the API finalized.

NightRa commented 4 years ago

I'm interested in the stabilization of this as well. What's blocking this, and how can we help?

roblourens commented 4 years ago

Sorry, it still needs a lot of thinking and I won't get to it in the near future

gjsjohnmurray commented 4 years ago

Sorry, it still needs a lot of thinking and I won't get to it in the near future

@roblourens this is disheartening news for me, and likely for others trying to leverage FileSystemProvider and take VSCode into new domains. Is there any way we can help? What do you see as the outstanding problems with the proposed API?

CompuIves commented 3 years ago

Heyo! I've been following this issue for a while, and I was wondering if by now there's a better idea on when this API will be stabilized. Seeing that there are already quite some extensions using it, even including Microsoft extensions in the marketplace.

gjsjohnmurray commented 3 years ago

@roblourens please can we blow the dust off this and get it into Stable? Or else get an understanding of what's holding it up? IMO as more and more FileSystemProvider implementations show up (aka virtual filesystems) the more important it becomes to resolve this.

worksofliam commented 2 years ago

What's the status here? I am looking to utilise such API to implement a search over my own FileSystemProvider.

Thanks!

gjsjohnmurray commented 2 years ago

@roblourens is this still on your radar? If you don't have capacity for it can it be handed over to another member of the team?

Extension authors who need this but whose extensions aren't "blessed" as being allowed to use this proposed API have to try and compete while having one hand tied behind their back.

gjsjohnmurray commented 1 year ago

@roblourens are there some known problems or shortcomings affecting this proposed API, four full years after you created this issue? The only one I am aware of is it still hasn't been finalized.

roblourens commented 1 year ago

This will be one for @andreamah to look at some day

pattazl commented 1 year ago

waiting for TextSearchProvider to the release , not insider. I want to get the matched words by TextSearchResult, not only the whole text by TextDocument.getText()

isc-bsaviano commented 1 year ago

Hi @andreamah, now that 1.74.0 is out and you're in the yearly housekeeping iteration do you have an updates on when this issue and #73524 (FileSearchProvider finalization) may be addressed? They have been stagnant for years and we haven't gotten any feedback on why finalization hasn't happened. For my team in particular, finalizing these two APIs would make a huge difference for our end users. If there's anything I can do to help get these two issues finalized please let me know.

cfmaer commented 1 year ago

I can only agree with @isc-bsaviano. This feature would be very very helpful for us as well.

roblourens commented 1 year ago

Sorry, we're aware that this is an important API for a lot of scenarios. We don't have a specific plan for it at the moment but it is still on our radar. The only reason it didn't happen originally, and hasn't since then, is that it turned out to be a complex API that was hard to design in a way that would be performant and also consistent with the rest of our extension API, and other feature areas have taken priority since then.

isc-bsaviano commented 1 year ago

@roblourens Thanks for the update. Does that comment apply to FileSearchProvider as well? While this is much more important for us, we consume that one as well and therefore have a desire to see it finalized.

roblourens commented 1 year ago

Yes, same status

JasonHassold commented 1 year ago

@roblourens @andreamah Hi, I am currently working on an extension using this proposed API and am a bit concerned that it will not be finalized in the near future. It seems fully functional for my use case in its current state, and I really don't want to have to go the route of implementing a secondary search system outside of VSCode to accomplish the goal of my extension. I think some of the confusion myself and others have around this is that it seems to work fine, and we are missing out on the opportunity to utilize this feature in a meaningful way at the moment.

I think this is a really cool API, and a bunch of useful extensions could be built on top of it. Is there any way to get the priority of this moved up? Or to at least get it semi-released with some sort of performance warning if that is the primary concern?

kowd commented 1 year ago

@roblourens - any chance to revisit this in the future? It will also be very useful for the FS provider I'm building.

JasonHassold commented 1 year ago

@roblourens @andreamah following up on my previous comment, I wanted to share a demo of the extension I'm building to help convey the utility of the proposed API. Here is a Loom video showing what I've built https://www.loom.com/share/5165d909e7184601b09136cd4787198a

It's really a bummer that I have something useful that works, and I can't release it on the extensions marketplace.

andreamah commented 1 year ago

We'll try to get to this in the future! This is definitely on my radar already.

gjsjohnmurray commented 1 year ago

@andreamah please give this the attention it has long deserved. Is it unrealistic to hope this API will get finalized before the fifth anniversary of this issue?

gbrueckl commented 12 months ago

not sure if this fits here but would the proposed TextSearchProvider also allow us to search files from a Custom File System Provider which are not opened in the editor?

gjsjohnmurray commented 12 months ago

@gbrueckl yes it does.

isc-bsaviano commented 8 months ago

Hi @andreamah, just wanted to bump this again now that its the new year. I'm hoping this will finally make it on an iteration plan!

andreamah commented 3 months ago

Just commenting on some progress here- I've been looking at the API for things I'd like to address (and asking the team for feedback) before I finalize this.

The progress callback for provideTextSearchResults currently returns TextSearchMatch | TextSearchContext, which is confusing considering that these two types are quite different (the context being supplementary to the match, but not exactly a 1:1 relationship). I think that this was originally built like this in order to reduce the amount of information that gets sent from the extension host to the renderer process, but if we can de-dup identical lines in the extension host on our side, this might look a lot cleaner on the API side.
Also, the TextSearchMatchPreview.matches field of ranges makes sense, but it's a little convoluted since it has a 1:1 relation to TextSearchMatch.ranges. I think that this would be a lot clearer if one match was considered to be one range (instead of potentially a bunch of ranges).
A lot of TextSearchOptions seem to be things that would be the same across all searches from the same TextSearchProvider. An exception to this seems to be the TextSearchPreviewOptions.charsPerLine field, which seems to change based on whether the query is a regex or not.
Potentially, should the beforeContext and the afterContext be defined using the same field? In the search editor currently, we only allow one uniform value for before/after context.

All in all, the main changes seem to be

make all matches correspond to one range to make things clearer, even if it means to repeat info. We can probably de-dup as much as we can on the ext host side, and it's probably more important to make a clear API. This could potentially look quite strange if someone uses the API wrong, but that's just a sacrifice we might need to make for clarity.
Some of the TextSearchOptions should just be defined once in registerTextSearchProvider. For example, before/after context values seem strange to differ across different searches.
Also, a lingering concern from before: should we limit one search provider per scheme, or just auto-use the provider that is registered last? I'm not sure how proactive we should be about governing how people consume the extensions made using the API- what if they want to switch between using two different ones? This also applies to FileSearchProvider.

Since these might be big changes to the API before finalization, it would be great to hear some feedback. Please also ask for clarification if needed- I currently have the flu, so I'm not sure how coherent this writeup is. 😅

As for the finalization timeline of these, I've already reviewed most of the concerns that preceded my ownership of this area (https://github.com/microsoft/vscode/issues/210692). Unfortunately, I'm working on a lot of things that aren't on the public iteration plan, and it's sometimes difficult to context-switch back to this and give it the attention that it deserves. For full transparency, the minimum amount of time I'd give this before I finalize the API would be around ~2 months (June iteration for making structural changes and running these through the team, and July iteration for making final fixes and ensuring stability), however, I REALLY cannot guarantee this timeline either.

JasonHassold commented 3 months ago

@andreamah super exciting to hear this is being worked on and might be released in the near future. I'll try to review your post more thoroughly and give feedback as soon as possible.

andreamah commented 3 months ago

@JasonHassold sounds good, thanks!! No rush, though!

isc-bsaviano commented 3 months ago

@andreamah Thanks for the update! I just reviewed my implementation of this API. Here are some quick thoughts.

The progress callback for provideTextSearchResults currently returns TextSearchMatch | TextSearchContext, which is confusing considering that these two types are quite different (the context being supplementary to the match, but not exactly a 1:1 relationship). I think that this was originally built like this in order to reduce the amount of information that gets sent from the extension host to the renderer process, but if we can de-dup identical lines in the extension host on our side, this might look a lot cleaner on the API side.

I agree that this was probably done so implementers didn't have to re-send the context line for each match when there are multiple matches per line. My implementation caches the line numbers of matches so it doesn't send a TextSearchContext, for a line that has already had a match sent for it. If the extension host could de-dup context lines and match lines, that wold make implementers' jobs much easier.

Potentially, should the beforeContext and the afterContext be defined using the same field? In the search editor currently, we only allow one uniform value for before/after context.

Seems reasonable.

Some of the TextSearchOptions should just be defined once in registerTextSearchProvider. For example, before/after context values seem strange to differ across different searches.

I agree with this broadly, but I think that before/afterContext aren't good candidate settings. A user can configure the number of context liens in the search editor and it may improve performance for some implementations if this was set to 0 for searches run from the TreeView.

Also, a lingering concern from before: should we limit one search provider per scheme, or just auto-use the provider that is registered last? I'm not sure how proactive we should be about governing how people consume the extensions made using the API- what if they want to switch between using two different ones? This also applies to FileSearchProvider.

This is a good question. In my case, I only register a Text/FileSearchProvider for a FileSystemProvider that I registered in the same extension. This seems to be the intended/most common use case. If there are multiple providers and one must be auto-selected, I think it would make sense to prefer the one registered in the same extension as the corresponding FileSystemProvider.

cc @gjsjohnmurray for further comment

andreamah commented 3 months ago

@isc-bsaviano all good feedback.

I agree that this was probably done so implementers didn't have to re-send the context line for each match when there are multiple matches per line. My implementation caches the line numbers of matches so it doesn't send a TextSearchContext, for a line that has already had a match sent for it. If the extension host could de-dup context lines and match lines, that wold make implementers' jobs much easier.

This reminded me- I don't think I clarified my takeaway from this point super well. I was hoping that that TextSearchResult would be more like

interface TextSearchResult {
  match: TextSearchMatch;
  beforeContext?: TextSearchContext[];
  afterContext?: TextSearchContext[];
}

In this way, we're more strongly pairing context with at least one match, and we can de-dup on our side.

I agree with this broadly, but I think that before/afterContext aren't good candidate settings. A user can configure the number of context liens in the search editor and it may improve performance for some implementations if this was set to 0 for searches run from the TreeView.

Good point!

This is a good question. In my case, I only register a Text/FileSearchProvider for a FileSystemProvider that I registered in the same extension. This seems to be the intended/most common use case. If there are multiple providers and one must be auto-selected, I think it would make sense to prefer the one registered in the same extension as the corresponding FileSystemProvider.

Hmm, I didn't think about it from the angle of needing to adopt the two providers as a pair. I guess if we're adopting them by extension activation time and most extensions contribute both, I could see that this doesn't make a difference- however, I wonder if doing an extra check for adoping the matching-other-search-provider would complicate things a bit in terms of which provider to adopt.

isc-bsaviano commented 3 months ago

Thanks for the response @andreamah!

interface TextSearchResult {
  match: TextSearchMatch;
  beforeContext?: TextSearchContext[];
  afterContext?: TextSearchContext[];
}

This looks good to me from a usability perspective. There could be a lot of duplication if there are multiple matches on one line or many matches on consecutive lines though.

Hmm, I didn't think about it from the angle of needing to adopt the two providers as a pair. I guess if we're adopting them by extension activation time and most extensions contribute both, I could see that this doesn't make a difference- however, I wonder if doing an extra check for adoping the matching-other-search-provider would complicate things a bit in terms of which provider to adopt.

I've always seen these APIs as supporting/enhancing a FileSystemProvider. Since that's the way my extension utilizes them I'd like to make sure that workflow is fully supported no matter what is decided with respect to multiple providers. If you use "nearest activation time" to the FileSystemProvider then maybe that always work for my use case.

andreamah commented 3 months ago

Perhaps we can group any ranges that share previewText like we currently do, but in a clearer way:

export interface TextSearchMatch {
    /**
     * The uri for the matching document.
     */
    uri: Uri;

    ranges: {
        /**
         * The range of the match within the document, or multiple ranges for multiple matches.
         */
        sourceRange: Range;
        /**
         * The Range within `previewText` corresponding to the text of the match.
         */
        previewRange: Range;
    }[];

    previewText: string;
}

And then still do a uniform before/after context per TextSearchMatch (as described above). Since they share a previewText, I think it'd be safe to assume that they're all related and in the same area?

isc-bsaviano commented 3 months ago

In that case, would the previewText contain the before context, matching line and after context? If so it makes sense to me. If no context lines were requested, or the implementer can't provide them for some reason (maybe performance), would previewRange be undefined/null and previewText just contain the matching line?

andreamah commented 3 months ago

Oh, I meant this in conjunction with the original TextSearchResult. So in full:

interface TextSearchResult {
  match: TextSearchMatch;
  beforeContext?: TextSearchContext[];
  afterContext?: TextSearchContext[];
}

export interface TextSearchMatch {
    /**
     * The uri for the matching document.
     */
    uri: Uri;

    ranges: {
        /**
         * The range of the match within the document, or multiple ranges for multiple matches.
         */
        sourceRange: Range;
        /**
         * The Range within `previewText` corresponding to the text of the match.
         */
        previewRange: Range;
    }[];

    previewText: string;
}

and we'd use

progress: Progress<TextSearchResult>

to report progress.

isc-bsaviano commented 3 months ago

Quick question about TextSearchContext: will that keep the same properties as in the current proposal?

    /**
     * A line of context surrounding a TextSearchMatch.
     */
    export interface TextSearchContext {
        /**
         * The uri for the matching document.
         */
        uri: Uri;

        /**
         * One line of text.
         * previewOptions.charsPerLine applies to this
         */
        text: string;

        /**
         * The line number of this line of context.
         */
        lineNumber: number;
    }

With the TextSearchContexts being explicitly ties to match now, is the TextSearchContext.uri property needed? Won't it always be the same as the TextSearchMatch.uri?

andreamah commented 3 months ago

good catch: I think that the URI field would be removed from TextSearchContext then.

andreamah commented 3 months ago

I've given this a bit more thought, and I think that giving the options upfront vs. every call might be a bit messy, since there's no 'initial call' that is already different. However, the comments on everything else still stands. I made some initial edits in this commit: https://github.com/microsoft/vscode/pull/214041/commits/7df60218b078bb9d42900341824324611143d023

So I end up with

    export interface TextSearchMatch {
        ranges: {
            /**
             * The range of the match within the document, or multiple ranges for multiple matches.
             */
            sourceRange: Range;
            /**
             * The Range within `previewText` corresponding to the text of the match.
             */
            previewRange: Range;
        }[];

        previewText: string;
    }

    /**
     * A line of context surrounding a TextSearchMatch.
     */
    export interface TextSearchContext {

        /**
         * One line of text.
         * previewOptions.charsPerLine applies to this
         */
        text: string;

        /**
         * The line number of this line of context.
         */
        lineNumber: number;
    }

    interface TextSearchResult {
        /**
         * The uri for the matching document.
         */
        uri: Uri;
        /**
         * The match corresponding to this result
         */
        match: TextSearchMatch;
        /**
         * Any applicable context lines
         */
        beforeContext?: TextSearchContext[];
        afterContext?: TextSearchContext[];
    }

isc-bsaviano commented 3 months ago

@andreamah When will the API changes take effect? June Insiders? I want to be sure my extension is up to date.

andreamah commented 3 months ago

I need to run these changes by the team first- hopefully, this is something for this month's insiders and I'll probably try to have backward-compatibility for a while so that extensions can adopt it. I'll let you know when it's in, though!

isc-bsaviano commented 2 months ago

Hi @andreamah, has there been any progress on getting those API changes reviewed?

andreamah commented 2 months ago

Finalizing these APIs is growing to be more complicated than expected. There are 4 search APIs that need finalization that should ideally be consistent with one another: findTextInFiles, findFiles2 (revamped version of workspace.findFiles), textSearchProvider, and fileSearchProvider.

fileSearchAPI2 (1) drawio

I made a quick diagram of the proposed structure: fileSearchAPI drawio (1)

You can notice that includes/excludes is handled differently at the consumer side versus the finder side.

consumer:

folder: Uri;
includes: string[];
excludes: string[];

finder:

include: GlobPattern;
exclude: GlobPattern;

Ideally, this would be uniform. The reason why the consumer is shaped like that is because our internal query structure only allows for one base folder to make glob patterns relative to (since that's how ripgrep works). However, a GlobPattern can be a RelativePattern, which can have its own baseUri, so we can end up with two different base URIs that finder APIs pose their include/exclude against. We currently deal with this with ignoring the baseUri of the exclude on the finder APIs, which is a known bug.

Ideally, if we pass an include and exclude in findTextInFiles, the exact same include and exclude should appear in a textSearchProvider if we are providing results for that call (and same should happen for the file-equivalent).

I'm currently trying to find a way to remedy this.

Also, you can start to see how finalizing this is part of a whole web of finalization, which makes it more tricky than it seems on the surface. All of these APIs should work with one another in a clean way.

isc-bsaviano commented 2 months ago

@andreamah Thanks for the drawings and detailed response. I am aware of the two "finder" APIs but don't use them personally so they're not on my radar. It makes sense for all of these APIs to be consistent and finalized together. I did notice that glob discrepancy, and I see the value in passing the caller's globs directly to the consumer without modification. For my specific provider, having the globs be strings that apply to folder as the baseUri is helpful since I need to convert the globs to regular expressions. I could manually convert RelativePatterns to strings if the baseUri is guaranteed to be within folder though.

andreamah commented 2 months ago

I think that, if we were to make things consistent, we would not have folder at all. Rather, the providers would have something like:

includes: GlobPattern[];
excludes: GlobPattern[];

Is there a reason why you convert it to regex? If there was a case where there was no baseUri and it was just a pattern that acted against filesystem root, would that be okay for your implementation too?

Potentially, to support

include: GlobPattern;
exclude: GlobPattern;

on the finder side properly, we probably need to do some special stuff if these two attributes have different baseUris. Since our internal implementation currently has a singular base folder, in this very specific case, we might need to normalize these two GlobPattern by turning them into into pattern strings that act against filesystem root, and that's how it'd look on the provider side. This would be easiest way to adopt include/exclude GlobPattern on either side.

isc-bsaviano commented 2 months ago

@andreamah

Is there a reason why you convert it to regex?

I convert the TextSearch glob patterns to regexes because I pass the query to a remote database via a REST request. That database contains the documents and it supports matching against the names uisng a regex but has no glob support. I use minimatch's makeRe() function to do the conversion.

If there was a case where there was no baseUri and it was just a pattern that acted against filesystem root, would that be okay for your implementation too?

I need the workspace folder's Uri no matter what because it contains information required to determine which remote database I need to perform the search on. That information is in the Uri's authority, but I also need the query string. If I had that Uriand an array of glob strings then I could do my regex conversion. That is close to the current behavior. I could support GlobPatterns that are normalized against the filesystem root instead of the workspace folder root if I also knew the workspace folder's Uri.

microsoft / vscode

Stabilize TextSearchProvider API #59921

59919

59458