Open roblourens opened 5 years ago
looking forward to progress
Any progress on moving this to the stable API?
Is there anything preventing this from happening? Both of the issues it's shown as depending on are already closed.
This still isn't ready to be stabilized, I don't have any ETA, sorry.
There needs to be a 😢 reaction
Any progress to report on getting this finalized?
Please put this on the 2020 roadmap, preferably sooner rather than later.
@roblourens 🙏 can this get some love soon? Those of us building extensions that implement FileSystemProvider can only offer search if we get our users to (a) use Insiders, (b) download and install a VSIX, and (c) launch Insiders with the correct --enable-proposed-api argument.
It is a small relief that (c) can now be achieved using argv.json, but what we really need is the API finalized.
I'm interested in the stabilization of this as well. What's blocking this, and how can we help?
Sorry, it still needs a lot of thinking and I won't get to it in the near future
Sorry, it still needs a lot of thinking and I won't get to it in the near future
@roblourens this is disheartening news for me, and likely for others trying to leverage FileSystemProvider and take VSCode into new domains. Is there any way we can help? What do you see as the outstanding problems with the proposed API?
Heyo! I've been following this issue for a while, and I was wondering if by now there's a better idea on when this API will be stabilized. Seeing that there are already quite some extensions using it, even including Microsoft extensions in the marketplace.
@roblourens please can we blow the dust off this and get it into Stable? Or else get an understanding of what's holding it up? IMO as more and more FileSystemProvider implementations show up (aka virtual filesystems) the more important it becomes to resolve this.
What's the status here? I am looking to utilise such API to implement a search over my own FileSystemProvider.
Thanks!
@roblourens is this still on your radar? If you don't have capacity for it can it be handed over to another member of the team?
Extension authors who need this but whose extensions aren't "blessed" as being allowed to use this proposed API have to try and compete while having one hand tied behind their back.
@roblourens are there some known problems or shortcomings affecting this proposed API, four full years after you created this issue? The only one I am aware of is it still hasn't been finalized.
This will be one for @andreamah to look at some day
waiting for TextSearchProvider to the release , not insider. I want to get the matched words by TextSearchResult, not only the whole text by TextDocument.getText()
Hi @andreamah, now that 1.74.0 is out and you're in the yearly housekeeping iteration do you have an updates on when this issue and #73524 (FileSearchProvider finalization) may be addressed? They have been stagnant for years and we haven't gotten any feedback on why finalization hasn't happened. For my team in particular, finalizing these two APIs would make a huge difference for our end users. If there's anything I can do to help get these two issues finalized please let me know.
I can only agree with @isc-bsaviano. This feature would be very very helpful for us as well.
Sorry, we're aware that this is an important API for a lot of scenarios. We don't have a specific plan for it at the moment but it is still on our radar. The only reason it didn't happen originally, and hasn't since then, is that it turned out to be a complex API that was hard to design in a way that would be performant and also consistent with the rest of our extension API, and other feature areas have taken priority since then.
@roblourens Thanks for the update. Does that comment apply to FileSearchProvider
as well? While this is much more important for us, we consume that one as well and therefore have a desire to see it finalized.
Yes, same status
@roblourens @andreamah Hi, I am currently working on an extension using this proposed API and am a bit concerned that it will not be finalized in the near future. It seems fully functional for my use case in its current state, and I really don't want to have to go the route of implementing a secondary search system outside of VSCode to accomplish the goal of my extension. I think some of the confusion myself and others have around this is that it seems to work fine, and we are missing out on the opportunity to utilize this feature in a meaningful way at the moment.
I think this is a really cool API, and a bunch of useful extensions could be built on top of it. Is there any way to get the priority of this moved up? Or to at least get it semi-released with some sort of performance warning if that is the primary concern?
@roblourens - any chance to revisit this in the future? It will also be very useful for the FS provider I'm building.
@roblourens @andreamah following up on my previous comment, I wanted to share a demo of the extension I'm building to help convey the utility of the proposed API. Here is a Loom video showing what I've built https://www.loom.com/share/5165d909e7184601b09136cd4787198a
It's really a bummer that I have something useful that works, and I can't release it on the extensions marketplace.
We'll try to get to this in the future! This is definitely on my radar already.
@andreamah please give this the attention it has long deserved. Is it unrealistic to hope this API will get finalized before the fifth anniversary of this issue?
not sure if this fits here but would the proposed TextSearchProvider also allow us to search files from a Custom File System Provider which are not opened in the editor?
@gbrueckl yes it does.
Hi @andreamah, just wanted to bump this again now that its the new year. I'm hoping this will finally make it on an iteration plan!
Just commenting on some progress here- I've been looking at the API for things I'd like to address (and asking the team for feedback) before I finalize this.
provideTextSearchResults
currently returns TextSearchMatch | TextSearchContext
, which is confusing considering that these two types are quite different (the context
being supplementary to the match
, but not exactly a 1:1 relationship). I think that this was originally built like this in order to reduce the amount of information that gets sent from the extension host to the renderer process, but if we can de-dup identical lines in the extension host on our side, this might look a lot cleaner on the API side.TextSearchMatchPreview.matches
field of ranges makes sense, but it's a little convoluted since it has a 1:1 relation to TextSearchMatch.ranges
. I think that this would be a lot clearer if one match was considered to be one range (instead of potentially a bunch of ranges).TextSearchOptions
seem to be things that would be the same across all searches from the same TextSearchProvider
. An exception to this seems to be the TextSearchPreviewOptions.charsPerLine
field, which seems to change based on whether the query is a regex or not. beforeContext
and the afterContext
be defined using the same field? In the search editor currently, we only allow one uniform value for before/after context.All in all, the main changes seem to be
TextSearchOptions
should just be defined once in registerTextSearchProvider
. For example, before/after context values seem strange to differ across different searches.FileSearchProvider
.Since these might be big changes to the API before finalization, it would be great to hear some feedback. Please also ask for clarification if needed- I currently have the flu, so I'm not sure how coherent this writeup is. 😅
As for the finalization timeline of these, I've already reviewed most of the concerns that preceded my ownership of this area (https://github.com/microsoft/vscode/issues/210692). Unfortunately, I'm working on a lot of things that aren't on the public iteration plan, and it's sometimes difficult to context-switch back to this and give it the attention that it deserves. For full transparency, the minimum amount of time I'd give this before I finalize the API would be around ~2 months (June iteration for making structural changes and running these through the team, and July iteration for making final fixes and ensuring stability), however, I REALLY cannot guarantee this timeline either.
@andreamah super exciting to hear this is being worked on and might be released in the near future. I'll try to review your post more thoroughly and give feedback as soon as possible.
@JasonHassold sounds good, thanks!! No rush, though!
@andreamah Thanks for the update! I just reviewed my implementation of this API. Here are some quick thoughts.
The progress callback for provideTextSearchResults currently returns TextSearchMatch | TextSearchContext, which is confusing considering that these two types are quite different (the context being supplementary to the match, but not exactly a 1:1 relationship). I think that this was originally built like this in order to reduce the amount of information that gets sent from the extension host to the renderer process, but if we can de-dup identical lines in the extension host on our side, this might look a lot cleaner on the API side.
I agree that this was probably done so implementers didn't have to re-send the context line for each match when there are multiple matches per line. My implementation caches the line numbers of matches so it doesn't send a TextSearchContext,
for a line that has already had a match sent for it. If the extension host could de-dup context lines and match lines, that wold make implementers' jobs much easier.
Potentially, should the beforeContext and the afterContext be defined using the same field? In the search editor currently, we only allow one uniform value for before/after context.
Seems reasonable.
Some of the TextSearchOptions should just be defined once in registerTextSearchProvider. For example, before/after context values seem strange to differ across different searches.
I agree with this broadly, but I think that before/afterContext aren't good candidate settings. A user can configure the number of context liens in the search editor and it may improve performance for some implementations if this was set to 0
for searches run from the TreeView.
Also, a lingering concern from before: should we limit one search provider per scheme, or just auto-use the provider that is registered last? I'm not sure how proactive we should be about governing how people consume the extensions made using the API- what if they want to switch between using two different ones? This also applies to FileSearchProvider.
This is a good question. In my case, I only register a Text/FileSearchProvider
for a FileSystemProvider
that I registered in the same extension. This seems to be the intended/most common use case. If there are multiple providers and one must be auto-selected, I think it would make sense to prefer the one registered in the same extension as the corresponding FileSystemProvider
.
cc @gjsjohnmurray for further comment
@isc-bsaviano all good feedback.
I agree that this was probably done so implementers didn't have to re-send the context line for each match when there are multiple matches per line. My implementation caches the line numbers of matches so it doesn't send a TextSearchContext, for a line that has already had a match sent for it. If the extension host could de-dup context lines and match lines, that wold make implementers' jobs much easier.
This reminded me- I don't think I clarified my takeaway from this point super well. I was hoping that that TextSearchResult
would be more like
interface TextSearchResult {
match: TextSearchMatch;
beforeContext?: TextSearchContext[];
afterContext?: TextSearchContext[];
}
In this way, we're more strongly pairing context with at least one match, and we can de-dup on our side.
I agree with this broadly, but I think that before/afterContext aren't good candidate settings. A user can configure the number of context liens in the search editor and it may improve performance for some implementations if this was set to 0 for searches run from the TreeView.
Good point!
This is a good question. In my case, I only register a Text/FileSearchProvider for a FileSystemProvider that I registered in the same extension. This seems to be the intended/most common use case. If there are multiple providers and one must be auto-selected, I think it would make sense to prefer the one registered in the same extension as the corresponding FileSystemProvider.
Hmm, I didn't think about it from the angle of needing to adopt the two providers as a pair. I guess if we're adopting them by extension activation time and most extensions contribute both, I could see that this doesn't make a difference- however, I wonder if doing an extra check for adoping the matching-other-search-provider would complicate things a bit in terms of which provider to adopt.
Thanks for the response @andreamah!
interface TextSearchResult {
match: TextSearchMatch;
beforeContext?: TextSearchContext[];
afterContext?: TextSearchContext[];
}
This looks good to me from a usability perspective. There could be a lot of duplication if there are multiple matches on one line or many matches on consecutive lines though.
Hmm, I didn't think about it from the angle of needing to adopt the two providers as a pair. I guess if we're adopting them by extension activation time and most extensions contribute both, I could see that this doesn't make a difference- however, I wonder if doing an extra check for adoping the matching-other-search-provider would complicate things a bit in terms of which provider to adopt.
I've always seen these APIs as supporting/enhancing a FileSystemProvider
. Since that's the way my extension utilizes them I'd like to make sure that workflow is fully supported no matter what is decided with respect to multiple providers. If you use "nearest activation time" to the FileSystemProvider
then maybe that always work for my use case.
Perhaps we can group any ranges that share previewText like we currently do, but in a clearer way:
export interface TextSearchMatch {
/**
* The uri for the matching document.
*/
uri: Uri;
ranges: {
/**
* The range of the match within the document, or multiple ranges for multiple matches.
*/
sourceRange: Range;
/**
* The Range within `previewText` corresponding to the text of the match.
*/
previewRange: Range;
}[];
previewText: string;
}
And then still do a uniform before/after context per TextSearchMatch
(as described above). Since they share a previewText, I think it'd be safe to assume that they're all related and in the same area?
In that case, would the previewText
contain the before context, matching line and after context? If so it makes sense to me. If no context lines were requested, or the implementer can't provide them for some reason (maybe performance), would previewRange
be undefined/null and previewText
just contain the matching line?
Oh, I meant this in conjunction with the original TextSearchResult
. So in full:
interface TextSearchResult {
match: TextSearchMatch;
beforeContext?: TextSearchContext[];
afterContext?: TextSearchContext[];
}
export interface TextSearchMatch {
/**
* The uri for the matching document.
*/
uri: Uri;
ranges: {
/**
* The range of the match within the document, or multiple ranges for multiple matches.
*/
sourceRange: Range;
/**
* The Range within `previewText` corresponding to the text of the match.
*/
previewRange: Range;
}[];
previewText: string;
}
and we'd use
progress: Progress<TextSearchResult>
to report progress.
Quick question about TextSearchContext
: will that keep the same properties as in the current proposal?
/**
* A line of context surrounding a TextSearchMatch.
*/
export interface TextSearchContext {
/**
* The uri for the matching document.
*/
uri: Uri;
/**
* One line of text.
* previewOptions.charsPerLine applies to this
*/
text: string;
/**
* The line number of this line of context.
*/
lineNumber: number;
}
With the TextSearchContexts
being explicitly ties to match now, is the TextSearchContext.uri
property needed? Won't it always be the same as the TextSearchMatch.uri
?
good catch: I think that the URI field would be removed from TextSearchContext
then.
I've given this a bit more thought, and I think that giving the options upfront vs. every call might be a bit messy, since there's no 'initial call' that is already different. However, the comments on everything else still stands. I made some initial edits in this commit: https://github.com/microsoft/vscode/pull/214041/commits/7df60218b078bb9d42900341824324611143d023
So I end up with
export interface TextSearchMatch {
ranges: {
/**
* The range of the match within the document, or multiple ranges for multiple matches.
*/
sourceRange: Range;
/**
* The Range within `previewText` corresponding to the text of the match.
*/
previewRange: Range;
}[];
previewText: string;
}
/**
* A line of context surrounding a TextSearchMatch.
*/
export interface TextSearchContext {
/**
* One line of text.
* previewOptions.charsPerLine applies to this
*/
text: string;
/**
* The line number of this line of context.
*/
lineNumber: number;
}
interface TextSearchResult {
/**
* The uri for the matching document.
*/
uri: Uri;
/**
* The match corresponding to this result
*/
match: TextSearchMatch;
/**
* Any applicable context lines
*/
beforeContext?: TextSearchContext[];
afterContext?: TextSearchContext[];
}
@andreamah When will the API changes take effect? June Insiders? I want to be sure my extension is up to date.
I need to run these changes by the team first- hopefully, this is something for this month's insiders and I'll probably try to have backward-compatibility for a while so that extensions can adopt it. I'll let you know when it's in, though!
Hi @andreamah, has there been any progress on getting those API changes reviewed?
Finalizing these APIs is growing to be more complicated than expected. There are 4 search APIs that need finalization that should ideally be consistent with one another: findTextInFiles
, findFiles2
(revamped version of workspace.findFiles
), textSearchProvider
, and fileSearchProvider
.
I made a quick diagram of the proposed structure:
You can notice that includes/excludes is handled differently at the consumer side versus the finder side.
consumer:
folder: Uri;
includes: string[];
excludes: string[];
finder:
include: GlobPattern;
exclude: GlobPattern;
Ideally, this would be uniform. The reason why the consumer is shaped like that is because our internal query structure only allows for one base folder to make glob patterns relative to (since that's how ripgrep works). However, a GlobPattern
can be a RelativePattern
, which can have its own baseUri
, so we can end up with two different base URIs that finder APIs pose their include/exclude against. We currently deal with this with ignoring the baseUri
of the exclude
on the finder APIs, which is a known bug.
Ideally, if we pass an include
and exclude
in findTextInFiles
, the exact same include
and exclude
should appear in a textSearchProvider
if we are providing results for that call (and same should happen for the file-equivalent).
I'm currently trying to find a way to remedy this.
Also, you can start to see how finalizing this is part of a whole web of finalization, which makes it more tricky than it seems on the surface. All of these APIs should work with one another in a clean way.
@andreamah Thanks for the drawings and detailed response. I am aware of the two "finder" APIs but don't use them personally so they're not on my radar. It makes sense for all of these APIs to be consistent and finalized together. I did notice that glob discrepancy, and I see the value in passing the caller's globs directly to the consumer without modification. For my specific provider, having the globs be strings that apply to folder
as the baseUri
is helpful since I need to convert the globs to regular expressions. I could manually convert RelativePattern
s to strings if the baseUri
is guaranteed to be within folder
though.
I think that, if we were to make things consistent, we would not have folder
at all. Rather, the providers would have something like:
includes: GlobPattern[];
excludes: GlobPattern[];
Is there a reason why you convert it to regex? If there was a case where there was no baseUri and it was just a pattern that acted against filesystem root, would that be okay for your implementation too?
Potentially, to support
include: GlobPattern;
exclude: GlobPattern;
on the finder side properly, we probably need to do some special stuff if these two attributes have different baseUri
s. Since our internal implementation currently has a singular base folder, in this very specific case, we might need to normalize these two GlobPattern
by turning them into into pattern strings that act against filesystem root, and that's how it'd look on the provider side. This would be easiest way to adopt include/exclude GlobPattern
on either side.
@andreamah
Is there a reason why you convert it to regex?
I convert the TextSearch glob patterns to regexes because I pass the query to a remote database via a REST request. That database contains the documents and it supports matching against the names uisng a regex but has no glob support. I use minimatch's makeRe()
function to do the conversion.
If there was a case where there was no baseUri and it was just a pattern that acted against filesystem root, would that be okay for your implementation too?
I need the workspace folder's Uri
no matter what because it contains information required to determine which remote database I need to perform the search on. That information is in the Uri
's authority
, but I also need the query
string. If I had that Uri
and an array of glob strings then I could do my regex conversion. That is close to the current behavior. I could support GlobPattern
s that are normalized against the filesystem root instead of the workspace folder root if I also knew the workspace folder's Uri
.
Master issue to track stabilizing the TextSearchProvider extension API...
Forked from #47058
Depends on
59919
59458