pyscripter / SynEdit

SynEdit is a syntax highlighting edit control, not based on the Windows common controls.
26 stars 11 forks source link

Possible Issue with SynEditSearch and repeated character strings. #78

Closed MShark67 closed 6 months ago

MShark67 commented 9 months ago

I speculate on the cause of this issue below, but here's the simplest test case I could figure out. I'm using the search and replace demo app for testing this.

Type 'aaaa' into the Editor. Put the caret after the first 'a'. Search for 'aa'. Note that the results will look like 'aa[aa]' instead of the correct 'a[aa]a' (brackets show selection in this case.)

This means that a search and replace of 'aa' with 'a' and a source line of 'aaaa' will result in 'aaa' instead of 'aa'.

[my speculation which could easily be incorrect] The search system seems to work by creating a list of pointers to the search string on a line (the FindAll function.) The issue is that it creates this list always starting from the beginning of the line (as far as I can tell.) I think it needs to create the list starting from the current caret position or current running "origin" instead.

MShark67 commented 9 months ago

I'm working on a fix and should be able to get a pull request for it soon.

MShark67 commented 9 months ago

Fixing the issue involves adding a StartIndex option to the base search engine's FindAll function. Added a new search engine that fixes this issue rather that changing the existing one which works well for most cases except for searching and replacing duplicate characters. Note that the alternative search engine uses SearchBuf which has its own definition of "wholewords". Fixing the existing search engine would involve reversing the "Next" function and other minor modifications. Pull request #79.

pyscripter commented 7 months ago

SearchBuf has a drawback in that it uses IsLetterOrDigit in word search. In programming underscore (_) is commonly an identifier character and it is not helpful to consider it as a word break character. In fact the current editor settings should be used for that.

I am looking into this and will try to provide a fix using the existing search engines.

I have updated the SearchReplace demo (replaced the search term highlighting with indicators) and fixed a couple of other related issues. Could you please use that to report any other search related issues here, including the backward search issues you mentioned? Just provide examples where the existing behaviour is wrong.

pyscripter commented 7 months ago

You may not believe it but VS Code behaves like SynEdit! Try your example above!
However Visual Studio behaves differently and it finds three matches of aa in aaaa.

pyscripter commented 7 months ago

I have committed a fix to this issue. Could you please test to see whether it works or has other side-effects?

It resolves your test case above and it should work like VS Code (with the exception of your test case).

MShark67 commented 7 months ago

Nice work! This seems to fix the issue and I liked how you worked around adding the start index to findall! I did seem to have an issue where the search seemed to go into an infinite loop. It seemed to happen at the beginning or end of a line, but I'm having a hard time recreating the issue. I'll let you know if I see it again hopefully with a test case, but wanted to mention it just in case it really is an issue. BTW, I wonder if a small optimization would be to not use a copy of the line if lnStart is one.

pyscripter commented 7 months ago

I did seem to have an issue where the search seemed to go into an infinite loop.

There was such an issue but I thought I fixed it. Did you check out the latest commit?

Could you please also check the other issues you mentioned (backward search etc.).

Regarding the optimization I am not sure it is worth it.

MShark67 commented 7 months ago

That does seem to have fixed the loop, thanks. It doesn't seem to fix the issue when doing a backwards search. The simple case is to search for 'aa' with a line of 'aaaa' and start at one before the end and search backwards. It selects the first two characters instead of selecting chars 2 and 3.

MShark67 commented 7 months ago

I may be wrong, but I don't think you'll be able to fix reverse searching without doing something like I did in my pull request. Fixing it requires the ability to search from a position, backwards, and the current system can't handle that without changes to the base search engine class.

pyscripter commented 7 months ago

The simple case is to search for 'aa' with a line of 'aaaa' and start at one before the end and search backwards. It selects the first two characters instead of selecting chars 2 and 3.

VS Code has the same issue. It is a bit of a corner case and it is hard to fix without a major redesign of the search infrastructure of SynEdit.

Potential solutions:

Also PCRE does not support reverse search. So the regular expression engine is also hard to fix.

Or just accept it as a gotcha.

MShark67 commented 7 months ago

Would an acceptable midterm solution be my pull request? It has the infrastructure change necessary to completely fix the issue with a custom search engine, or you could get the exact fix you've done here for forward mode but without having to make a copy of each line searched. The included alternate search engine only has the missing _ character check and that's fairly easy to fix with a custom searchbuf routine. Agreed on RegExp, but I don't use that for this kind of search. Mostly the current bug shows up when doing a search and replace of two quotes with one quote (something I do fairly often.)

pyscripter commented 7 months ago

Let me think about it.

MShark67 commented 7 months ago

I'm happy to discuss if you have reservations. I think we both agree that the search engine needs the starting position to ever work correctly, so just adding that would be enough for me.

pyscripter commented 7 months ago

Committed a fix to the backward searching, but have done minimal testing. Could you please have a look. It uses the current Boyer-Moore algorithm for forward search, but a "naive" one for backward search (as iin SearchBuf).

MShark67 commented 7 months ago

It does work better in some cases, but a side effect of not passing the start position into the search engine is that the "whole word" option won't work properly as it will see the truncated line "edges" as separators even if they're in the middle of a word.

pyscripter commented 7 months ago

You search part of a line either when you search from the cursor (backward or forward) or when you search in selection and the line is partially selected.

How the "Whole word" option should work, when only part of the line is searched is open to debate.

See for instance VS code. in the image below the selection was from characters 2 to 7. We are searching for aa (whole word) and within the selection. It finds 2 matches.

image

What is your opinion on this? Could you check a few other editors to see how they handle this?

The code as it stands passes to the search engine only the part of the line that should be searched. Whilst this fixed your original issue, the downside relates to the detection of whole words, which I confess I hadn't fully thought through when I introduced the change.

It works fine if you keep searching backwards or forward from the beginning/end of a line/document. Also for search in Selection it works like VS Code.

There is an issue if say you search for the whole word "bcd" in "abcd" starting from the second char. Note that you would never have ended there by searching from the start or the end of the line or document, so things like the search term highlighting works correctly.

One solution is to double check for whole word matching in SearchReplace when the line is partially searched. I will try that and see how it works.

Regarding you PR, there are a few issues that concern me:

MShark67 commented 7 months ago

The correct fix will be to use my infrastructure change to add starting position to the Search Engines and then to just use your new Search Engine that takes care of the other issues. No need for the alt one if we can just fix the existing one. As mentioned, I don't think it's really possible to fully fix the RegExp one and I don't see that as an issue. Problem solved?

pyscripter commented 7 months ago

Currently engines do not see the whole line, only the part that is searched. So passing the starting position is of no help. I would have to reverse all changes and go back to the state it was when you reported the error. I will reconsider that, among other options.

I will provide a fix, but let's agree first on how it should work:

In VSCode

Should SynEdit work like in VS Code or not? It would be helpful to check some other editors.

MShark67 commented 7 months ago

Reverting the current changes is necessary. It does sound like VSCode has some of the same bugs, so we shouldn't use it as a template for how this will work. Note that my Alt search engine is the template to do it correctly without those bugs. I'd guess that converting your new search engine to work the same will be fairly simple (perhaps famous last words lol,) I'm happy to do any leg work to handle this btw, I'm always impressed by your work and this is something I can actually help with (hence the pull request to try to handle some of the basic stuff while you do the high level stuff like that impressive sounding multi-caret editing feature.)

pyscripter commented 7 months ago

@MShark67 Could you please check again with the latest commit?

MShark67 commented 7 months ago

I haven't had the chance to look into that fix, but I think there might be other issues with doing things this way. You'll probably have to detect the RegEx search engine and bypass all of this code for it. That was another benefit of just letting the Search Engines know about the starting position. They could use it or not and it keeps search engine specific code out of the SearchReplace routine.

pyscripter commented 7 months ago

You are the most loyal supporter of this project, with many valuable contributions. I did not have strong views about the specific behaviour of the search engines, and given that I value your opinion, I accepted that it should behave as you would expect and recommended.

The latest commit also passes the starting and ending positions of the search to FindAll.

Here are some of the tests I have tried:

procedure TSearchReplaceDemoForm.RunTests;
  procedure Test(SearchEngine: TSynEditSearchCustom; Line, Pattern: string;
    Options: TSynSearchOptions; StartChar, EndChar, Res: Integer);
  begin
    SearchEngine.Options := Options;
    SearchEngine.Pattern := Pattern;
    SearchEngine.FindAll(Line, StartChar, EndChar);
    Assert(((SearchEngine.ResultCount = 0) and (Res = 0)) or
      ((ssoBackwards in Options) and (SearchEngine.Results[SearchEngine.ResultCount - 1] = Res)) or
      (not (ssoBackwards in Options) and (SearchEngine.Results[0] = Res)), Line + ' -- ' + Pattern);
  end;

begin
  Test(SynEditSearch, 'aaaa', 'aa', [], 2, 5, 2);
  Test(SynEditSearch, 'aaaa', 'aa', [], 2, 0, 2);
  Test(SynEditSearch, 'aaaa', 'aa', [ssoBackwards], 1, 0, 3);
  Test(SynEditSearch, 'aaaa', 'aa', [ssoBackwards], 1, 4, 2);
  Test(SynEditSearch, 'aaaa', 'aa', [ssoWholeWord], 1, 0, 0);
  Test(SynEditSearch, 'aaaa', 'aa', [ssoBackwards, ssoWholeWord], 1, 0, 0);
  Test(SynEditSearch, 'aaa aa aaa', 'aa', [ssoWholeWord], 1, 0, 5);
  Test(SynEditSearch, 'aaa aa aaa', 'aaa', [ssoWholeWord], 1, 0, 1);
  Test(SynEditSearch, 'aaa aa aaa', 'aa', [ssoBackwards, ssoWholeWord], 1, 0, 5);
  Test(SynEditSearch, 'aaaa', 'aaa', [ssoBackwards, ssoWholeWord], 1, 4, 0);

  Test(SynEditRegexSearch, 'aaaa', 'aa', [], 1, 5, 1);
  Test(SynEditRegexSearch, 'aaaa', 'aa', [], 2, 5, 2);
  Test(SynEditRegexSearch, 'aaaa', 'aa', [], 2, 0, 2);
  Test(SynEditRegexSearch, 'aaaa', 'aa', [ssoBackwards], 1, 0, 3);
  Test(SynEditRegexSearch, 'aaaa', 'aa', [ssoBackwards], 1, 4, 1);  // !!
end;

Regarding the regexp engine:

MShark67 commented 7 months ago

Love it! Especially adding StartPos to RegEx.Matches.. that's a very nice improvement. Just a couple issues:

There is an issue with SynEditRegexSearch which relates to my last message. It uses copy to work on a subset of the line, and so thinks that the line end is the current caret position when searching backwards. This causes any search with $, \b, non-capture groups and probably a bunch of other scenarios to not work correctly.

The other issue is that removing the InvalidSearchRange check from the SearchReplace method means that each SearchEngine has to do it itself. I just mention this because if users have custom written search engines they will have to modify them (besides just adding the new findall parameters.)

Here's my attempted fix for RegEx's FindAll: (I couldn't figure out how to modify or create new MatchCollections which would allow removing fResultCount)

function TSynEditRegexSearch.FindAll(const NewText: string;
  StartChar: Integer = 1; EndChar: Integer = 0): Integer;
begin
  fMatchCollection :=  RegEx.Matches(NewText, StartChar);
  Result := fMatchCollection.Count;
  while (Result > 0) and ((fMatchCollection[Result - 1].Index + fMatchCollection[Result - 1].Length) > EndChar) do
    Dec(Result);
  // This stored count also needs to be returned by GetResultCount and reset where appropriate!
  fResultCount := Result;   
end;

Now the only bug is the RegEx reverse search and as you've pointed out, we really can't fix it. It's not a big deal for me.

pyscripter commented 6 months ago

Done. Please check.

MShark67 commented 6 months ago

No issues found! Great work!