renerocksai / sublimeless_zk

A note taking app, Markdown editor, and text browser, featuring ID based wiki style links, and #tags, intended for zettelkasten method users. Loaded with tons of features like sophisticated tag search, note transclusion, support for note templates, bibliography support, etc. to make working in your Zettelkasten a joy 😄
GNU General Public License v3.0
197 stars 24 forks source link

"find in files" not working as expected [0.6, win10] #39

Closed 517qf closed 6 years ago

517qf commented 6 years ago

I think the search option "find in files" in a zettelkasten for something like "one two three" should deliver all files that contain these words in any order in any position (and not just like grep, rg which only show you words that are on the same line and in the given order).

I think your find in files option aims for the same.

the window "Notes matching search ..." doesn't return the whole string. And it doesn't work as expected. See my attached gif.

Different search terms seem to be connected by OR and not AND. at least this is my understanding of https://github.com/renerocksai/sublimeless_zk/blob/master/src/sublimeless_zk.py#L1305 .

To connect terms with an AND condition wouldn' you need something like this? (beware: my python knowledge is limited to the some 200 pages of an intro book quite some time ago ...)

for search_term in search_terms:
    if search_term in text:
        this_file_is_a_candidate=True
    else:
        this_file_is_a_candidate=False
if this_file_is_a_candidate:
    result_notes.append(note)

slzk_search

renerocksai commented 6 years ago

Good catch re not the whole string. Yes, find in files searches for all notes that match one or more of the search words. It's not really thought out well... yet.... Yet I find it useful this way. Must check how others do it 😄

renerocksai commented 6 years ago

Hmmm. "Not as expected", depends on what people expect. I just checked with SublimeText and VS code and both always search for the entire string ("hello world" only finds files that contain hello + space + world). So wouldn't one expect that behaviour? I already OR together the search terms to allow for quick OR search... Will continue this on #45 and close here, seems too similar.

517qf commented 6 years ago

Was ich hier beschreibe betrifft einen anderen Aspekt der Suche als der allgemeinere Post in #45. Der spätere Titel mit AND/OR war unglücklich gewählt. Ich hätte es klarer abgrenzen sollen.

Ich wusste nicht, dass Du absichtlich eine OR Verknüpfung nimmst. Das finde ich aber keine gute Idee: sublimeless_zk verhält sich hier auch anders als VSCode und andere Editoren:

(weiter auf englisch, falls in Zukunft wer mal nachfragt):

VSCode's seach in the left sidebar (Ctrl+Shift+F) doesn't use OR: I put the word Wurst in the file test1 and the word Brot in a file test2 in my project. If I search for Wurst Brot I don' get a hit. So VSCode uses AND.

(Das oben angehängte gif soll zeigen, dass wenn ich nach einzigartiger kein suche fälschlich eine Datei als Treffer angezeigt wird, die folgenden Inhalt hat: Das ist erster einzigartiger Satz)

If I create two such files in sl_zk and search for Wurst Brot I get two matches for both files.

Apart from this: I think sublimetext and VSCode are not a good comparison because they are mainly text/code editors. VSCode doesn't allow for multi-line matching because of limitations of ripgrep (afaik). A better reference might be dedicated notetaking software like ZIMwiki, NValt, emacs deft (which is a clone of nvalt), or evernote or even gmail. Whenever I enter two terms like this word all files (emails) are show that contain both words (anywhere in any order).

I think the code posted above would fully fix the point I raised in this issue.

517qf commented 6 years ago

PS: Ich suche das Wort "Brot" in einer Datei. Die Funktion find_in_files verwandelt die durchsuchten Dateien in Zeile 1306 in Kleinbuchstaben. Wenn ich nach dem Wort "Brot" (erster Buchstabe groß) suche, kann ich keinen Treffer erhalten, denn die search terms werden nicht als lower gesetzt. Deshalb würde ich in Zeile 1307 ändern: statt wie bisher for search_term in search_terms: das folgende for search_term.lower() in search_terms:

renerocksai commented 6 years ago

OK I wasn't clear enough. I stated two things that were meant to be separate:

a) SublimeText and VS Code don't even use AND. They just search for the complete string. AND would match "hello big world" if searched for "hello world". Neither SublimeText nor VS Code use AND (hence, they find 0 files in the hello big world example). So I questioned, when talking about expectations, why use AND when people probably expect what they know from other editors, which is: entire string matching.

b) I stated that, expectations aside, to make it more user-friendly (than whole string matching), I use an OR search, which was requested by @argonsnorts, if I recall correctly. Any of the search terms leads to a match. Apart from not lower()ing the search terms themselves (good catch! 👍 ) , that seemed like a reasonable approach to me.

I was aware that both GitHub issues discuss different things; I just thought they would be related enough to join the discussions into one thread.

Now, having said all that, I think the AND approach is reasonable. It is even closer to how the fuzzy search works, so it might be a tad more intuitive.

Your code suggestion is fine, theoretically. But it can be improved:

Improvement 1: Avoid continuing the loop when the end result is clear already:

for search_term in search_terms:
    if search_term in text:
        this_file_is_a_candidate=True
        break
    else:
        this_file_is_a_candidate=False
        break
if this_file_is_a_candidate:
    result_notes.append(note)

Improvement 2: We see that we only append if all search terms were in text. We can put it in other words (negate it): we don't append if a single search term doesn't match, which leads to even shorter code:

for search_term in search_terms:
    if search_term not in text:
        break
else:
    result_notes.append(note)

For with an else means: If there was no break executed, do the else.

Improvement 3: You're right, we must lower() the search terms. But we can also do this by lower()ing before we split -> only 1 call to lower() instead of one per search term. But I know, this doesn't bring any speedups effectively, unless you enter millions of search terms 😄

517qf commented 6 years ago

Thanks for your patient explanations on how my amateur code could be improved. This was really useful for me (especially the else part of the for-loop).

I use an OR search, which was requested by @argonsnorts, if I recall correctly.

I have another idea why you might have chose OR prior to yesterdays commit: In the advanced tag search two terms separated by a space in the tag search mean OR - whereas AND requires a comma.

So my nagging might have lead to a somewhat inconsistent behavior compared with the tag search? Even if this were so - I think that the new behavior is an (in my view huge) improvement (not least because there wasn't an option for AND).

btw: I wonder where does the syntax (comma for AND - space for OR) in the advanced tag search come from? Is this commonly used in other software? I wonder because it took me a long time to figure out the google advanced search and an answer might eliminate an "unknown unknown" for me ...

Thanks again.