renerocksai / sublimeless_zk

A note taking app, Markdown editor, and text browser, featuring ID based wiki style links, and #tags, intended for zettelkasten method users. Loaded with tons of features like sophisticated tag search, note transclusion, support for note templates, bibliography support, etc. to make working in your Zettelkasten a joy 😄
GNU General Public License v3.0
197 stars 24 forks

search: allow to search for quoted (fixed) strings, OR, NOT #45

Closed 517qf closed 6 years ago

517qf commented 6 years ago

at the moment you can't quote terms during a search because of search_terms = search_term.split().

Maybe you could extend the method find_in_files?

Maybe the search could also be extended with an option for OR and NOT?

I know only one additional personal notetaking/wiki software that is written in python: ZIM. Zim offers verbatim search, OR and NOT. Maybe parts from its file search.py offer some inspiration.

I originally wanted to query only about verbatim search. To show engagement I made a code snippet that works for me and that's quite long because of my limited knowledge.

import re
search_term ='a "b c" d'
search_terms = []

# before splitting extract and remove quoted substrings

# double quotes
RO=re.compile(r'(")(.*?)(")')
while RO.search(search_term):
    Out=RO.search(search_term).group(2)
    search_term=re.sub(r'".*?"','',search_term,1)
    search_terms.append(Out)

# single quotes
RO=re.compile(r"(')(.*?)(')")
while RO.search(search_term):
    Out=RO.search(search_term).group(2)
    search_term=re.sub(r"'.*?'",'',search_term,1)
    search_terms.append(Out)

search_terms = search_terms + search_term.split()

print(search_terms)

renerocksai commented 6 years ago

Will look into this and #39 ... Verbatim in quotes sounds nice but then how do you search for quotes? Probably by escaping... What about searching for or and and? Quoting them? Wanted to avoid quoting and escaping, but will think about it. Maybe a regex mode makes sense, too.

renerocksai commented 6 years ago

FYI:

        # double quotes
        RO = re.compile(r'"(.*?)"')
        quoted = RO.findall(search_terms)
        search_terms = RO.sub('',search_terms)

        # single quotes
        RO = re.compile(r"'(.*?)'")
        quoted.extend(RO.findall(search_terms))
        search_terms = RO.sub('', search_terms)

        search_terms = quoted + search_terms.split()

findall returns a list of tuples, where each tuple contains each captured regex-group. So for r'(")(.*?)(")' in "hello world" it would return [ ('"', 'hello world', '"') ]. Since we're not interested in capturing the quotes before and after, I removed the parentheses around them in the regex, so they are not captured. In this case, since there's only 1 group captured per match, we don't get a list of tuples, but a list of strings: [ 'hello world' ].

BTW: I assume, the reason we have 2 regexes instead of 1 not caring whether it's " or ', is so that we can search for "can't stand"
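To make the findall behaviour described above concrete, here is a minimal sketch using only the standard re module. The sample text is made up for illustration; the two patterns are the ones discussed in this thread.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = 'say "hello world" now'

# Three capture groups: findall returns one tuple per match,
# with one entry per group (opening quote, content, closing quote).
three_groups = re.findall(r'(")(.*?)(")', text)

# One capture group: findall returns a plain list of strings.
one_group = re.findall(r'"(.*?)"', text)

print(three_groups)  # [('"', 'hello world', '"')]
print(one_group)     # ['hello world']
```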

517qf commented 6 years ago

Thanks for the great new version. And also thanks for your improvement and explanation - I should have known this one ....

I didn't know how non-German speakers quote. I added the single quote option without a lot of thought - searching for words that contain apostrophes didn't occur to me.

On second thought: maybe quoting should only work with double quotes. I just found out that the US keyboard also prominently features double quotes. If single quotes aren't used for quoting you could search for can't won't. Since we don't have an option to search for \' removing this quote-option might be even more relevant?

renerocksai commented 6 years ago

I agree and disagree. I like that one can quote search strings with single and double ones now. One would use double quotes by default, it's just natural. I work on a US keyboard all day; both single and double quotes are easily accessible.

The single quotes are especially useful if you search for a string that contains double quotes, like 10 PRINT "HELLO WORLD". You can just put it in single quotes and you're fine. No need for escaping. That covers the vast majority of the use-cases. Someone searching for a string containing both quote types is just nuts 😄

517qf commented 6 years ago

I might miss something but if you enter can't won't the function find_in_files in line 1454 will in a first step match and extract the string t won. So can and t remain which are merged into cant. So the list search_terms should contain ['t won', 'cant'].

You might mitigate this problem if you don't match a quote inside non-whitespace characters or only match quotes that are at the beginning of a term or are preceded by a space? This seems possible: https://stackoverflow.com/questions/36228810/python-regular-expression-to-match-a-pattern-when-preceded-by-either-start-of-li
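A small sketch of both points: the first function mirrors the two-regex approach posted earlier in this thread and reproduces the can't won't problem; the second is one possible mitigation, assuming a quote only counts when it starts at the beginning of the input or after whitespace and its partner is followed by whitespace or the end. The function names and the BOUNDED pattern are hypothetical, not code from the project.

```python
import re

def naive_extract(search_term):
    """Two-regex approach as posted earlier in this thread."""
    double = re.compile(r'"(.*?)"')
    quoted = double.findall(search_term)
    search_term = double.sub('', search_term)
    single = re.compile(r"'(.*?)'")
    quoted.extend(single.findall(search_term))
    search_term = single.sub('', search_term)
    return quoted + search_term.split()

# Assumption: an opening quote must not be preceded by a non-space
# character, and the matching closing quote must not be followed by one.
BOUNDED = re.compile(r"""(?<!\S)(["'])(.+?)\1(?!\S)""")

def bounded_extract(search_term):
    """Only treat quotes at term boundaries as phrase delimiters."""
    quoted = [m.group(2) for m in BOUNDED.finditer(search_term)]
    rest = BOUNDED.sub(' ', search_term)
    return quoted + rest.split()

print(naive_extract("can't won't"))    # ['t won', 'cant']
print(bounded_extract("can't won't"))  # ["can't", "won't"]
print(bounded_extract("find 'can't stand' here"))
```

With the boundary rule, apostrophes inside words are left alone, and a phrase such as 'can't stand' can still be quoted with single quotes.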

Another idea. Today I imported about 1500 notes and when searching often got too many hits. So I tried to make some code so that I can exclude words or groups that are preceded by !!. I chose double exclamation marks in case someone wants to search for a single exclamation mark. This 15-line extension/modification of find_in_files seems to work for me in my test file. I published the code here: https://gist.github.com/517qf/c0b321cab0557d6a275c6371d941e84f

Maybe there's something useful for you. As you already know: I'm a beginner. So expect that I made stupid errors.

renerocksai commented 6 years ago

you're right. our current implementation was a bit immature. I will also look at this from the angle of splitting by a matching pair of quotes, with a blank or start of string before the first one and a blank or end of string after the second one. maybe we should drop the single quotes and maybe maybe allow for escaping quotes to search for by doubling them like in "search ""now"" please"
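The doubling idea could look roughly like the sketch below. It is only an illustration under stated assumptions: double quotes alone delimit a phrase, the phrase must sit between whitespace or the string boundaries, and a doubled quote inside the phrase stands for one literal quote. PHRASE and split_query are hypothetical names.

```python
import re

# Assumption: a phrase starts with " at a term boundary, contains
# non-quote characters or doubled quotes, and ends with " at a boundary.
PHRASE = re.compile(r'(?<!\S)"((?:[^"]|"")+)"(?!\S)')

def split_query(query):
    """Return quoted phrases (with "" unescaped to ") plus the loose words."""
    phrases = [m.group(1).replace('""', '"') for m in PHRASE.finditer(query)]
    rest = PHRASE.sub(' ', query)
    return phrases + rest.split()

print(split_query('find "search ""now"" please" here'))
# ['search "now" please', 'find', 'here']
```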

renerocksai commented 6 years ago

as for ! I would apply the same logic then: "not !this but !!that". but that's probably only needed for ! before a word

517qf commented 6 years ago

dropping single quotes sounds easier. I also like your idea to use double double quotes.

Last night and just now I toyed with an extension of find_in_files that besides excluding words (which I posted last night) also allows filtering by mtime and date created (like mtime:<20180203 mtime:>2018 created:=201712 !!ausnahme suchwort). I think I have a solution that doesn't crash and delivers what I want (I finally installed anaconda and run the non-compiled version on linux).

I don't know if you like the idea to have time limits inside this search or the syntax. But this is very useful for me. I can post this later. Maybe you'll find something useful.

517qf commented 6 years ago

btw: thanks again for your time.

517qf commented 6 years ago

Here is a version of sublimeless_zk.py (taken from 77ab85d) that has a modified method find_in_files that hopefully

So this should work:

created:>2017 created:<20170331 mtime:=201805 !!""das nicht"" ""das schon"" das #eintag !!#wegdamit

Link: https://gist.github.com/517qf/707e7935a97d0cea9de4bf7644f87ba9

Maybe there's something useful. Every feedback is welcome.

renerocksai commented 6 years ago

You have really put a lot of effort into this! 👍

A few suggestions:

  1. You can do completely without datetime: fill all incomplete timestamps with 0s and 9s:

mtime:>201802 --> 20180200000000
mtime:<201802 --> 20180200000000
mtime:=201802 --> >=20180200000000, <=20180299999999
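The padding trick for the = case can be sketched without datetime as follows. This assumes 14-digit timestamps (YYYYMMDDhhmmss, as in the examples above) compared as strings; pad and matches_equal are hypothetical helper names, not functions from the project.

```python
TIMESTAMP_LENGTH = 14  # YYYYMMDDhhmmss, matching the examples above

def pad(partial, fill):
    """Right-pad an incomplete timestamp with the given digit."""
    return partial.ljust(TIMESTAMP_LENGTH, fill)

def matches_equal(timestamp, partial):
    """mtime:=201802 means: between 20180200000000 and 20180299999999."""
    # Equal-length digit strings compare the same way as the numbers do.
    return pad(partial, '0') <= timestamp <= pad(partial, '9')

print(pad('201802', '0'))                        # 20180200000000
print(pad('201802', '9'))                        # 20180299999999
print(matches_equal('20180215093000', '201802')) # True
print(matches_equal('20180301000000', '201802')) # False
```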

renerocksai commented 6 years ago

  1. I like your "" idea. One would search less frequently for ""this"" than for "this". So using ""double double quotes"" for grouping search terms is probably easier than using single quotes and having to escape the ones you want to search for. In that case !! is also consistent. But what if I want to search for what?!?!!. Ah, !! at the beginning of a word or group only, means negation. That's cool. If one wanted to include the !!, one could wrap the whole term in double double quotes: I want to search for ""!!this"" -> finds I want to search for !!this. This is what I would aim for.
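The rules described here (""...."" groups terms, a leading !! negates a word or group, and !! inside a group is literal) might be tokenized along these lines. This is a rough sketch, not the project's implementation; TOKEN and parse_query are hypothetical names, and the time filters from the gist are left out.

```python
import re

# Optional leading !!, then either a ""grouped phrase"" or a single word.
TOKEN = re.compile(r'(!!)?(?:""(.+?)""|(\S+))')

def parse_query(query):
    """Split a query into (terms to include, terms to exclude)."""
    include, exclude = [], []
    for match in TOKEN.finditer(query):
        negated, phrase, word = match.groups()
        term = phrase if phrase is not None else word
        (exclude if negated else include).append(term)
    return include, exclude

print(parse_query('!!""das nicht"" ""das schon"" das #eintag !!#wegdamit'))
# (['das schon', 'das', '#eintag'], ['das nicht', '#wegdamit'])
print(parse_query('I want ""!!this"" what?!?!!'))
# (['I', 'want', '!!this', 'what?!?!!'], [])
```

Because !! is only recognized at the start of a token, what?!?!! stays an ordinary search term, and wrapping !!this in double double quotes searches for it literally.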

I am not keen on mtimes and note-id-based times in find-in-files though. yet? maybe later?

517qf commented 6 years ago

thanks for your comments! They are very instructional for me and I will carefully process them. I don't mean to impose. So for me "yet? maybe later?" is also good.

My motivation: For me a search feature that allows filtering by various criteria (contained strings, tags, date_created, mtime, and ideally also headings, weekdays (weekend?), hour, maybe size) is important because of the way I remember. Quite often I remember vaguely that there must be a note that I made (edited) and a little bit about the context: Maybe it was during a holiday, around the time I read book X (without being related to book X), after I reinstalled Windows, when I really spent time on a special topic.... I can usually quickly assign a date to these contexts. date_created or mtime limit the number of relevant notes a lot so that I can employ quite general terms and still get useful results.

As far as I see I can't do this with already built-in functions?? But I can understand that this might be a quirk of mine.

Tonight I continued toying with the search function and implemented an alternative search mode that allows searches with OR (and grouping terms). Here searches are slower and take between 5 and 10 seconds (instead of 2 seconds) for my 17k files. I think this is good (maybe too good to be true?). I need to check it again and maybe I can make it a little bit less ugly/verbose. This code reuses all of what I posted yesterday so all of your comments from tonight are very relevant.

renerocksai commented 6 years ago

Hi, see the above commit. I got time to work on it. Not as advanced as your functionality but that is OK for now as I have so many other things to take care of. It now supports