zot / microfts

Small and fast FTS (full text search)
MIT License

microfts doesn't handle org source blocks #14

Open fpatz opened 3 years ago

fpatz commented 3 years ago

The org parser/chunker in microfts seems to get confused by source blocks. When searching for a term that appears in a file with source blocks, the search UI displays just one huge line containing the first source block of the org file (and none of the other matches). The screenshot shows a case where many lines in the file match "sphinx". Here the source block doesn't even contain the search term (it looks like the org parser squeezed all the text into the source block chunk).

[Screenshot, 2021-01-16 14:44:14]

A simple workaround, obviously, is to remove -org from org-fts-input-args, which I did (it's still very useful that way, but not quite what you intended, I guess).

Here is a shell transcript of reproducing this from the command line:

$ # THIS IS WRONG (with -org)
$ rm org-fts.db
$ ./microfts create org-fts.db
$ ./microfts input -org org-fts.db ~/org/links.org
$ ./microfts search org-fts.db sphinx | cut -c -80
~/org/links.org:29:    #+begin_src python\n      import sys, 

$ # WITHOUT ORG PARSING (same input file)
$ rm org-fts.db
$ ./microfts create org-fts.db
$ ./microfts input org-fts.db ~/org/links.org
$ ./microfts search org-fts.db sphinx | cut -c -80
~/org/links.org:682:** Sphinx
~/org/links.org:684:*** Requirements, Bugs, Test cases, … ins
~/org/links.org:688:*** Why use reStructuredText and Sphinx s
~/org/links.org:695:    documents, then Sphinx (or any of Mar
~/org/links.org:698:    sphinx-static-site-generator-for-main
...

I looked into the code, but my Go fu is moot, so no patch, sorry ...

fpatz commented 3 years ago

BTW, this is reproducible with the README.org from the microfts repository: the block end marker #+end_src is somehow missed, and all text following a block is swallowed into one chunk.
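To make the expected behavior concrete, here is a minimal Go sketch of a chunker that closes a source block at #+end_src and then goes back to line-wise chunks. This is not the microfts code, and the thread does not say why microfts misses the end marker; the names Chunk, hasMarker and chunkOrg are made up for illustration, and matching the markers case-insensitively with leading indentation ignored is an assumption about what a chunker would want.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical unit of indexed text; it is not a microfts type.
type Chunk struct {
	Line int    // 1-based line number where the chunk starts
	Text string // chunk contents, lines joined with "\n"
}

// hasMarker reports whether line, ignoring indentation and case,
// starts with the given Org keyword (for example "#+begin_src").
func hasMarker(line, marker string) bool {
	return strings.HasPrefix(strings.ToLower(strings.TrimSpace(line)), marker)
}

// chunkOrg splits Org text into chunks: each source block is one chunk,
// and every other non-blank line is a chunk of its own.
func chunkOrg(text string) []Chunk {
	var chunks []Chunk
	var block []string // lines of the source block being collected
	blockStart := 0    // 0 means "not inside a block"
	for i, line := range strings.Split(text, "\n") {
		switch {
		case blockStart > 0:
			block = append(block, line)
			if hasMarker(line, "#+end_src") {
				// Close the block so later lines are not swallowed into it.
				chunks = append(chunks, Chunk{blockStart, strings.Join(block, "\n")})
				block, blockStart = nil, 0
			}
		case hasMarker(line, "#+begin_src"):
			block, blockStart = []string{line}, i+1
		case strings.TrimSpace(line) != "":
			chunks = append(chunks, Chunk{i + 1, line})
		}
	}
	if blockStart > 0 {
		// Unterminated block: emit what was collected rather than drop it.
		chunks = append(chunks, Chunk{blockStart, strings.Join(block, "\n")})
	}
	return chunks
}

func main() {
	doc := "* Tools\n  #+begin_src python\n  import sys\n  #+end_src\n** Sphinx\nSphinx builds docs."
	for _, c := range chunkOrg(doc) {
		fmt.Printf("%d:%q\n", c.Line, c.Text)
	}
}
```

With a chunker along these lines, headings and text after the block keep their own line numbers, so a search for "sphinx" would report those lines instead of the block's first line.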

zot commented 3 years ago

Sheesh, sorry I haven't seen this -- massive real-life interrupts. I'll look into it.

zot commented 3 years ago

I'm seriously thinking of removing the -org option. I think in most cases, people want line-by-line results anyway because the paragraphs and blocks have multiple hits in them...

fpatz commented 3 years ago

Thanks for caring! Indeed, indexing paragraph- (or even document-?) wise and then presenting search results appropriately is not easy. I've been using microfts with Emacs and a few hundred org files for some time now, and line-wise is pretty cool.

zot commented 3 years ago

Glad to hear you've been using it! That may give me incentive to actually publish it to Melpa...

Btw, I just pushed an update to GitHub that formats the ivy results like in orgmode, so you can see links more clearly, etc.